VIDA-NYU / alpha-automl

Alpha-AutoML is a Python library for automatically generating end-to-end machine learning pipelines.
https://alpha-automl.readthedocs.io
Apache License 2.0

Execution stops after creating Board when using MyEmbedder. #25

Closed by laibamehnaz 1 year ago

laibamehnaz commented 1 year ago

When running python adding_new_primitives_huggingface.py, the execution stops here:

DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /bert-base-uncased/resolve/main/vocab.txt HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /bert-base-uncased/resolve/main/config.json HTTP/1.1" 200 0
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/ext3/miniconda3/lib/python3.10/site-packages/sklearn/preprocessing/_label.py:116: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
INFO:datamart_profiler.core:Setting column names from header
INFO:datamart_profiler.core:Identifying types, 4 columns...
INFO:datamart_profiler.core:Processing column 0 'text'...
INFO:datamart_profiler.core:Column type http://schema.org/Text [http://schema.org/Text]
INFO:datamart_profiler.core:Processing column 1 'Time of Tweet'...
INFO:datamart_profiler.core:Column type http://schema.org/Text [http://schema.org/Enumeration]
INFO:datamart_profiler.core:Processing column 2 'Age of User'...
INFO:datamart_profiler.core:Column type http://schema.org/Text [http://schema.org/Enumeration]
INFO:datamart_profiler.core:Processing column 3 'Country'...
INFO:datamart_profiler.core:Column type http://schema.org/Text [http://schema.org/Enumeration]
INFO:alpha_automl.data_profiler:Results of profiling data: non-numeric features = dict_keys(['TEXT_ENCODER', 'CATEGORICAL_ENCODER']), useless columns = [], missing values = True
INFO:alpha_automl.utils:Sampling down data from 27481 to 2000
INFO:alpha_automl.pipeline_synthesis.setup_search:Creating a manual grammar
INFO:alpha_automl.primitive_loader:Hierarchy of all primitives loaded
INFO:alpha_automl.grammar_loader:Creating task grammar for task CLASSIFICATION_TASK
INFO:alpha_automl.grammar_loader:Task grammar: Grammar with 31 productions (start state = S)
    S -> IMPUTATION ENCODERS FEATURE_SCALING FEATURE_SELECTION CLASSIFICATION
    ENCODERS -> TEXT_ENCODER CATEGORICAL_ENCODER
    IMPUTATION -> 'sklearn.impute.SimpleImputer'
    FEATURE_SCALING -> 'sklearn.preprocessing.MaxAbsScaler'
    FEATURE_SCALING -> 'sklearn.preprocessing.RobustScaler'
    FEATURE_SCALING -> 'sklearn.preprocessing.StandardScaler'
    FEATURE_SCALING -> 'E'
    FEATURE_SELECTION -> 'sklearn.feature_selection.GenericUnivariateSelect'
    FEATURE_SELECTION -> 'sklearn.feature_selection.SelectPercentile'
    FEATURE_SELECTION -> 'sklearn.feature_selection.SelectKBest'
    FEATURE_SELECTION -> 'E'
    TEXT_ENCODER -> 'sklearn.feature_extraction.text.CountVectorizer'
    TEXT_ENCODER -> 'sklearn.feature_extraction.text.TfidfVectorizer'
    TEXT_ENCODER -> 'my_module.MyEmbedder'
    CATEGORICAL_ENCODER -> 'sklearn.preprocessing.OneHotEncoder'
    CLASSIFICATION -> 'sklearn.discriminant_analysis.LinearDiscriminantAnalysis'
    CLASSIFICATION -> 'sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis'
    CLASSIFICATION -> 'sklearn.ensemble.BaggingClassifier'
    CLASSIFICATION -> 'sklearn.ensemble.ExtraTreesClassifier'
    CLASSIFICATION -> 'sklearn.ensemble.GradientBoostingClassifier'
    CLASSIFICATION -> 'sklearn.ensemble.RandomForestClassifier'
    CLASSIFICATION -> 'sklearn.naive_bayes.BernoulliNB'
    CLASSIFICATION -> 'sklearn.naive_bayes.GaussianNB'
    CLASSIFICATION -> 'sklearn.naive_bayes.MultinomialNB'
    CLASSIFICATION -> 'sklearn.neighbors.KNeighborsClassifier'
    CLASSIFICATION -> 'sklearn.linear_model.LogisticRegression'
    CLASSIFICATION -> 'sklearn.linear_model.PassiveAggressiveClassifier'
    CLASSIFICATION -> 'sklearn.linear_model.SGDClassifier'
    CLASSIFICATION -> 'sklearn.svm.LinearSVC'
    CLASSIFICATION -> 'sklearn.svm.SVC'
    CLASSIFICATION -> 'sklearn.tree.DecisionTreeClassifier'
INFO:alpha_automl.grammar_loader:Creating game grammar
INFO:alpha_automl.pipeline_search.Coach:------ITER 1------
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: S
/scratch/lm4428/d3m_latest/alpha-automl/alpha_automl/pipeline_search/pipeline/NNet.py:103: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  board = Variable(board, volatile=True)
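
As a side note, the UserWarning above comes from the removed volatile flag; the modern equivalent, as the warning itself suggests, is a torch.no_grad() block. A minimal self-contained sketch (with a stand-in model and board tensor, not the real NNet.py code):

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)        # stand-in network; the real model lives in NNet.py
    board = torch.zeros(1, 4)      # placeholder board tensor

    # Deprecated pattern: board = Variable(board, volatile=True)
    # Current equivalent: disable gradient tracking for inference with a context manager.
    with torch.no_grad():
        output = model(board)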
roquelopez commented 1 year ago

I think it's because you changed the start method of the multiprocessing library from spawn to fork in this line: https://github.com/VIDA-NYU/alpha-automl/blob/laiba-dev/alpha_automl/automl_manager.py#L74

I prefer to use spawn because it's safer and it also runs on Windows.
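
To illustrate the difference (just a generic sketch, not the actual code at automl_manager.py#L74): spawn starts each worker in a fresh interpreter, while fork clones the parent process and is only available on Unix.

    import multiprocessing as mp

    if __name__ == '__main__':
        # 'spawn' starts each worker in a fresh interpreter: safer and available on Windows.
        # 'fork' clones the parent process and exists only on Unix.
        ctx = mp.get_context('spawn')
        p = ctx.Process(target=print, args=('hello from a spawned worker',))
        p.start()
        p.join()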

roquelopez commented 1 year ago

Also, you should run your script inside an if __name__ == '__main__': block. This is a limitation of the multiprocessing library: https://stackoverflow.com/questions/50781216/in-python-multiprocessing-process-do-we-have-to-use-name-main
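
In other words, something like this (run_search is just a placeholder for whatever your script builds, not an alpha-automl API):

    def run_search():
        # build the AutoML object, register MyEmbedder, call fit(), etc.
        ...

    if __name__ == '__main__':
        # With the 'spawn' start method, worker processes re-import this module,
        # so any code that launches work must sit behind this guard.
        run_search()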

laibamehnaz commented 1 year ago

spawn didn't work for me; it gave me an error saying fork should be used, so I changed it. However, I put this here to discuss the error in the meeting. Sorry I didn't mention that in the issue.

roquelopez commented 1 year ago

I see. I think it raised the error because you didn't use the if __name__ == '__main__': guard, like in the example: https://github.com/VIDA-NYU/alpha-automl/blob/devel/examples/tabular_classification.py#L5

But yes, we should clarify it in the documentation. Thanks!

laibamehnaz commented 1 year ago

Ah got it. Thank you so much. :)