fhswf / tagflip-autonlp

Automate NLP tasks
https://autonlp.informatik.fh-swf.de
MIT License

Sequence classification fails #6

Closed cgawron closed 2 years ago

cgawron commented 2 years ago

Running a sequence classification training fails with ImportError: cannot import name 'HuggingFaceSequenceClassificationSavable' from 'tagflip.model.mlflow.huggingface':

2022/03/01 14:26:37 INFO mlflow.projects.backend.local: === Running command 'docker run --rm --gpus "device=0" --add-host host.docker.internal:host-gateway -e MLFLOW_RUN_ID=a63067e554d94ac98e9d4e65b136399a -e MLFLOW_TRACKING_URI=https://jupiter.fh-swf.de/auto-nlp-mlflow -e MLFLOW_EXPERIMENT_ID=8 -e AWS_SECRET_ACCESS_KEY=minioadmin -e AWS_ACCESS_KEY_ID=minioadmin -e MLFLOW_S3_ENDPOINT_URL=http://minio.gawron.cloud -e MLFLOW_TRACKING_USERNAME=mlflow -e MLFLOW_TRACKING_PASSWORD=mlflow train-huggingface:cc7c47a python3 hf_generic_sequence_classification.py \
  \
  --project_id 621e1e0b02b120e92782b6f0 \
  --training_id 621e1e4102b120e92782b77f \
  \
  --model_name bert-base-cased \
  --model_revision main \
  --config_name bert-base-cased \
  --tokenizer_name bert-base-cased \
  \
  --tagflip_host https://jupiter.fh-swf.de/auto-nlp-core \
  --dataset_name imdb \
  --subset_name plain_text \
  --dataset_provider_name huggingface \
  \
  --evaluation_strategy epoch \
  --logging_steps 500 \
  --num_train_epochs 2.0 \
  --load_best_model_at_end true \
  --metric_for_best_model f1 \
  --greater_is_better true \
  \
  --output_dir ./output \
  --logging_first_step true \
  \
  --search_hyperparams false \
  --trials 1' in run with ID 'a63067e554d94ac98e9d4e65b136399a' === 
Traceback (most recent call last):
  File "hf_generic_sequence_classification.py", line 27, in <module>
    from tagflip.model.mlflow.huggingface import HuggingFaceSequenceClassificationSavable, HuggingFaceWorkflow
ImportError: cannot import name 'HuggingFaceSequenceClassificationSavable' from 'tagflip.model.mlflow.huggingface' (/workspace/src/tagflip/packages/auto-nlp-workflow-lib/src/tagflip/model/mlflow/huggingface/__init__.py)
2022/03/01 14:26:46 ERROR mlflow.cli: === Run (ID 'a63067e554d94ac98e9d4e65b136399a') failed ===
[2022-03-01 14:26:50,679] [PID 50] [Thread-10] [Run a63067e554d94ac98e9d4e65b136399a] [INFO] Docker run failed or killed.
[2022-03-01 14:26:50,680] [PID 50] [Thread-10] [Run a63067e554d94ac98e9d4e65b136399a] [INFO] Exiting...
[2022-03-01 14:26:50,685] [PID 50] [Thread-10] [trainings.runtimes.docker.docker_training_actor.DockerTrainingActor] [INFO] Actor done.

cgawron commented 2 years ago

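The issue is caused by an outdated version of the huggingface-pytorch-gpu docker image. The build script does not appear to update this image automatically, so the image still contains an older copy of the workflow library that lacks HuggingFaceSequenceClassificationSavable.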
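
A stale image like this could probably be caught before a training run starts with a small smoke test executed inside the freshly built image. The following is only a sketch, not part of the repository: the helper name `missing_exports` is hypothetical, and the module and class names in the comment are the ones from the traceback above.

```python
import importlib


def missing_exports(module_name, names):
    """Return the entries of `names` that are not attributes of `module_name`.

    This is an attribute-level check (hasattr), which is sufficient for
    classes and functions re-exported from a package's __init__.py.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Hypothetical usage inside the training image (names taken from the traceback):
#   missing = missing_exports(
#       "tagflip.model.mlflow.huggingface",
#       ["HuggingFaceSequenceClassificationSavable", "HuggingFaceWorkflow"],
#   )
#   if missing:
#       raise SystemExit(f"image looks stale, missing: {missing}")
```

Running such a check as the last step of the image build would turn the problem into a build failure instead of a failed training run.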

cgawron commented 2 years ago

Fixed by commit 080b3da1769b9a65ea6a3ea6e471f243776e2035