GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0
1.51k stars 860 forks

ai-platform training : FATAL Flags parsing error: Unknown command line flag 'job_dir'. #496

Closed kimalaacer closed 2 years ago

kimalaacer commented 3 years ago

I am trying to train an EfficientDet-D6 with a fixed image size of 640 and pretrained weights by submitting a training job to GCE on a V100, using a TF image from gcr.io. The Docker image is created and assigned a tag, but the training job fails with this error: master-replica-0: "FATAL Flags parsing error: Unknown command line flag 'job_dir'. Did you mean: log_dir ?" It seems I need to specify an additional required command-line argument: job_dir.

I tried adding --job_dir=gs://training/job_20210130_184616 to the command below, but the job still failed.

My command, run from Jupyter on a DLVM: !submit_ training_job -c 123456abc -i docker_image -y caip_config.yaml --mode=train --model_name=efficientdet-d6 --model_dir=run2/ --ckpt=efficientdet-d6-640/ --training_file_pattern=gs://train/.tfrecord --validation_file_pattern=gs://val/.tfrecord --train_batch_size=100 --eval_batch_size=1 --eval_samples=32 --num_examples_per_epoch=315 --num_epochs=300 --hparams=config.yaml
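For reference, the error text suggests the training script's flag parser received a job_dir flag that it does not declare. One possible workaround, sketched here under assumptions and not taken from the EfficientDet code, is a small wrapper that drops the flag from the argument list before the trainer parses it. The helper name strip_flags is hypothetical, and whether a job_dir flag actually reaches the container depends on how the job was submitted.

```python
from typing import Iterable, List


def strip_flags(argv: List[str], names: Iterable[str]) -> List[str]:
    """Return a copy of argv without the named flags.

    Handles both "--name=value" and "--name value" forms, and treats
    dashes and underscores in flag names as equivalent
    (job-dir and job_dir are matched alike).
    """
    wanted = {n.replace("-", "_") for n in names}
    kept: List[str] = []
    skip_next = False
    for i, arg in enumerate(argv):
        if skip_next:
            # This token is the value of a flag dropped on the previous step.
            skip_next = False
            continue
        if arg.startswith("--"):
            name, sep, _ = arg[2:].partition("=")
            if name.replace("-", "_") in wanted:
                # "--name value" form: also drop the following value token.
                has_value_token = (
                    not sep
                    and i + 1 < len(argv)
                    and not argv[i + 1].startswith("--")
                )
                if has_value_token:
                    skip_next = True
                continue
        kept.append(arg)
    return kept
```

A wrapper entry point could set sys.argv = strip_flags(sys.argv, ["job_dir"]) before calling the trainer's main function. Alternatively, if the script parses flags with absl, declaring the flag there with flags.DEFINE_string("job_dir", None, "...") should also make the parser accept it.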

kweinmeister commented 2 years ago

Hello, is this issue related to a code sample in this repo? If so, can you please point us to the notebook you are experiencing an issue with?

kweinmeister commented 2 years ago

As we haven't heard back on this issue, we'll go ahead and close it. Please feel free to reopen it if this is still something that you'd like to investigate. Thank you.