GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

Adding parameter to execution fires an error #479

Closed OrielResearchCure closed 4 years ago

OrielResearchCure commented 4 years ago

Hello,

I'm not sure this is the right place to post this question. I am using AI Platform to run training on Cloud ML machines. The command, which lives in a shell script, is the following:

LATEST=192_20200324030843
DIM=192
TARGET_DIM=192
EPOCHS=25
BATCH_SIZE=16
local_or_remote=remote
JOBNAME=AML$DATE
LOG_DIR=gs://path/to/log_dir/
MODEL_CHOICE=5

gcloud ai-platform jobs submit training $JOB_NAME \
        --runtime-version=2.1 \
        --job-dir=$JOB_DIR \
        --package-path=trainer \
        --module-name=trainer.task \
        --region=us-east1 \
        --python-version=3.5 \
        --scale-tier=basic-gpu
        -- \
        --LATEST $LATEST \
        --MODEL_CHOICE $MODEL_CHOICE
        --DIM $DIM \
        --TARGET_DIM $TARGET_DIM \
        --EPOCHS $EPOCHS \
        --BATCH_SIZE $BATCH_SIZE \
        --local_or_remote $local_or_remote \
        --LOG_DIR $LOG_DIR

I have tried many ways of writing this command. AI Platform runs the training, but I am unable to pass any parameters. The following error is raised:

-- \
        --LATEST $LATEST \
        --MODEL_CHOICE $MODEL_CHOICE
./scripts/train-remote.sh: line 49: --: command not found

Is there something in the command that I need to change? What is missing? Many thanks, eilalan

gogasca commented 4 years ago

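There is a missing \ after --MODEL_CHOICE $MODEL_CHOICE. The --scale-tier=basic-gpu line also has no trailing \, so the shell ends the gcloud command there and tries to run the next line, starting with --, as a separate command. That is where "--: command not found" comes from.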
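For reference, here is a minimal sketch of the script with a trailing \ on every continued line, including the --scale-tier=basic-gpu and --MODEL_CHOICE lines. The variable values are copied from the question. JOB_NAME and JOB_DIR are hypothetical placeholders, since the question does not show how they are set. The print_submit_command helper is also only for illustration: it echoes the command instead of submitting it, so the joined line can be checked first.

```shell
#!/bin/sh
# Values copied from the question.
LATEST=192_20200324030843
MODEL_CHOICE=5
DIM=192
TARGET_DIM=192
EPOCHS=25
BATCH_SIZE=16
local_or_remote=remote
LOG_DIR=gs://path/to/log_dir/

# Hypothetical placeholders: the question does not show how these are set.
JOB_NAME=example_job
JOB_DIR=gs://example-bucket/job_dir

# Dry run: prints the full command line instead of running it.
# Every line except the last ends in a backslash, so the shell reads
# the whole thing as a single command. Remove "echo" to submit the job.
print_submit_command() {
    echo gcloud ai-platform jobs submit training "$JOB_NAME" \
        --runtime-version=2.1 \
        --job-dir="$JOB_DIR" \
        --package-path=trainer \
        --module-name=trainer.task \
        --region=us-east1 \
        --python-version=3.5 \
        --scale-tier=basic-gpu \
        -- \
        --LATEST "$LATEST" \
        --MODEL_CHOICE "$MODEL_CHOICE" \
        --DIM "$DIM" \
        --TARGET_DIM "$TARGET_DIM" \
        --EPOCHS "$EPOCHS" \
        --BATCH_SIZE "$BATCH_SIZE" \
        --local_or_remote "$local_or_remote" \
        --LOG_DIR "$LOG_DIR"
}

print_submit_command
```

If the printed output is one line, with all the trainer flags after the bare --, the continuations are intact. Quoting the variables is optional here but keeps values with spaces from being split.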

gogasca commented 4 years ago

@OrielResearchCure can you provide a status update? Thanks

OrielResearchCure commented 4 years ago

Sorry for the delay. I have just run it and it looks good. Thanks!