GoogleCloudPlatform / realtime-embeddings-matching

Apache License 2.0

Error when running the command `bash run.sh` #3

Open shahmansi78 opened 4 years ago

shahmansi78 commented 4 years ago

Hi, I am getting the following error while running the command `bash run.sh`:

```
Cleaning working and output directories...
CommandException: "rm" command does not support "file://" URLs. Did you mean to use a gs:// URL?
CommandException: "rm" command does not support "file://" URLs. Did you mean to use a gs:// URL?
Running the Dataflow job...

Python 2 is deprecated. Upgrade to Python 3 as soon as possible. See https://cloud.google.com/python/docs/python2-sunset
To suppress this warning, create an empty ~/.cloudshell/no-python-warning file.
The command will automatically proceed in seconds or on any key.

Traceback (most recent call last):
  File "run.py", line 69, in <module>
    main()
  File "run.py", line 64, in main
    pipeline.run(pipeline_options, known_args)
  File "/home/shahmansi78/realtime-embeddings-matching/text-semantic-search/embeddings_extraction/etl/pipeline.py", line 103, in run
    pipeline = beam.Pipeline(options=pipeline_options)
  File "/home/shahmansi78/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 149, in __init__
    'Pipeline has validations errors: \n' + '\n'.join(errors))
ValueError: Pipeline has validations errors:
Invalid GCS path (wordembedding/wikipedia/dataflow/temp), given for the option: temp_location.
Invalid GCS path (wordembedding/wikipedia/dataflow/staging), given for the option: staging_location.

Dataflow job submitted successfully!
```
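Both errors in the log point at the same root cause: the paths passed to gsutil and to Dataflow's `temp_location`/`staging_location` options are missing the `gs://` scheme, so they are treated as local (`file://`) paths rather than Cloud Storage URLs. A minimal sketch of that distinction (the bucket name `my-bucket` is a placeholder, not from the repo):

```shell
#!/bin/sh
# A path without a scheme is not a valid Cloud Storage URL; Dataflow and
# gsutil both require the gs:// prefix.
BAD_DIR="my-bucket/wikipedia/dataflow"      # rejected: "Invalid GCS path"
GOOD_DIR="gs://my-bucket/wikipedia/dataflow" # accepted

for dir in "$BAD_DIR" "$GOOD_DIR"; do
  case "$dir" in
    gs://*) echo "valid GCS path:   $dir" ;;
    *)      echo "invalid GCS path: $dir" ;;
  esac
done
```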

kalradivyanshu commented 4 years ago

I had the same issue and solved it by changing line 38 of run.sh from:

```shell
DF_JOB_DIR="${BUCKET}/${KIND}/dataflow"
```

to

```shell
DF_JOB_DIR="gs://${BUCKET}/${KIND}/dataflow"
```
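If you want the script to work whether or not BUCKET already carries the scheme, a small guard (hypothetical, not part of the repo's run.sh; `my-bucket` and `wikipedia` are placeholder values) avoids double-prefixing:

```shell
#!/bin/sh
BUCKET="my-bucket"  # placeholder: may or may not include gs://
KIND="wikipedia"    # placeholder dataset kind

# Prefix gs:// only when it is missing, so an already-qualified
# bucket name is left untouched.
case "${BUCKET}" in
  gs://*) DF_JOB_DIR="${BUCKET}/${KIND}/dataflow" ;;
  *)      DF_JOB_DIR="gs://${BUCKET}/${KIND}/dataflow" ;;
esac

echo "${DF_JOB_DIR}"
```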

Hope this helps.