tonyhqanguyen closed this issue 5 years ago
I think I got that part figured out. I thought the data had been loaded into the bucket successfully when I ran
python reddit/create_data.py \
  --output_dir ${DATADIR?} \
  --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
  --runner DataflowRunner \
  --temp_location ${DATADIR?}/temp \
  --staging_location ${DATADIR?}/staging \
  --project ${PROJECT?}
but apparently nothing was happening when I did that.
However, now that it's running, it's predicting a required runtime of 18 hours. Is this normal?
Can you check how many workers the Dataflow job is using in the Dataflow console? You may need to increase your quota for it to parallelise over more machines.
From the readme:
Typical metrics for the Dataflow job:
Total vCPU time: 625.507 vCPU hr
Total memory time: 322.5 GB hr
Total persistent disk time: 156,376.805 GB hr
Elapsed time: 1h 38m (409 workers)
Estimated cost: 44 USD
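For a rough sense of how the worker count drives the runtime, here is a back-of-envelope sketch based on the readme numbers. It assumes the job parallelises almost perfectly and that each worker contributes about one vCPU. Both are simplifying assumptions, and the function names are only illustrative.

```python
# Back-of-envelope estimate only. Assumes near-perfect parallelism and
# roughly one vCPU per worker; neither is guaranteed by Dataflow.
TOTAL_VCPU_HOURS = 625.507  # "Total vCPU time" from the readme metrics


def estimated_hours(num_vcpus: float) -> float:
    """Rough elapsed time if the job keeps num_vcpus busy throughout."""
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be positive")
    return TOTAL_VCPU_HOURS / num_vcpus


def vcpus_for(hours: float) -> float:
    """Roughly how many vCPUs correspond to a given elapsed time."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return TOTAL_VCPU_HOURS / hours


if __name__ == "__main__":
    for n in (8, 32, 409):
        print(f"{n:>4} vCPUs -> ~{estimated_hours(n):.1f} h")
    print(f"18 h corresponds to ~{vcpus_for(18):.0f} vCPUs")
```

By this estimate, 409 workers give about 1.5 hours, close to the 1h 38m in the readme, and an 18-hour prediction corresponds to roughly 35 vCPUs, which is consistent with a quota capping the number of workers.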
@matthen Yeah the quota I have is just limiting the number of workers. Thanks!
cool, glad it's working! I added a note in #43
Hi, I was just wondering what the fix is for this issue. For the Reddit dataset, I have followed all the steps up to, but not including, running:
python tools/tfrutil.py pp ${DATADIR?}/train-00999-of-01000.tfrecords
But when I do, I get this error:
I suppose this is because it cannot access my credentials, so I followed the instructions here:
https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances
and downloaded a <project>-<code>.json file with
{
  "type": "service_account",
  "project_id": "xxxx",
  "private_key_id": "xxxxxxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\n xxxxxxx \n-----END PRIVATE KEY-----\n",
  "client_email": "xxxxx@developer.gserviceaccount.com",
  "client_id": "xxxxxxx",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "xxxxxxxxxxxxxx",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/xxxxxxx"
}
The error still persists. I would really appreciate any advice.
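One thing that may be worth checking, assuming the error really is about credentials (the error text is not shown here, so this is a guess): Google's client libraries look for Application Default Credentials through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, so downloading the key file is not enough on its own. A minimal sketch that sanity-checks the key file and sets the variable for the current process follows. The helper names and the list of required fields are illustrative choices, not part of any library.

```python
import json
import os

# Fields taken from the key file shown above; the selection is an assumption.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_key_file(path: str) -> list:
    """Return a list of problems found in a service-account key file."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    try:
        with open(path) as f:
            key = json.load(f)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


def use_key_file(path: str) -> None:
    """Point Application Default Credentials at the key file for this process."""
    problems = check_key_file(path)
    if problems:
        raise ValueError("; ".join(problems))
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(path)
```

Calling `use_key_file("<project>-<code>.json")` before any code that reads from the bucket only affects the current Python process. To affect a separate command such as `tools/tfrutil.py`, the variable would need to be exported in the shell that launches it.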