abetancurg / BookNotes

This repo is made for proving stuff

All the things one needs to keep in mind to run a pipeline successfully on Dataflow #1

Open abetancurg opened 2 years ago

abetancurg commented 2 years ago

Pending to write...

abetancurg commented 2 years ago

Login

The first thing I recommend is to check whether the Cloud SDK is already installed. If not, it can be downloaded from this url.

In this shell, it is necessary to run the following commands to authenticate: gcloud auth login and gcloud auth application-default login

Once the auth is done, it is a good idea to check the active project by running gcloud config list
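As a quick sketch, the login steps above run in sequence like this (standard gcloud commands; the project shown by the last one should match the --project passed to the pipeline):

```shell
# Authenticate the gcloud CLI itself
gcloud auth login
# Create Application Default Credentials, which client libraries (e.g. Beam) pick up
gcloud auth application-default login
# Check the active account and project
gcloud config list
```

These commands are interactive and environment-dependent (they open a browser for OAuth), so they must be run in a real Cloud SDK shell.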

Running the command

What is left to do is to run the big command. It is long in this case because the goal is to run the pipeline through DataflowRunner and not through DirectRunner.

Given what I mentioned above, it is important to keep the notes below in mind.

One example of the full command is:

python27 ejercicio2.py --entrada gs://ct-pio/data/el_quijote.txt --salida gs://ct-pio/out/salida.csv --runner DataflowRunner --project contento-bi --region us-central1 --temp_location gs://ct-pio/tmp/ --n-palabras 3 --subnetwork https://www.googleapis.com/compute/v1/projects/contento-bi/regions/us-central1/subnetworks/contento-subnet1
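The custom flags in that command (--entrada, --salida, --n-palabras) are not Beam built-ins; a script like ejercicio2.py typically declares them itself and forwards whatever it does not recognize (--runner, --project, --region, --temp_location, --subnetwork) to the pipeline options. A minimal sketch of that argument handling, assuming the flag names seen in the example (the actual ejercicio2.py is not shown in this issue):

```python
import argparse


def parse_args(argv):
    """Split the command line into script-specific flags and pipeline flags.

    The flag names below are taken from the example command in this issue;
    the real ejercicio2.py may declare them differently.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--entrada", required=True,
                        help="input path, e.g. gs://ct-pio/data/el_quijote.txt")
    parser.add_argument("--salida", required=True,
                        help="output path, e.g. gs://ct-pio/out/salida.csv")
    parser.add_argument("--n-palabras", type=int, default=3,
                        help="number of words to report")
    # Everything not declared above (--runner, --project, --region,
    # --temp_location, --subnetwork, ...) is left untouched and can be
    # handed to Beam's PipelineOptions.
    known, pipeline_args = parser.parse_known_args(argv)
    return known, pipeline_args
```

With this split, switching between DirectRunner and DataflowRunner only changes the forwarded arguments, not the script's own flags.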

Note: If, after running this command, the shell shows a 401 or 403 error, please read the Login part of this issue.