broadinstitute / gnomad_methods

Hail helper functions for the gnomAD project and Translational Genomics Group
https://gnomad.broadinstitute.org
MIT License

Unable to initialize google cluster according to guidelines #728

Closed: wjwei-handsome closed this issue 2 months ago

wjwei-handsome commented 3 months ago

Hi!

When I ran the first step of the tutorial, the cluster failed to start. The command was:

hailctl dataproc start new-cluster-name --packages gnomad

The following error occurred:

Initialization action failed. Failed action 'gs://hail-common/hailctl/dataproc/0.2.132/init_notebook.py', see output in: gs://dataproc-staging-us-central1-435769693476-xj5bpaxe/google-cloud-dataproc-metainfo/209025fe-915f-4228-bb9f-871ee6e9a657/new-cluster-name-m/dataproc-initialization-script-0_output.

The error output in the Google Cloud Storage bucket looks like this:

ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /hail-is/jgscm/archive/v0.1.13+hail.zip (Caused by ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f7d030a20d0>, 'Connection to github.com timed out. (connect timeout=15)'))

My guess is that the newly started cluster has no access to the public internet, so the package cannot be downloaded and installed.

So I added the --public-ip-address parameter, which I found in the gcloud parameter help, and that solved the problem above.
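For anyone hitting the same error, here is a minimal sketch of the adjusted command. It assumes hailctl forwards the extra flag unchanged to gcloud dataproc clusters create, which matched the behavior described in this thread for Hail 0.2.132 but may differ in other versions. The helper name start_cluster_cmd is hypothetical, and the sketch only prints the command so it can be reviewed before running.

```shell
# Placeholder cluster name, same as in the original command.
CLUSTER_NAME="new-cluster-name"

# Hypothetical helper: print the start command with the extra flag
# instead of running it, so it can be checked first.
start_cluster_cmd() {
  # --public-ip-address is the flag reported in this thread; it is assumed
  # to be passed through by hailctl to `gcloud dataproc clusters create`.
  echo "hailctl dataproc start $1 --packages gnomad --public-ip-address"
}

start_cluster_cmd "$CLUSTER_NAME"
```

Running the printed command in a shell where hailctl is installed and authenticated should then start the cluster with a public IP, so the initialization script can reach github.com.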

In short, is this step missing from the tutorial?

Best regards, Wenjie

mike-w-wilson commented 2 months ago

Hi @wjwei-handsome!

We appreciate you pointing this out! With our small team and the versioning differences between Hail and Dataproc, keeping the tutorial up to date and relevant for every version is challenging. For example, I expect Hail to incorporate --public-ip-address into its hailctl wrapper of gcloud dataproc clusters create in its next version. Still, clarity is important, and we hope this note provides enough of it to help users find the right resources.

Thank you!