broadinstitute / gnomad-browser

Explore gnomAD datasets on the web
https://gnomad.broadinstitute.org
MIT License

Missing method when running `data-pipeline/caids/get_caids.py` #1620

Open ignatiusm opened 2 months ago

ignatiusm commented 2 months ago

I'm trying to run data-pipeline/caids/get_caids.py with a different dataset, but am encountering an issue.

When running lines 139 to 147 of the script (all_part_urls and completed_part_urls), Hail errors out with the message:

    await f.url() async for f in await fs.listfiles(sharded_vcf_url) if f.name().startswith("part-")
                                                                        ^^^^^^
AttributeError: 'GoogleStorageFileListEntry' object has no attribute 'name'. Did you mean: '_name'?

Looking at the hail source code (line 483 of `hail/python/hailtop/aiocloud/aiogoogle/client/storage_client.py`), I can see that the `GoogleStorageFileListEntry` class indeed does not have a `name` method.
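One version-independent way to sidestep the missing method is to derive the file name from the entry's URL, since `url()` is the one call the failing line already relies on. The following is a minimal sketch under that assumption, not necessarily the fix that was eventually submitted. `basename_from_url`, `part_urls`, `FakeEntry` and `fake_listing` are hypothetical names introduced here for illustration; `FakeEntry` only mimics the `url()` coroutine so the example runs without Hail or GCP access.

```python
import asyncio
from typing import AsyncIterator, List


def basename_from_url(url: str) -> str:
    """Return the last path component of a URL, ignoring a trailing slash."""
    return url.rstrip("/").rsplit("/", 1)[-1]


async def part_urls(entries: AsyncIterator) -> List[str]:
    """Collect the URLs of entries whose last path component starts with 'part-'."""
    urls = []
    async for entry in entries:
        # url() is the coroutine the original script already awaits.
        url = await entry.url()
        if basename_from_url(url).startswith("part-"):
            urls.append(url)
    return urls


class FakeEntry:
    """Hypothetical stand-in for a file list entry; only url() is modelled."""

    def __init__(self, url: str) -> None:
        self._url = url

    async def url(self) -> str:
        return self._url


async def fake_listing():
    """Yield fake entries resembling a sharded VCF directory (made-up paths)."""
    for url in (
        "gs://example-bucket/sharded.vcf.bgz/_SUCCESS",
        "gs://example-bucket/sharded.vcf.bgz/part-00000.bgz",
        "gs://example-bucket/sharded.vcf.bgz/part-00001.bgz",
    ):
        yield FakeEntry(url)


def demo() -> List[str]:
    """Run the filter over the fake listing and return the matching URLs."""
    return asyncio.run(part_urls(fake_listing()))
```

In the script itself, the same helper would presumably be called as `await part_urls(await fs.listfiles(sharded_vcf_url))`, mirroring the shape of the failing line. That call is untested here.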

It seems like you were able to run these scripts to create the gnomad_v4 version of the CAIDS data set earlier this year. I note this PR, which updated the get_caids.py script and mentions that there have been "a number of Hail utils that have either been changed, removed or replaced since its last update". I'd be grateful for any suggestions you have for addressing this 😃

I'm using GCP infrastructure with python v3.11 and hail v0.2.132.

ignatiusm commented 2 months ago

Ah! Found the fix 🥳 I'll make a PR now 😄

rileyhgrant commented 2 months ago

Heya @ignatiusm, hope things are well!

Thanks for filing this issue, and for finding a fix for it yourself! We currently use an earlier version of hail (0.2.127), which we pin in our pipeline dependencies. Pinning it needed a slight workaround, because our convenience deployment tool wraps hailctl, which ships with hail.

It appears that this particular issue was introduced at some point between hail v0.2.127 and hail v0.2.132. As such, we will accept and merge your contribution sometime down the line, when we bump the hail version in our pipeline.
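For anyone running the pipeline against a different environment, a small guard can make a version mismatch visible before a job fails partway through. This is only a sketch: the pinned value is the version quoted in this thread, and `installed_version` and `pin_mismatch` are hypothetical helpers, not part of the gnomad-browser pipeline.

```python
from importlib import metadata
from typing import Optional

# Version quoted in this thread as the one the pipeline currently uses.
PINNED_HAIL_VERSION = "0.2.127"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatch(installed: Optional[str],
                 pinned: str = PINNED_HAIL_VERSION) -> Optional[str]:
    """Return a warning message if the installed version differs from the pin."""
    if installed is None:
        return f"hail is not installed; the pipeline expects {pinned}"
    if installed != pinned:
        return f"hail {installed} is installed, but the pipeline expects {pinned}"
    return None
```

Calling `pin_mismatch(installed_version("hail"))` at the top of a script returns `None` when the versions match and a short message otherwise, which the caller can log or raise.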

Let me know if that seems reasonable, and thanks again! :)

ignatiusm commented 2 months ago

Hi @rileyhgrant 👋 - that sounds super sensible. No worries from me :)