Read data not loading - Githubissues

broadinstitute / seqr

web-based analysis tool for rare disease genomics

GNU Affero General Public License v3.0


Read data not loading #4320

Open Caustint opened 3 weeks ago

Caustint commented 3 weeks ago

Describe the bug: When clicking "show reads", the IGV header appears but the variant coordinates are not automatically populated in the search field. Even if I manually enter coordinates, read data does not load.

Link to page(s) where bug is occurring: Example variants:
https://seqr.broadinstitute.org/project/R0384_rare_genomes_project_gen/saved_variants/variant/SV0062885_x77618912_f027715_rg
https://seqr.broadinstitute.org/project/R0449_pathways_mgh/saved_variants/variant/SV0003680_2191859931_r0449_pat

Scope of the bug: So far I've checked multiple cases in both RGP projects and a Pathways project, and all seem to have the same issue.


monkollek commented 2 weeks ago

I had the same issue. The problem is that the S3 URL now rejects requests from clients that are not browsers, probably because it was getting hammered recently.

For example:

wget -d https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz

does not work, but the same URL loads if you open it in your browser.

So if you add a browser user agent:

wget -d --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"

It works!

To work around it in seqr, go to this line in the code: https://github.com/broadinstitute/seqr/blob/master/seqr/views/apis/igv_api.py#L271

and put in:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}

And you are good to go. This is just a workaround, but S3 is now rejecting anything it doesn't consider a browser, probably for security reasons. It is a very recent and annoying restriction that I only just noticed.
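A minimal sketch of the workaround, using only the Python standard library so it runs without seqr installed. The helper name `build_browser_request` is hypothetical and not part of seqr; the real proxy code in `igv_api.py` may build its request differently. The sketch only constructs the request and does not send it.

```python
import urllib.request

# User-Agent string copied from the comment above.
BROWSER_USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'
)


def build_browser_request(url):
    """Hypothetical helper: build a GET request carrying a browser-like User-Agent."""
    headers = {'User-Agent': BROWSER_USER_AGENT}
    return urllib.request.Request(url, headers=headers)


request = build_browser_request(
    'https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz'
)
# urllib stores header names capitalized, so the key is 'User-agent'.
print(request.get_header('User-agent'))
```

Passing `request` to `urllib.request.urlopen` would actually send it; whether the bucket accepts it depends on its current access policy.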

hanars commented 2 weeks ago

Yeah I fixed it similarly by just proxying through all the headers, but good to know! https://github.com/broadinstitute/seqr/pull/4325
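Proxying the browser's headers through could look roughly like the sketch below. It is an illustration only, not the code from the linked pull request; `forwardable_headers` and the excluded set are hypothetical. `Host` is dropped on the assumption that it names the proxy's own domain rather than the upstream server.

```python
# Headers that describe the client-to-proxy connection and are not forwarded.
# Which headers to exclude is an assumption of this sketch.
EXCLUDED_HEADERS = {'host', 'connection', 'content-length'}


def forwardable_headers(incoming_headers):
    """Hypothetical helper: copy incoming headers, skipping the excluded ones."""
    return {
        name: value
        for name, value in incoming_headers.items()
        if name.lower() not in EXCLUDED_HEADERS
    }


incoming = {
    'Host': 'seqr.broadinstitute.org',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0',
    'Range': 'bytes=0-1023',
}
outgoing = forwardable_headers(incoming)
print(sorted(outgoing))
```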

hanars commented 1 week ago

Attempted fix did not work, will continue to investigate

jklugherz commented 1 week ago

A summary of what I've seen so far during my investigation:

Earlier today, I was able to hit https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz successfully from my locally running seqr. Now, I am seeing AccessDenied. This is confusing.

In dev, the HTTP response is

<Error>
    <Code>NoSuchBucket</Code>
    <Message>The specified bucket does not exist</Message>
    <BucketName>seqr-dev.broadinstitute.org</BucketName>
    <RequestId>7Q33VHJAD2NS7MGZ</RequestId>
    <HostId>bi45rkHyNXjBLyvPBEtlqXdENgimjPpkYJKE1ZAcv/REa3TjgNU8dnX8HAjXf1ySAxJsz6QKfQY=</HostId>
</Error>

and in prod the same error but BucketName is seqr.broadinstitute.org.

This is also quite perplexing because the bucket in the file path is igv.org.genomes, not seqr.broadinstitute.org.
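To make errors like this easier to log and compare between environments, the response body can be parsed. The sketch below uses the standard library; `parse_s3_error` is a hypothetical helper, and the body is a shortened copy of the dev response above. As background, S3 supports virtual-hosted-style requests in which the bucket name is taken from the `Host` header, so one possibility, not confirmed anywhere in this thread, is that a forwarded `Host` header is being read as the bucket name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the dev error response (RequestId and HostId omitted).
ERROR_BODY = """<Error>
    <Code>NoSuchBucket</Code>
    <Message>The specified bucket does not exist</Message>
    <BucketName>seqr-dev.broadinstitute.org</BucketName>
</Error>"""


def parse_s3_error(body):
    """Hypothetical helper: map each child tag of <Error> to its text."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or '').strip() for child in root}


error = parse_s3_error(ERROR_BODY)
print(error['Code'], error['BucketName'])
```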

jklugherz commented 1 week ago

It looks like the IGV team is investigating the AWS bucket access issues: https://github.com/igvteam/igv/issues/1556#issuecomment-2327372413

jrobinso commented 1 week ago

I've moved our dependency on this particular bucket, which is now blocked by our institution, for the hg38, hg19, and T2T genomes. I'm working through the others. I don't know how your app is configured, but if you are pulling genomes from https://igv.org/genomes/genomes.json you will get the latest.

Our hosted bucket data is only there for use by IGV (and embedded igv.js). We pay for every byte downloaded. Normal IGV use cases should not present a problem but for sure we can't support uncontrolled scripted downloads. If we can't figure out a way to protect from that we might have to cease providing this data, and ask IGV users to host their own data.

jrobinso commented 1 week ago

BTW, the code comment quoted below is not true. All our hosted resources set CORS headers. Otherwise our own app (igv.org/app) would not work.

    # IGV does not properly set CORS header and cannot directly access the genomes resource from the browser without
    # using this server-side proxy