Open Caustint opened 3 weeks ago
I had the same issue. The problem is that the S3 URL now rejects requests from clients that are not browsers, probably because it was getting hammered recently.
For example:
wget -d https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz
does not work, but the same URL works if you open it in your browser.
So if you set a browser-like user agent:
wget -d --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz
It works!
So if you go to this line in the code: https://github.com/broadinstitute/seqr/blob/master/seqr/views/apis/igv_api.py#L271
Put in:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
And you are good to go. This is just a workaround, but the bucket is rejecting anything it doesn't consider a browser, probably for security reasons. It's a very recent and annoying restriction that I noticed.
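For anyone who wants to check the user-agent behaviour from Python rather than wget, here is a minimal sketch using only the standard library. The helper names `build_browser_request` and `fetch` are hypothetical and this is not the actual seqr code; it only shows one way a request carrying the header above could be built. Whether S3 accepts it depends on the bucket's policy at the time.

```python
import urllib.request

# User-Agent string copied from the comment above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"
)

CYTOBAND_URL = (
    "https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz"
)


def build_browser_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends a browser-like User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=30):
    """Download url with the browser-like User-Agent and return the raw bytes."""
    with urllib.request.urlopen(build_browser_request(url), timeout=timeout) as resp:
        return resp.read()
```

Building the request sends nothing; only `fetch(CYTOBAND_URL)` touches the network.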
Yeah I fixed it similarly by just proxying through all the headers, but good to know! https://github.com/broadinstitute/seqr/pull/4325
The attempted fix did not work; I will continue to investigate.
A summary of what I've seen so far during my investigation:
Earlier today, I was able to hit https://s3.amazonaws.com/igv.org.genomes/hg38/annotations/cytoBandIdeo.txt.gz successfully from my locally running seqr. Now, I am seeing AccessDenied. This is confusing.
In dev, the HTTP response is
<Error>
<Code>NoSuchBucket</Code>
<Message>The specified bucket does not exist</Message>
<BucketName>seqr-dev.broadinstitute.org</BucketName>
<RequestId>7Q33VHJAD2NS7MGZ</RequestId>
<HostId>bi45rkHyNXjBLyvPBEtlqXdENgimjPpkYJKE1ZAcv/REa3TjgNU8dnX8HAjXf1ySAxJsz6QKfQY=</HostId>
</Error>
and in prod the same error, but BucketName is seqr.broadinstitute.org.
This is also quite perplexing because the bucket in the file path is igv.org.genomes, not seqr.broadinstitute.org.
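One possible explanation (an assumption, not confirmed anywhere in this thread): S3 can use the Host header of a request to decide which bucket is being addressed, so a proxy that forwards every incoming header, including the browser's Host value (seqr-dev.broadinstitute.org or seqr.broadinstitute.org), could produce this kind of NoSuchBucket response. A minimal sketch of forwarding headers while dropping the ones that describe the incoming connection; `forwardable_headers` and the exclusion list are hypothetical, not the seqr implementation:

```python
# Headers tied to the incoming connection rather than the upstream one.
# The exact set is an assumption chosen for illustration.
EXCLUDED_HEADERS = {"host", "connection", "content-length", "transfer-encoding"}


def forwardable_headers(incoming_headers):
    """Copy incoming request headers, dropping Host and other connection-specific ones."""
    return {
        name: value
        for name, value in incoming_headers.items()
        if name.lower() not in EXCLUDED_HEADERS
    }


# Example input resembling what a browser might send to the seqr proxy endpoint.
incoming = {
    "Host": "seqr-dev.broadinstitute.org",
    "User-Agent": "Mozilla/5.0",
    "Range": "bytes=0-1023",
}
upstream_headers = forwardable_headers(incoming)
```

With Host removed, the HTTP client fills it in from the target URL (s3.amazonaws.com), so the bucket would be taken from the path.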
It looks like the IGV team is investigating the AWS bucket access issues: https://github.com/igvteam/igv/issues/1556#issuecomment-2327372413
I've moved our dependency on this particular bucket, which is now blocked by our institution, for the hg38, hg19, and T2T genomes. I'm working through others. I don't know how your app is configured, but if you are pulling genomes from https://igv.org/genomes/genomes.json you will get the latest.
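A rough sketch of reading that genome list from Python. It assumes, without confirmation from this thread, that genomes.json is a JSON array of objects that each carry an `id` field; the helper names `load_genomes` and `find_genome` are hypothetical.

```python
import json
import urllib.request

GENOMES_URL = "https://igv.org/genomes/genomes.json"


def load_genomes(url=GENOMES_URL, timeout=30):
    """Fetch and parse the genome list (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_genome(genomes, genome_id):
    """Return the first entry whose "id" equals genome_id, or None if absent."""
    for entry in genomes:
        if entry.get("id") == genome_id:
            return entry
    return None
```

For example, `find_genome(load_genomes(), "hg38")` would return the hg38 entry if the file has the assumed shape, and None otherwise.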
Our hosted bucket data is only there for use by IGV (and embedded igv.js). We pay for every byte downloaded. Normal IGV use cases should not present a problem, but we certainly can't support uncontrolled scripted downloads. If we can't figure out a way to protect against that, we might have to cease providing this data and ask IGV users to host their own data.
BTW, this is not true. All our hosted resources set CORS headers. Otherwise our own app (igv.org/app) would not work.
# IGV does not properly set CORS header and cannot directly access the genomes resource from the browser without
# using this server-side proxy
Describe the bug
When clicking "show reads", the IGV header appears but the variant coordinates are not automatically populated in the search field. Even if I manually enter coordinates, read data does not load.
Link to page(s) where bug is occurring
Example variants:
https://seqr.broadinstitute.org/project/R0384_rare_genomes_project_gen/saved_variants/variant/SV0062885_x77618912_f027715_rg
https://seqr.broadinstitute.org/project/R0449_pathways_mgh/saved_variants/variant/SV0003680_2191859931_r0449_pat
Scope of the bug
So far I've checked multiple cases in both RGP projects and a Pathways project; all seem to have the same issue.