Open rbuels opened 5 years ago
Second part of this is: make error handlers in JBrowse HTTP request code emit a warning and retry the request through the proxy if there seem to be CORS problems.
We would like to complain about this to the user, so that people can fix their server CORS configurations, but people can still get to the data in the meantime.
It might even be possible for the central proxy to send automated (but not obviously automated) emails to server administrators bugging them to fix CORS.
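A minimal sketch of the warn-and-retry idea, not actual JBrowse code. It assumes the proxy base URL from the API draft in this issue; the `fetchImpl` and `warn` options are hypothetical and only exist so the sketch can be run without a network. Browsers report a CORS rejection as a generic `TypeError`, so "seems to be a CORS problem" is only a guess here.

```javascript
// Assumption: proxy base taken from the API draft in this issue.
const PROXY_BASE = 'https://jbrow.se/proxy/v1/'

// fetchImpl and warn are injectable (hypothetical options) for testing.
async function fetchWithProxyFallback(
  url,
  init = {},
  { fetchImpl = globalThis.fetch, warn = console.warn } = {},
) {
  try {
    return await fetchImpl(url, init)
  } catch (err) {
    // fetch() rejects with a TypeError on CORS failures, but also on other
    // network errors, so this cannot tell the two apart.
    if (!(err instanceof TypeError)) throw err
    warn(`Request to ${url} failed, possibly due to CORS; retrying through ${PROXY_BASE}`)
    return fetchImpl(PROXY_BASE + url, init)
  }
}
```

A caller would use it in place of `fetch`, e.g. `fetchWithProxyFallback(url, { headers: { range: 'bytes=0-10' } })`, and surface the warning to the user so the server's CORS configuration can be fixed.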
Let's initially try to do the hacked-up FTP range request scheme described in that link, and see how it goes.
We can have a look at what UCSC does for FTP: https://github.com/ucscGenomeBrowser/kent/blob/master/src/htslib/knetfile.c
I wanted to test a file without transferring it to a CORS haven, so I tried both the open cors-anywhere proxy and a self-hosted cors-anywhere instance. In some cases it works and in some cases it fails, and the reasons remain unknown. Before engineering a large system around this, I think it would be helpful to understand where these failure cases arise.
<html>
<script>
(async () => {
  const res = await fetch('https://cors-anywhere.herokuapp.com/http://jbrowse.org/code/JBrowse-1.16.6/docs/tutorial/data_files/volvox-sorted.bam', {headers: {range: 'bytes=0-10'}})
  const t = await res.arrayBuffer()
  console.log('volvox-arraybuffer', t)
  const res2 = await fetch('https://cors-anywhere.herokuapp.com/https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_CCS_15kb/alignment/HG002.Sequel.15kb.pbmm2.hs37d5.whatshap.haplotag.RTG.10x.trio.bam', {headers: {range: 'bytes=0-10'}})
  console.log('pacbio-ncbi-corsanywhere request', res2)
  const t2 = await res2.arrayBuffer()
  console.log('pacbio-ncbi-corsanywhere arraybuffer', t2)
})()
</script><body>Hello</body></html>
Resulting output
volvox-arraybuffer ArrayBuffer(11) {} <-- works for the volvox file
HG002.Sequel.15kb.pbmm2.hs37d5.whatshap.haplotag.RTG.10x.trio.bam:1 GET https://cors-anywhere.herokuapp.com/https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_CCS_15kb/alignment/HG002.Sequel.15kb.pbmm2.hs37d5.whatshap.haplotag.RTG.10x.trio.bam net::ERR_CONTENT_DECODING_FAILED
test.html:16 Uncaught (in promise) TypeError: Failed to fetch
async function (async)
(anonymous) @ test.html:4
(anonymous) @ test.html:16
I bet the CONTENT_DECODING_FAILED thing is caused by something in cors-anywhere incorrectly detecting the bam file as gzipped and trying to decompress it somewhere along the way.
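To illustrate why such a misdetection is plausible: BAM files are BGZF-compressed, and every BGZF block starts with the ordinary gzip magic bytes, so anything that sniffs content for gzip would flag a BAM file. The sketch below only shows that byte-level overlap; whether cors-anywhere or the upstream server actually does this kind of sniffing is unconfirmed, and `startsWithGzipMagic` is a hypothetical helper.

```javascript
// Returns true if the bytes begin with the gzip magic number 0x1f 0x8b.
function startsWithGzipMagic(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b
}

// First four bytes of a BGZF block header per the SAM/BAM specification:
// gzip magic, CM=8 (deflate), FLG=4 (extra field present).
const bgzfHeaderStart = new Uint8Array([0x1f, 0x8b, 0x08, 0x04])
console.log(startsWithGzipMagic(bgzfHeaderStart)) // true

// A plain-text file such as GFF3 does not match.
const gffStart = new TextEncoder().encode('##gff-version 3')
console.log(startsWithGzipMagic(gffStart)) // false
```

If a response to a `bytes=0-10` range request were labelled `Content-Encoding: gzip`, the browser would try to decode an 11-byte fragment of a gzip stream, which would be consistent with the error seen above.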
These CORS issues will be worse when JBrowse is hosted over https, because an https page cannot access http resources, which many track hubs use.
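A simplified sketch of that https-to-http restriction (mixed content). Real browsers apply more rules than this, for example treating localhost as trustworthy; `isMixedContent` and the page URL `https://jbrowse.example/` are hypothetical.

```javascript
// Simplified check: true when an https page requests an http resource.
// This ignores browser special cases such as localhost.
function isMixedContent(pageUrl, resourceUrl) {
  const page = new URL(pageUrl)
  const resource = new URL(resourceUrl)
  return page.protocol === 'https:' && resource.protocol === 'http:'
}

const page = 'https://jbrowse.example/'
console.log(isMixedContent(page, 'http://jbrowse.org/code/JBrowse-1.16.6/docs/tutorial/data_files/volvox-sorted.bam')) // true
console.log(isMixedContent(page, 'https://s3.amazonaws.com/nanopore-human-wgs/chm13/assemblies/alignments/chm13.draft_v1.0.hifi.bam')) // false
```

An https proxy sidesteps this as well as CORS, since the browser only ever talks to the proxy's https URL.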
Random tidbit: for some files cors-anywhere works fine.
E.g. CORS is disabled on this S3 bucket, so this fails: https://s3.amazonaws.com/nanopore-human-wgs/chm13/assemblies/alignments/chm13.draft_v1.0.hifi.bam
Adding the cors-anywhere proxy makes it work: https://cors-anywhere.herokuapp.com/https://s3.amazonaws.com/nanopore-human-wgs/chm13/assemblies/alignments/chm13.draft_v1.0.hifi.bam
Of course, throwing a lot of big data through that is somewhat abusive to their free service, but it is open source and can be rehosted.
The CORS proxy appears to be failing for something where it worked before, namely the UCSC API.
This URL currently produces an application error.
I was hoping to test out cors-anywhere using the same host (https://cors-anywhere.herokuapp.com/); it sort of works, and I have more information. First, the developer running this server has set it up such that it only unlocks temporarily if you visit the site beforehand and push a button to request access (basically, to allow testing as a dev). I did that and then tried to fetch some NCList data:
https://cors-anywhere.herokuapp.com/http://jbrowse.informatics.jax.org/data/mouse/tracks/MGI_Genome_Features/{refseq}/trackData.json
But it fails. I tested fetching this with curl after adding the required headers and supplying an MGI chromosome name (Alliance uses "1" whereas MGI uses "chr1"):
curl -H 'X-Requested-With: XMLHttpRequest' -O https://cors-anywhere.herokuapp.com/http://jbrowse.informatics.jax.org/data/mouse/tracks/MGI_Genome_Features/chr1/trackData.json
which is successful. I double-checked that there is an alias file for mouse chromosomes, but it still fails. If I hard-code the "chr" into the URL, though, it works:
"uri": "https://cors-anywhere.herokuapp.com/http://jbrowse.informatics.jax.org/data/mouse/tracks/MGI_Genome_Features/chr{refseq}/trackData.json"
The reason for having to do this probably has to do with JBrowse retrying but getting rejected by the proxy. Also, since this is using the public proxy, it won't work generally but does work on my temporarily whitelisted computer.
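To make the difference between the two templates concrete, here is a small sketch of the `{refseq}` substitution. `fillRefSeq` is a hypothetical helper, not JBrowse's actual template code, and it deliberately does no alias lookup.

```javascript
// Hypothetical helper: substitutes every {refseq} placeholder in a URI template.
function fillRefSeq(uriTemplate, refName) {
  return uriTemplate.replace(/\{refseq\}/g, refName)
}

const base = 'https://cors-anywhere.herokuapp.com/http://jbrowse.informatics.jax.org/data/mouse/tracks/MGI_Genome_Features/'

// With the Alliance-style name "1", the plain template asks MGI for ".../1/...".
console.log(fillRefSeq(base + '{refseq}/trackData.json', '1'))

// Hard-coding the prefix yields the MGI-style ".../chr1/..." on the first attempt.
console.log(fillRefSeq(base + 'chr{refseq}/trackData.json', '1'))
```

With the prefix hard-coded, the first request already uses the name MGI expects, so no retry under an alias is needed.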
This, at least, seems like enough to try getting a server of my own going that uses https. Note that the MGI JBrowse instance uses http, so even if they enabled CORS, I wouldn't be able to use it because it would throw a security error (since we use https).
Let's make a central REST API running at https://jbrow.se/proxy that all JBrowses can use to sidestep CORS incompatibilities and to access data that is only available over FTP.
Implementation Notes
X-* request headers as the primary way of communicating side-band stuff like analytics.
(##gff-version 3 is a magic number, so is ##fileformat=VCF...) or other deeper data validation if file has no magic number (e.g. fai)
Architecture
Version 1 (the stupid thing/MVP)
Version 2 (the scalable thing)
API draft
GET /v1/{url}
Returns data exactly as if the browser had requested that file from the given URL. The URL does not need any additional escaping: https://jbrow.se/proxy/v1/http://terrible.org/some/file/someplace.bw is valid.
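A minimal sketch of how a server might recover the target URL from such a request path. `targetFromProxyPath` is a hypothetical helper, and the allow-list of http, https and ftp targets is an assumption, not part of the draft.

```javascript
const PREFIX = '/proxy/v1/'

// Hypothetical server-side helper: takes the raw request path (including any
// query string) and returns the unescaped target URL, or null if unusable.
function targetFromProxyPath(rawPath) {
  if (!rawPath.startsWith(PREFIX)) return null
  const target = rawPath.slice(PREFIX.length)
  let parsed
  try {
    parsed = new URL(target)
  } catch {
    return null // not an absolute URL
  }
  // Assumption: only http, https and ftp targets are proxied.
  return ['http:', 'https:', 'ftp:'].includes(parsed.protocol) ? target : null
}

console.log(targetFromProxyPath('/proxy/v1/http://terrible.org/some/file/someplace.bw'))
// http://terrible.org/some/file/someplace.bw
```

Because the target is not escaped, the server has to work from the raw path and query string rather than a parsed or normalized path.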