GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

error loading bigwig (and other?) files from remote source #1356

Open trstickland opened 5 years ago

trstickland commented 5 years ago

I have found the following error when trying to load bigwig files. I have no reason to think it's specific to BigWig files, but I haven't yet had the time to try others.

Here is an example file: https://fungidb.org/a/service/jbrowse/store?data=AnigerCBS513-88/bigwig/anigCBS513-88_Archer_LignocelluloseResponses_rnaSeq_RSRC/1_Straw_24h/non_unique_results.secondstrandCombinedReps_unlogged.bw

I can download that (with curl or whatever) and then load it into jbrowse from my desktop machine, and there's no problem. The problem occurs when it's loaded from a remote web server, either entering the above URL into the "open track file" dialog, or putting the URL into the trackList.json file (the actual usage IRL).

I see that jbrowse appears to load data in chunks of 256k. The fungidb.org server is configured for CORS and supports range, so the initial request works fine, where I see jbrowse sends the header

range: bytes=0-262143

and it receives a response including these headers, which I believe are correct:

Content-Length: 262144
Content-Range: bytes 0-262143/351451

Then jbrowse sends another request with...

range: bytes=262144-524287

Now, the file is 351451B so the range in that request is too high, and the server responds with a 400 (bad request). So there's 90KB of data missing, and an error in the console.

There's also a more serious version of this issue where the file size is under 256KB, and in that case the initial request (using a 'range: bytes=0-262143' header) gets an immediate 400. There are no data to display and the jbrowse panel contains a big red box with the HTTP error message in place of the track content. If you want to see that, the URL is https://fungidb.org/a/service/jbrowse/store?data=AnigerCBS513-88/bigwig/anigCBS513-88_Archer_LignocelluloseResponses_rnaSeq_RSRC/1_Straw_24h/non_unique_results.firststrandCombinedReps_unlogged.bw

Thanks.

software versions: jbrowse 1.16.2-release running within apollo 2.3.1 (tomcat 8.5.40, groovy 2.5.1, grails 2.5.5)
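
As a rough illustration of the arithmetic in this report (not JBrowse's actual code), the Python sketch below parses a `Content-Range` value and works out how many bytes remain and which range would fetch exactly the rest of the file. The function names are hypothetical, and it assumes the header has the `bytes first-last/total` form shown above.

```python
import re


def parse_content_range(value):
    """Parse 'bytes first-last/total' into a (first, last, total) tuple of ints."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        # Forms such as 'bytes */N' or an unknown total ('*') are not handled here.
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


def remaining_bytes(content_range):
    """Number of bytes after the chunk described by content_range."""
    _first, last, total = parse_content_range(content_range)
    return total - (last + 1)


def tail_range(content_range):
    """Range header value covering the rest of the file, or None if nothing is left."""
    _first, last, total = parse_content_range(content_range)
    if last + 1 >= total:
        return None
    return f"bytes={last + 1}-{total - 1}"


print(remaining_bytes("bytes 0-262143/351451"))
print(tail_range("bytes 0-262143/351451"))
```

For the first file, this gives 89307 remaining bytes (the roughly 90KB mentioned above), and a follow-up range that stops at the last byte of the file instead of running on to 524287.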

cmdcolin commented 5 years ago

both of these files appear to work when served from my local instance of jbrowse with an nginx server. are there any variables in your setup that might be causing issues aside from this?

[Screenshot: localhost jbrowse, data=an, loc=An01_A_niger_CBS_513_88:2901108..3625813, tracks=fungidb,fungidb2]

cmdcolin commented 5 years ago

maybe for CORS you need to have access to Content-Range

'Access-Control-Allow-Headers' 'Range';
'Access-Control-Expose-Headers' 'Content-Length,Content-Range';

trstickland commented 5 years ago

Range headers are permitted, and the server accepts requests that use them and returns the expected response; in general it all works fine, it's only when jbrowse sends a range header with a byte range that goes beyond the file size that there's a problem.

For the first file I mentioned, does your instance of jbrowse send two requests? The first one with a range header

range: bytes=0-262143

followed by a second request with

range: bytes=262144-524287

?

If that's happening, then we have the same client-side behaviour. It's possible your nginx server is more forgiving of range headers that ask for a range beyond the size of the file..?

If you're not seeing those range headers in the request, we must have different jbrowse behaviour...

cmdcolin commented 5 years ago

for the first:
range: bytes=262144-351450 - response Content-Range: bytes 262144-351450/351451
range: bytes=0-262143 - response Content-Range: bytes 0-262143/351451

for the second:
range: bytes=0-262143 - response Content-Range: bytes 0-212246/212247

trstickland commented 5 years ago

OK, if the second one loads OK then it looks as if your server behaves differently to fungidb.org (as the latter responds with a 400 when asked for byte range 0-262143).

The first file is a bit odd. The response Content-Range: bytes 0-262143/351451 is the same as I see from fungidb.org, but the header in the next request is range: bytes=262144-351450, which indicates jbrowse has detected the size of the file and constructed the appropriate range header. My jbrowse doesn't do that :(

So it looks like we have differing behaviour on client and server sides. Oh joy :-/

I'm separately in contact with the guys at fungidb.org, I'll see what might be done server side...

cmdcolin commented 5 years ago

The client side behavior difference is actually a server side one, because it's probably due to the client not having access to the Content-Range response header. You probably want to expose the Content-Range header for CORS, e.g. `'Access-Control-Expose-Headers' 'Content-Range';` which is part of their server side config

trstickland commented 5 years ago

Nope, the server at fungidb is returning the content-length header, same as your nginx server...

trstickland commented 5 years ago

Content-range, I meant

cmdcolin commented 5 years ago

If you are remotely accessing the resource via CORS then they need to have this header "exposed" so that the javascript can read this value

Even if you can see in the network log that this header is returned, it doesn't mean that the javascript is able to read this in a CORS scenario.

Here is a thread that is sort of similar but applies to a amazon s3 configuration https://sourceforge.net/p/gmod/mailman/message/36519712/
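
To make the "exposed" point concrete, here is a simplified Python model of which response headers a script can read on a cross-origin response. It is a sketch, not browser code: `readable_headers` is a hypothetical name, the safelist is the set of CORS-safelisted response header names from the Fetch standard, and the `*` wildcard form of `Access-Control-Expose-Headers` is deliberately not handled.

```python
# CORS-safelisted response header names (always readable by scripts).
SAFELISTED = frozenset({
    "cache-control",
    "content-language",
    "content-length",
    "content-type",
    "expires",
    "last-modified",
    "pragma",
})


def readable_headers(response_headers):
    """Return the subset of response_headers a cross-origin script could read.

    A header is readable if it is safelisted or named in
    Access-Control-Expose-Headers. Simplification: '*' is not interpreted.
    """
    exposed_raw = ""
    for name, value in response_headers.items():
        if name.lower() == "access-control-expose-headers":
            exposed_raw = value
    exposed = {item.strip().lower() for item in exposed_raw.split(",") if item.strip()}
    return {
        name: value
        for name, value in response_headers.items()
        if name.lower() in SAFELISTED or name.lower() in exposed
    }


response = {
    "Content-Length": "262144",
    "Content-Range": "bytes 0-262143/351451",
}
print(sorted(readable_headers(response)))

response["Access-Control-Expose-Headers"] = "Content-Length,Content-Range"
print(sorted(readable_headers(response)))
```

In the first call only `Content-Length` is visible to the script, even though `Content-Range` was sent and shows up in the network log; in the second call `Content-Range` becomes readable as well.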

trstickland commented 5 years ago

Ah, thanks for the clarification! I will check this out on Monday, it could be where we're going wrong.

trstickland commented 5 years ago

I've checked this, and an appropriate Access-Control-Expose-Headers header fixes the problem in the first case I mentioned -- where the track does load, but the second request gets an HTTP error response because the range header asked for an incorrect range.

As I suspected though, it doesn't fix the second case. That was for this file: https://fungidb.org/a/service/jbrowse/store?data=AnigerCBS513-88/bigwig/anigCBS513-88_Archer_LignocelluloseResponses_rnaSeq_RSRC/1_Straw_24h/non_unique_results.firststrandCombinedReps_unlogged.bw

In this case the whole file is under 256KB, so the very first request from jbrowse produces an HTTP error. As it's the first request from jbrowse, obviously it hasn't seen any responses yet, so it can't possibly have read a Content-range header prior to sending the request. :(

The interesting question here, is why your nginx doesn't respond with a 400, given that the range: bytes=0-262143 header in the request is invalid..?

cmdcolin commented 5 years ago

We might be able to fix this error for small bigwigs. We have something called an aggregating http range fetcher that tries to combine requests into 256 kb chunks but seems to not respect small files. I'll leave this issue open to see if we can fix that
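
A minimal sketch of the idea, assuming fixed 256 KB chunk alignment; this is not the actual aggregating fetcher, and `chunk_ranges` and its parameters are hypothetical. When the file size is known, the last chunk is clamped so that no request runs past the end of the file.

```python
CHUNK_SIZE = 256 * 1024  # 262144 bytes


def chunk_ranges(start, end, file_size=None):
    """Return chunk-aligned (first, last) inclusive byte ranges covering [start, end].

    If file_size is given, ranges are clamped to the last byte of the file
    and chunks that start beyond the end of the file are dropped.
    """
    ranges = []
    for index in range(start // CHUNK_SIZE, end // CHUNK_SIZE + 1):
        first = index * CHUNK_SIZE
        last = first + CHUNK_SIZE - 1
        if file_size is not None:
            if first >= file_size:
                break
            last = min(last, file_size - 1)
        ranges.append((first, last))
    return ranges


# Size unknown: a small read still turns into a full 256 KB request.
print(chunk_ranges(100, 200))
# Size known (the 212247-byte file): the request stops at the last byte.
print(chunk_ranges(100, 200, file_size=212247))
```

Without a known size the first request is always `0-262143`, which is the request the strict server rejects for files under 256 KB.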

cmdcolin commented 5 years ago

Note that the aggregating http range fetcher is sort of new so this bug could be a regression

rbuels commented 5 years ago

What web server software is in use at fungidb? If it is treating byte range requests beyond the end of the file as an error, it’s acting very differently from other web servers.

Are you guys sure that’s what it’s doing?

cmdcolin commented 5 years ago

Yes I might suggest checking with the web server implementation...it looks like it's a custom written thing from the url pattern

trstickland commented 5 years ago

Well I can tell you that the value in the 'range' header is beyond the end of file, and that the HTTP response is a 400 (bad request). That's not absolute proof, but it's what appears to be happening.

Oh, and a request without the 'range' header works fine.

cmdcolin commented 5 years ago

I understand that, but we have seen that most common servers including nginx, apache, and npm's http-server and others respond fine in this case.
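
For reference, the behaviour those servers show matches the HTTP range rules (RFC 7233, now part of RFC 9110): a last byte position at or beyond the end of the file is read as "up to the end", and only a first byte position beyond the end is unsatisfiable. A small Python sketch of that rule, with a hypothetical function name and assuming a single `first-last` range with `first <= last`:

```python
def resolve_byte_range(first, last, size):
    """Return (status, content_range) for a 'bytes=first-last' request.

    Assumes one range with first <= last; suffix and multi-range forms
    are not modelled.
    """
    if first >= size:
        # Nothing to serve: 416 Range Not Satisfiable.
        return 416, f"bytes */{size}"
    # A last position past the end of the data is clamped to the final byte.
    last = min(last, size - 1)
    return 206, f"bytes {first}-{last}/{size}"


print(resolve_byte_range(0, 262143, 212247))
print(resolve_byte_range(262144, 524287, 351451))
```

Both calls yield a 206 with the clamped `Content-Range` values, which agrees with the nginx responses quoted earlier in the thread for the smaller file (`bytes 0-212246/212247`).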

cmdcolin commented 5 years ago

You also would not have needed the "expose header content-range" cors stuff if the server just responded without error to the beyond-the-file-length request.

cmdcolin commented 5 years ago

Simple test:

npm install -g http-server
http-server --cors #in the volvox data

then set up a reference to http://localhost:8080/volvox_sine.bw from another browser (via cors or not) and it works, and that file is less than 256kb

trstickland commented 5 years ago

Sorry my response was to Robert, I am on my phone so I am losing the order in the thread.

Yes, I understand what you are saying: if the web server was not (apparently) being strict about the range requested, exposing the additional headers wouldn't be required.

It's worth fixing the headers, though, whilst working on this issue; and it solves the problem for all but the smallest files. And it would seem churlish to ignore it, after you were good enough to find the problem :)

trstickland commented 5 years ago

BTW it's an nginx server, at least the front end - not sure what's on the back end.

trstickland commented 5 years ago

Some reconfiguration at the server end did fix this, thanks.

I won't close this just yet as you mentioned a possible bug with the aggregating http range fetcher (?). But my issue is fixed, so do close it if you'd like to.

Thanks for the help & education about CORS: much appreciated!