dasmoth / dalliance

Interactive web-based genome browser.
http://www.biodalliance.org/
BSD 2-Clause "Simplified" License
226 stars, 68 forks

Bug report: Inconsistencies in Data Fetching/Rendering with Hg18ToHg19 Chainset #81

Closed: ymen closed this issue 10 years ago

ymen commented 10 years ago

Hi Thomas,

Thanks for the tip and update on using Chains for Coordinate mapping.

We were trying to get a working example of the coordinate mapping system, and observed some inconsistencies in data fetching, where lifted-over features "disappear" from the view after scrolling/zooming.

Triggering the bug

chr17   38530000    38530050    1
chr17   38530500    38530550    2
chr17   38529500    38529550    3

[Screenshot: 2014-07-22, 5:23:44 pm]

[Screenshot: 2014-07-22, 5:28:06 pm]

Dalliance does not report any obvious exceptions.

Other Observations

I also tried the mapping file I used previously (https://groups.google.com/forum/#!topic/biodalliance-dev/6yBxQBUO2TA), which lifts the entire chr21 onto chr22. This seems to work reliably (e.g. when I apply this mapping file to the GENCODE 19 track), and it did not show the issue with disappearing features.

Our hunch is that the problem lies in the range that gets fetched after coordinate mapping, but we haven't really looked into the chainset code, so we're not sure. I'm also not sure whether it might be an error on the part of the DAS server that the hg18ToHg19 chain uses (it seems DAS chains are deprecated?). I'd be happy to help investigate further if you need any help or have any pointers on how to approach the problem!

cc @mdrasmus

dasmoth commented 10 years ago

Thanks very much for the detailed report; it was very helpful in tracking this down.

It turns out that a key part of the problem is that the data was coming from a (textual) BED file, rather than any of the other back-ends. The "memstore" backend used for textual BEDs, VCFs, etc., is unusual in that it can return the same feature object multiple times, rather than creating a new one each time the same region is re-fetched. The mapping code (which pre-dates the memstore backend) was relying on feature objects not being re-used, and was therefore overwriting their min and max fields.
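To make the failure mode concrete, here is a minimal sketch of how in-place mapping of shared objects could make a feature "disappear" on a second fetch. This is not Dalliance's actual code: `storedFeatures`, `fetchFromStore`, `mapInPlace` and the constant `OFFSET` (standing in for a real chain mapping) are all hypothetical names invented for illustration.

```javascript
// Hypothetical stand-in for a memstore-style backend: the file is parsed once
// and every fetch hands back the SAME feature objects.
const storedFeatures = [
  { segment: "chr17", min: 38530000, max: 38530050, label: "1" },
];

// Return the stored features that overlap [min, max] (no copies are made).
function fetchFromStore(min, max) {
  return storedFeatures.filter((f) => f.min <= max && f.max >= min);
}

// Toy "liftover": a constant offset stands in for a real chain mapping.
const OFFSET = 1000;

// Buggy pattern: overwrite min/max on the object the store still holds.
function mapInPlace(feature) {
  feature.min += OFFSET;
  feature.max += OFFSET;
  return feature;
}

// First fetch: the feature is found and mapped.
const firstView = fetchFromStore(38529900, 38530100).map(mapInPlace);
const firstCount = firstView.length;

// Second fetch of the same region: the stored object now carries mapped
// coordinates, so it no longer overlaps the requested source range.
const secondView = fetchFromStore(38529900, 38530100).map(mapInPlace);
const secondCount = secondView.length;

console.log(firstCount, secondCount);
```

In this toy setup the first fetch finds the feature and the second does not, because the store's own copy has been shifted out of the queried range.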

Now fixed by copying feature objects before mapping them.
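A minimal sketch of the copy-before-mapping idea, under the same toy assumptions: `cachedFeatures`, `fetchCached`, `mapCopy` and `SHIFT` are hypothetical, and the thread does not say how Dalliance itself makes the copy. A shallow copy via `Object.assign` is used here purely as an illustration.

```javascript
// Hypothetical cache that returns the same objects on every fetch.
const cachedFeatures = [
  { segment: "chr17", min: 38530000, max: 38530050, label: "1" },
];

function fetchCached(min, max) {
  return cachedFeatures.filter((f) => f.min <= max && f.max >= min);
}

// Toy mapping: a constant shift stands in for a real chain lookup.
const SHIFT = 1000;

// Copy first, then write the mapped coordinates onto the copy only.
function mapCopy(feature) {
  const copy = Object.assign({}, feature); // shallow copy of own properties
  copy.min = feature.min + SHIFT;
  copy.max = feature.max + SHIFT;
  return copy;
}

// Repeated fetches of the same region now give the same result,
// and the cached object keeps its original coordinates.
const viewA = fetchCached(38529900, 38530100).map(mapCopy);
const viewB = fetchCached(38529900, 38530100).map(mapCopy);

console.log(viewA.length, viewB.length, cachedFeatures[0].min);
```

A shallow copy is enough in this sketch because only the top-level `min` and `max` fields are rewritten; nested objects, if any, would still be shared.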

ymen commented 10 years ago

Thanks for the prompt response Thomas! :D