GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0

Optimizations to support large tracklists #4499

Open cmdcolin opened 3 months ago

cmdcolin commented 3 months ago

This is a re-opening of the frozen tracks PR.

It will be a challenging PR to get fully merged, but interested users can try the branch out.

Two optimizations were added to frozen_tracks4 today: (a) removing the clone module, which was accidentally quadratic and introduced big slowdowns after around 64000 tracks, and (b) changing the generateHierarchy function for the track selector to make it faster.
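The comment does not say how the clone step ended up quadratic. As a rough illustration only, one common way a deep clone becomes accidentally quadratic is by tracking already-visited objects in an array and scanning it with `indexOf` on every object, which gets slower as more objects are cloned. The sketch below contrasts that pattern with a `Map`-based lookup. The names `cloneWithArrayLookup`, `cloneWithMapLookup`, and `makeTracks`, and the track shape, are hypothetical and not taken from the JBrowse code base.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Hypothetical pattern: visited objects live in an array, so every lookup is
// a linear scan, and cloning n objects costs on the order of n^2 comparisons.
function cloneWithArrayLookup(
  value: Json,
  seen: Json[] = [],
  copies: Json[] = [],
): Json {
  if (value === null || typeof value !== 'object') {
    return value
  }
  const idx = seen.indexOf(value) // scans everything cloned so far
  if (idx !== -1) {
    return copies[idx]
  }
  if (Array.isArray(value)) {
    const arr: Json[] = []
    seen.push(value)
    copies.push(arr)
    for (const item of value) {
      arr.push(cloneWithArrayLookup(item, seen, copies))
    }
    return arr
  }
  const obj: { [key: string]: Json } = {}
  seen.push(value)
  copies.push(obj)
  for (const [key, item] of Object.entries(value)) {
    obj[key] = cloneWithArrayLookup(item, seen, copies)
  }
  return obj
}

// Same traversal, but visited objects are looked up in a Map, so each lookup
// is constant time on average and the whole clone stays roughly linear.
function cloneWithMapLookup(
  value: Json,
  seen: Map<Json, Json> = new Map<Json, Json>(),
): Json {
  if (value === null || typeof value !== 'object') {
    return value
  }
  const hit = seen.get(value)
  if (hit !== undefined) {
    return hit
  }
  if (Array.isArray(value)) {
    const arr: Json[] = []
    seen.set(value, arr)
    for (const item of value) {
      arr.push(cloneWithMapLookup(item, seen))
    }
    return arr
  }
  const obj: { [key: string]: Json } = {}
  seen.set(value, obj)
  for (const [key, item] of Object.entries(value)) {
    obj[key] = cloneWithMapLookup(item, seen)
  }
  return obj
}

// Hypothetical track-like configs, only for exercising the two functions
function makeTracks(n: number): Json[] {
  const tracks: Json[] = []
  for (let i = 0; i < n; i++) {
    tracks.push({ trackId: `track${i}`, adapter: { uri: `file${i}.bam` } })
  }
  return tracks
}

const tracks = makeTracks(2000)
console.log(
  JSON.stringify(cloneWithArrayLookup(tracks)) ===
    JSON.stringify(cloneWithMapLookup(tracks)),
)
```

Both functions produce the same copy; only the cost of the visited-object lookup differs. Dropping the clone step altogether, as described above, avoids that cost entirely.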

cc @Maarten-vd-Sande

this branch (frozen_tracks4)

2.json  0.956s total
4.json  0.944s total
8.json  0.944s total
16.json  0.949s total
32.json  0.960s total
64.json  0.959s total
128.json  0.923s total
256.json  0.905s total
512.json  0.941s total
1024.json  0.934s total
2048.json  0.911s total
4096.json  0.995s total
8192.json  1.069s total
16384.json  1.097s total
32768.json  1.183s total
65536.json  1.463s total
131072.json  2.021s total
262144.json  3.212s total

before today on frozen_tracks4

2.json  4.234s total
4.json  0.935s total
8.json  0.917s total
16.json  0.931s total
32.json  0.987s total
64.json  0.918s total
128.json  0.943s total
256.json  0.904s total
512.json  0.948s total
1024.json  0.991s total
2048.json  0.960s total
4096.json  1.079s total
8192.json  1.369s total
16384.json  2.578s total
32768.json  7.298s total
65536.json  25.797s total
131072.json  1m34.86s total
...timeout at 262144...

main branch

2.json  0.935s total
4.json  1.044s total
8.json  1.021s total
16.json  1.006s total
32.json  1.086s total
64.json  1.216s total
128.json  1.491s total
256.json  1.981s total
512.json  2.828s total
1024.json  4.876s total
2048.json  8.627s total
4096.json  16.225s total
8192.json  42.861s total
...timeout at 16384...
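
A quick way to read these timings is to look at how much each run grows when the file number doubles: a ratio near 2 suggests roughly linear scaling, and a ratio near 4 suggests roughly quadratic scaling. The sketch below computes those ratios for the larger runs, assuming the number in each filename is the track count. The timings are copied from the lists above (1m34.86s is written as 94.86); the names `beforeFix`, `afterFix`, and `doublingRatios` are made up for this example.

```typescript
// [number in filename, seconds] for "before today on frozen_tracks4"
const beforeFix: [number, number][] = [
  [16384, 2.578],
  [32768, 7.298],
  [65536, 25.797],
  [131072, 94.86],
]

// [number in filename, seconds] for "this branch (frozen_tracks4)"
const afterFix: [number, number][] = [
  [16384, 1.097],
  [32768, 1.183],
  [65536, 1.463],
  [131072, 2.021],
  [262144, 3.212],
]

// Ratio of each timing to the previous one; the input size doubles per row
function doublingRatios(rows: [number, number][]): number[] {
  const ratios: number[] = []
  for (let i = 1; i < rows.length; i++) {
    ratios.push(rows[i][1] / rows[i - 1][1])
  }
  return ratios
}

console.log('before:', doublingRatios(beforeFix).map(r => r.toFixed(2)))
console.log('after: ', doublingRatios(afterFix).map(r => r.toFixed(2)))
```

On these numbers the "before" ratios climb toward 4 as the files grow, which fits the accidentally quadratic description, while the "after" ratios stay below 2, since a fixed startup cost of about a second still dominates the total.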

Some code for testing: https://github.com/cmdcolin/jb2-large-tracklist-profiling

Maarten-vd-Sande commented 3 months ago

While the initial load is very fast :partying_face:, it introduced a bug when loading tracks:

Error: HTTP 404 fetching data/dm61_FS_ampliconset-dec-2023_800nt.ampliconref/alignments/24-FS-13/P001__WB09__24-FS-13__24MB03939-1034_64a6acfdfd.cram bytes 0-131071
../../../packages/core/util/io/RemoteFileWithRangeCache.ts:94:13 (fetchBinaryRange@)

JBrowse 2.13.0

It even happens with the test data:

Error: HTTP 404 fetching volvox.filtered.vcf.gz.tbi
../../../node_modules/generic-filehandle/src/remoteFile.ts:171:13 (readFile@)

JBrowse 2.13.0

cmdcolin commented 3 months ago

Interesting. I'm not sure if I fixed it, but I pushed another change just now that could potentially help. You can try refetching the branch with jbrowse upgrade --branch frozen_tracks4 again.

Maarten-vd-Sande commented 3 months ago

Yes that seems to work! :pray: