GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0

Add aggregation to BigBedAdapter to group bigGenePred transcripts #4456

Closed cmdcolin closed 5 months ago

cmdcolin commented 5 months ago

This adds the ability to aggregate multiple BigBed entries into a single 'gene' feature based on an attribute such as geneName.

BigBed/BED in general can only store one transcript per line and doesn't explicitly acknowledge child->parent relationships with a gene-level feature.
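A rough sketch of the grouping idea, not the code in this PR: transcripts that share a geneName on the same reference sequence are collected under a synthetic 'gene' parent whose bounds span all of its children. The `Transcript` and `Gene` shapes and the `aggregateByGeneName` helper are hypothetical names made up for illustration; they are not JBrowse's actual feature interfaces.

```typescript
// Hypothetical minimal shapes for illustration; not JBrowse's Feature interface.
interface Transcript {
  id: string
  refName: string
  start: number
  end: number
  strand: number
  geneName?: string
}

interface Gene {
  id: string
  type: 'gene'
  refName: string
  start: number
  end: number
  strand: number
  subfeatures: Transcript[]
}

// Group transcripts by (refName, geneName). Transcripts without a geneName
// are passed through unchanged instead of being forced into a group.
function aggregateByGeneName(
  transcripts: Transcript[],
): (Gene | Transcript)[] {
  const groups = new Map<string, Transcript[]>()
  const ungrouped: Transcript[] = []

  for (const t of transcripts) {
    if (!t.geneName) {
      ungrouped.push(t)
      continue
    }
    // Include refName in the key so identical names on different
    // reference sequences do not get merged together.
    const key = `${t.refName}:${t.geneName}`
    const list = groups.get(key)
    if (list) {
      list.push(t)
    } else {
      groups.set(key, [t])
    }
  }

  const genes: Gene[] = []
  groups.forEach((children, key) => {
    const first = children[0]
    if (!first) {
      return
    }
    genes.push({
      id: `gene-${key}`,
      type: 'gene',
      refName: first.refName,
      // The parent spans the union of its children's coordinates.
      start: Math.min(...children.map(c => c.start)),
      end: Math.max(...children.map(c => c.end)),
      // Assumption: all transcripts of a gene share a strand.
      strand: first.strand,
      subfeatures: children,
    })
  })

  return [...genes, ...ungrouped]
}
```

Keying on refName as well as geneName is a design choice of this sketch, so that two unrelated loci with the same name are not collapsed into one very long gene.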

result on volvox

before image

this PR image

motivation: better UCSC2jbrowse mega-instance functionality

cmdcolin commented 5 months ago

uses a redispatching approach similar to gff3tabix to handle cases where you retrieve a child and need to re-fetch to get all children within its bounds. it's a heuristic that could be broken but hopefully holds up for general usage
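A simplified sketch of what a redispatch could look like, assuming a single widening pass: fetch the requested range, widen the window to the bounds of whatever came back, and fetch once more if the window grew. `Span`, `FetchRange` and `fetchWithRedispatch` are hypothetical names, and the version here is synchronous for brevity, whereas a real adapter would be asynchronous.

```typescript
// Hypothetical minimal shape; not JBrowse's Feature interface.
interface Span {
  id: string
  start: number
  end: number
}

// Hypothetical callback standing in for a range query against the file.
type FetchRange = (start: number, end: number) => Span[]

// Fetch [start, end), then re-fetch once over the widened bounds of the
// features that were returned, so that overlapping siblings are included.
function fetchWithRedispatch(
  fetchRange: FetchRange,
  start: number,
  end: number,
): Span[] {
  const initial = fetchRange(start, end)

  let minStart = start
  let maxEnd = end
  for (const f of initial) {
    minStart = Math.min(minStart, f.start)
    maxEnd = Math.max(maxEnd, f.end)
  }

  // Only redispatch when a feature actually extends past the query window.
  // Single pass: the second result is returned as-is and not widened again.
  if (minStart < start || maxEnd > end) {
    return fetchRange(minStart, maxEnd)
  }
  return initial
}
```

Because this sketch widens only once, a sibling transcript that does not overlap the widened window would still be missed, which is one way a heuristic of this kind can break.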

cmdcolin commented 5 months ago

on the UCSC Gencode bigGenePred file, i think it has much nicer default behavior

this branch image

main branch image

note that a similar thing could potentially be done for bedTabix, but the UCSC bedTabix files that i exported from the sql tables don't have any geneName-type attribute to aggregate by, afaik