cmdcolin closed 1 year ago
Ref #281
I've been pretty deep on this issue for a couple weeks now, and my advice is that you should be careful with this.
The previous GC content plugin for JBrowse1 counted ambiguous nucleotides (N) in the denominator and also may have had an off-by-one error. Obviously I don't think any code that ships with JBrowse2 is going to have an off-by-one problem, but I do want to emphasize that there is value in that calculation being either dynamic or very configurable. Different people are going to want different counting behavior for uppercase vs. lowercase or whether or not N's are considered.
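To make the denominator question concrete, here is a minimal Python sketch. It is not the JBrowse 1 or JBrowse 2 code; the function name `gc_fraction` and its two flags are hypothetical, chosen only to show how the same window gives different values depending on the counting policy.

```python
def gc_fraction(seq, count_n=False, count_lowercase=True):
    """Return the GC fraction of seq, or None if no bases were counted.

    count_n: if True, N/n contributes to the denominator.
    count_lowercase: if False, soft-masked (lowercase) bases are skipped.
    """
    gc = 0
    total = 0
    for base in seq:
        if base.islower() and not count_lowercase:
            continue  # ignore soft-masked bases entirely
        b = base.upper()
        if b in ("G", "C"):
            gc += 1
            total += 1
        elif b in ("A", "T"):
            total += 1
        elif b == "N" and count_n:
            total += 1  # ambiguous base dilutes the fraction
    return gc / total if total else None
```

For the window `GCNN`, this returns 1.0 when N is excluded and 0.5 when N is included in the denominator, which is the kind of discrepancy described above.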
The plugin I've been working on is a general dynamic Nucleotide Content feature that is highly configurable for ATGC vs atgc vs N, and then also for average vs. skew.
So, my recommendation here is that even if the plugin cannot be dynamic, we should try to build some indexing solution so that the calculation itself can be dynamic based on the needs of different people in different labs.
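One way to read "indexing solution" is a set of cumulative per-symbol counts, so that the counting policy (case, N, average vs. skew) is applied at query time rather than baked into the index. The sketch below is only an illustration under that assumption; `build_index`, `window_counts`, `gc_average`, and `gc_skew` are hypothetical names, and a real index would probably store counts per bin rather than per base to save memory.

```python
from itertools import accumulate

SYMBOLS = "ACGTNacgtn"

def build_index(seq):
    # index[ch][i] = number of occurrences of ch in seq[:i]
    return {
        ch: [0] + list(accumulate(1 if b == ch else 0 for b in seq))
        for ch in SYMBOLS
    }

def window_counts(index, start, end):
    # Counts for the 0-based half-open window [start, end)
    return {ch: cum[end] - cum[start] for ch, cum in index.items()}

def _total(counts, letters, include_lowercase):
    total = sum(counts[ch] for ch in letters)
    if include_lowercase:
        total += sum(counts[ch.lower()] for ch in letters)
    return total

def gc_average(counts, include_n=False, include_lowercase=True):
    gc = _total(counts, "GC", include_lowercase)
    denom = _total(counts, "ACGTN" if include_n else "ACGT", include_lowercase)
    return gc / denom if denom else None

def gc_skew(counts, include_lowercase=True):
    # GC skew is (G - C) / (G + C)
    g = _total(counts, "G", include_lowercase)
    c = _total(counts, "C", include_lowercase)
    return (g - c) / (g + c) if (g + c) else None
```

Because only raw counts are stored, the same index can serve a lab that ignores N and a lab that counts it, without recomputing anything from the sequence.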
Definitely good to note. I'd like to have a "blessed" GC plotting feature, so I definitely want to make it as accurate and flexible as possible. We have the early version of the GCContentAdapter on our master branch now, if you want to take a look. It would be great to get feedback. Demo link: http://s3.amazonaws.com/jbrowse.org/code/jb2/master/index.html?config=test_data%2Fconfig_demo.json&session=share-hjXAmPV8iX&password=D4dl3
Some of those features, like window size and counting Ns or lowercase bases, are not on master, but we would be happy to incorporate any changes.
I took a stab at extending the GC Content plugin, and it has been useful so far for genome visualization in my lab.
The key things I added:
Customizable counting by regex is quite useful because it's possible for users to completely control the behavior. As well, it allows enhanced plotting of soft-masked assemblies (e.g. plotting repeat density by customizing the regex to count lowercase).
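As a rough illustration of regex-driven counting (this is not the code from the attached plugin; `regex_density` and its parameters are hypothetical), the following Python sketch computes, for each non-overlapping window, the fraction of characters matched by a user-supplied pattern. It assumes the pattern matches one character at a time, such as a character class.

```python
import re

def regex_density(seq, pattern, window_size):
    """Per-window fraction of characters matching a single-character regex."""
    regex = re.compile(pattern)
    values = []
    for start in range(0, len(seq), window_size):
        window = seq[start:start + window_size]  # last window may be shorter
        hits = sum(1 for _ in regex.finditer(window))
        values.append(hits / len(window))
    return values
```

With the pattern `[GCgc]` this gives a case-insensitive GC fraction, while `[a-z]` gives the soft-masked fraction per window, which is the repeat density use case mentioned above.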
I have attached the source for my version to this post. It works well enough for my lab's purposes right now, but would need to be cleaned up, have tests added, and be made more user friendly before it could be properly released.
There's a demo here: https://degeneratestrategy.com/nuccontent/web/
@jjrozewicki that is not only awesome but also groundbreaking :) I believe it's the first third party jbrowse 2 plugin I've seen! great work
also really good reference implementation for both gc content and repeat density
It seems like it would be useful to automatically create a GC content track for a reference sequence
I thought perhaps we could synthesize one automatically, but this might be difficult. Actually showTrack doesn't like auto-generated track configs that aren't actually part of the tree because it uses resolveIdentifier
Alternatively, instead of auto-generating it, we could add a convenience function to the CLI, in either add-assembly or add-track, that automatically makes a GC content track