GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0
206 stars 62 forks source link

Create stranded SNPCoverage view #2833

Open cmdcolin opened 2 years ago

cmdcolin commented 2 years ago

from thread here https://twitter.com/gringene_bio/status/1505780926734774275

would create separate coverage for positive and negative strand reads

cmdcolin commented 2 years ago

could also think about multibigwig/strandedplot https://github.com/bhofmei/jbplugin-strandedplot style results but this request is more about automatically calculating from the reads themselves

cmdcolin commented 2 years ago

example image from twitter thread image

gringer commented 2 years ago

For showing the overall coverage, I expect this would involve adding an additional fwd/reverse feature filter:

https://github.com/GMOD/jbrowse-components/blob/fc601154b770b299d966dae8a858081981c5316c/plugins/alignments/src/SNPCoverageRenderer/SNPCoverageRenderer.ts#L119-L132

My guess:

const feats = [...features.values()]
const coverage_fwd = feats.filter( f => (f.get('type') !== 'skip') && (f.get('strand') == 1) )
const coverage_rev = feats.filter( f => (f.get('type') !== 'skip') && (f.get('strand') == -1) )
const skips_fwd = feats.filter( (f => f.get('type') === 'skip')  && (f.get('strand') == 1) )
const skips_rev = feats.filter( (f => f.get('type') === 'skip')  && (f.get('strand') == -1) )

// Use two pass rendering, which helps in visualizing the SNPs at higher
// bpPerPx First pass: draw the gray background
ctx.fillStyle = colorForBase.total
// forward reads (shown above the Y axis)
for (let i = 0; i < coverage_fwd.length; i++) {
  const feature = coverage_fwd[i]
  const [leftPx, rightPx] = featureSpanPx(feature, region, bpPerPx)
  const w = rightPx - leftPx + 0.3
  const score = feature.get('score') as number
  ctx.fillRect(leftPx, toY(score), w, toHeight(score))
}
// reverse reads (shown below the Y axis)
for (let i = 0; i < coverage_rev.length; i++) {
  const feature = coverage_rev[i]
  const [leftPx, rightPx] = featureSpanPx(feature, region, bpPerPx)
  const w = rightPx - leftPx + 0.3
  const score = feature.get('score') as number
  ctx.fillRect(leftPx, toY(score), w, toY - toHeight(score))
}

[if the filtered coverage tracks can be assumed to be the same length, then these rectangles could be done within the same loop]

gringer commented 2 years ago

Here is a reference assembly and BAM file that can be used for testing (the same one that I used in the JBrowse2 picture). This BAM file is useful because it shows stranded long reads with a very high dynamic range in coverage, so the log representation is necessary to fully appreciate the expression pattern:

GRCm39.primary_assembly.chrM.fa.gz

mm2_GCFeb22_S14_called_BC07_vs_chrM.bam.gz

edit: updated BAM file

gringer commented 2 years ago

Should look something like this:

Screen Shot 2022-03-25 at 19 04 33

I needed to dig down to the SNPCoverageAdapter bin function to create additional scores for forward and reverse strand, in order to get this working. It's also a bastardisation of the scale (as you may be able to tell by looking at the Y axis).

... although it does seem to work well on log scale as well:

Screen Shot 2022-03-25 at 19 10 05
cmdcolin commented 2 years ago

great initiative diving into the codebase:)! could possibly use negative values but those will not log transform well...will probably need a real notion of having multiple signals in a single track a la multibigwig and https://github.com/bhofmei/jbplugin-strandedplot

gringer commented 2 years ago

I've attached a diff (based on commit 85973e53). This creates a new StrandedRelCoverage Display Type and associated adapters (based on LinearSNPCoverage), which has a stranded coverage track and stranded variant display.

I suppose this would be better designed as a modification of the existing LinearSNPCoverage track, but I didn't want to break existing functionality when creating something different.

patch-stranded-snp-coverage.diff.gz

Screen Shot 2022-03-26 at 16 51 33

Note that in the highlighted column, the 'C' variants are broken by an 'A' variant in the middle on the opposite strand. This could be improved (e.g. by sorting variants based on their frequency), but I think this is at least good enough for others to understand what I'd like to achieve.

garrettjstevens commented 1 year ago

Another +1 for this from @haessar today. He said, I believe, that he uses differences in coverage between strands to help with annotating UTRs. Here's a screenshot of this feature Artemis:

image

cmdcolin commented 1 year ago

definitely a good use case there. could possibly leverage multiwiggle display for this purpose (as they are conceptually two different signals)