New page for browser: Assess alignments

chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads



New page for browser: Assess alignments #170

Closed: rlorigro closed this 4 years ago

rlorigro commented 4 years ago

This is the first draft of a new page in the assembly browser that automates sampling reads and aligning each sampled read one-to-all. Sampling in bulk makes it efficient to collect statistics on alignedFraction and alignedMarkerCount. In addition, the page computes the ratio of stored to found alignments (stored:found).
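A minimal sketch of the kind of summary the page reports, assuming each sampled alignment is available as a record holding its alignedFraction, its alignedMarkerCount, and whether the same read pair is also present among the stored alignments. `SampledAlignment` and `summarize` are hypothetical names, not Shasta's API, and reading "stored:found" as the fraction of found alignments that are also stored is an interpretation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SampledAlignment:
    # Hypothetical record for one alignment found by one-to-all alignment.
    aligned_fraction: float     # alignedFraction of this alignment
    aligned_marker_count: int   # alignedMarkerCount of this alignment
    is_stored: bool             # True if the same pair is also a stored alignment


def summarize(alignments):
    """Return mean alignedFraction, mean alignedMarkerCount and stored:found ratio.

    Returns None when nothing was found, so the ratio is never divided by zero.
    """
    if not alignments:
        return None
    found = len(alignments)
    stored = sum(1 for a in alignments if a.is_stored)
    return {
        "mean_aligned_fraction": mean(a.aligned_fraction for a in alignments),
        "mean_aligned_marker_count": mean(a.aligned_marker_count for a in alignments),
        "stored_to_found_ratio": stored / found,
    }


sample = [SampledAlignment(0.5, 300, True), SampledAlignment(0.9, 500, False)]
print(summarize(sample)["stored_to_found_ratio"])  # 0.5
```

Collecting many such records per page load is what makes the bulk statistics cheap: the summary is a single pass over the sampled alignments.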

In the future I would like to add sampling from dead ends as an option, and include more data about the alignment overhangs found in stored vs. computed alignments. This is also a first pass at deciding whether alignment thresholds can be determined automatically at run time.

It's not clear to me where the merge conflict is coming from. I can work that out if needed.

Some example screenshots below:

rlorigro commented 4 years ago

Last push was a rebase.

bagashe commented 4 years ago

This is very cool. I will do a detailed review later tonight.

Can we compute this histogram during a regular Shasta run and dump it in the ShastaRun directory? I could use it in the feedback script for automated feedback.

rlorigro commented 4 years ago

Yeah, you could totally produce something like this from the stored alignments. The difference is that the stored alignments are only computed for the subset of reads that were paired by the LowHash algorithm, and the distributions will have a hard cutoff at alignedFraction = 0.4 and markerCount = 200, or whatever your configuration is set to.
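To illustrate why a histogram built only from stored alignments is truncated, here is a small sketch that applies the two cutoffs to a list of (alignedFraction, markerCount) pairs. The default thresholds are the example values quoted in this thread, `stored_view` is a hypothetical helper, and treating the cutoffs as inclusive (`>=`) is an assumption.

```python
def stored_view(pairs, min_aligned_fraction=0.4, min_marker_count=200):
    """Keep only (alignedFraction, markerCount) pairs that pass both cutoffs.

    Defaults are the example thresholds quoted in this thread; in a real run
    they come from the assembly configuration. Inclusive comparison is assumed.
    """
    return [(f, m) for f, m in pairs
            if f >= min_aligned_fraction and m >= min_marker_count]


one_to_all = [(0.15, 90), (0.35, 250), (0.42, 180), (0.8, 600), (0.95, 900)]
stored = stored_view(one_to_all)
# Nothing below either cutoff survives, so a histogram of the stored view
# has a hard edge at the thresholds, while one_to_all extends below them.
print(stored)                     # [(0.8, 600), (0.95, 900)]
print(min(f for f, _ in stored))  # 0.8
```

Sampling one-to-all alignments avoids this edge, which is what makes the sampled distributions useful for judging where the thresholds should sit.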

We can figure out the details in another PR.

paoloczi commented 4 years ago

A couple more comments on the page.

1. It should have a title (<h1>), something like "Alignment statistics"? And perhaps a blurb explaining what it does.
2. What is the meaning of the minimum and maximum read lengths? There are three possibilities: bases of raw sequence (the original read), bases of RLE sequence, or markers.
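The three candidate units can give very different numbers for the same read. A minimal sketch, assuming markers are k-mers located on the run-length encoded (RLE) sequence; `rle_length` and `marker_count` are hypothetical helpers, and the tiny marker set below stands in for a real, much larger one.

```python
from itertools import groupby


def rle_length(raw_sequence):
    """Number of bases after run-length encoding (each homopolymer run becomes one base)."""
    return sum(1 for _ in groupby(raw_sequence))


def marker_count(raw_sequence, markers, k):
    """Count positions in the RLE sequence whose k-mer is in `markers`.

    `markers` is a hypothetical set of k-mers used only for illustration.
    """
    rle = "".join(base for base, _ in groupby(raw_sequence))
    return sum(1 for i in range(len(rle) - k + 1) if rle[i:i + k] in markers)


read = "AACCCGTTTA"
# raw length, RLE length ("ACGTA"), marker count (ACG and GTA)
print(len(read), rle_length(read), marker_count(read, {"ACG", "GTA"}, 3))  # 10 5 2
```

A length filter expressed in one unit therefore does not translate directly into the others, which is why the page should state which one it uses.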

rlorigro commented 4 years ago

Reverted the Boost accumulator edits because the accumulator did not have an option for specifying min/max, and it appeared to fail for decimal values between 0 and 1.