LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

readfish stats on-target median read length and N50 always 0B #334

Closed: maximilianmordig closed this 3 months ago

maximilianmordig commented 4 months ago

I think there is an error in the table output by `readfish stats`, which also appears in the table in the README.md of this repo. When enriching for chr20 and chr21, the median read length and N50 seem to be reported in the off-target columns rather than the on-target columns (the entire on-target column is 0 b).
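For reference, this is roughly what a per-category summary should produce: whenever some reads are classed as on-target, the on-target median and N50 cannot be 0. The sketch below is a minimal illustration, not the `readfish stats` code. It assumes each read has already been labelled on- or off-target, and the input format `(read_length, is_on_target)` is hypothetical.

```python
from statistics import median


def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all bases (0 for no reads)."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # not reached for non-empty input


def summarise(reads):
    """Group read lengths by label and report count, median and N50 per group.

    reads: iterable of (read_length, is_on_target) tuples (hypothetical input).
    """
    groups = {"on_target": [], "off_target": []}
    for length, on_target in reads:
        groups["on_target" if on_target else "off_target"].append(length)
    return {
        name: {
            "count": len(lengths),
            "median_read_length": median(lengths) if lengths else 0,
            "n50": n50(lengths),
        }
        for name, lengths in groups.items()
    }


reads = [(1200, False), (800, False), (25000, True), (31000, True), (18000, True)]
stats = summarise(reads)
print(stats["on_target"])   # {'count': 3, 'median_read_length': 25000, 'n50': 25000}
print(stats["off_target"])  # {'count': 2, 'median_read_length': 1000.0, 'n50': 1200}
```

With three on-target reads in the input, the on-target row is populated; an all-zero on-target row would only be expected if no reads were labelled on-target.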

Moreover, can you explain how you define on- and off-target in case of multiple alignments to the target and off the target? If a read aligns to both the target and off the target, do you count it in both columns of Alignments.On-Target and Alignments.Off-Target? Is the estimated coverage for the whole contig? It may make sense to also split by on- and off-target, similarly to the other columns.

github-actions[bot] commented 4 months ago

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

mattloose commented 4 months ago

Hi,

Thanks for trying out the tool.

I agree that the presentation of the on-target and off-target data in the table is a little confusing when a whole chromosome is the target. In this example there is no on-target versus off-target split within chromosomes 20 and 21, because the whole chromosome is the target. Thus the numbers are correct, although they are incorrectly shown in the off-target column. I will flag this for checking, though (@Adoni5).
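To illustrate why a whole-chromosome target leaves nothing "off target" on that chromosome: if a target covers an entire contig, every alignment to that contig overlaps it. The sketch below assumes a hypothetical `Target` type in which a contig name with no coordinates means "the whole contig"; it is not how readfish stores its targets internally.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    """Hypothetical target: a contig, optionally restricted to [start, end)."""
    contig: str
    start: Optional[int] = None  # None means the whole contig is the target
    end: Optional[int] = None


def is_on_target(contig, aln_start, aln_end, targets):
    """Return True if the alignment [aln_start, aln_end) overlaps any target."""
    for t in targets:
        if t.contig != contig:
            continue
        if t.start is None or t.end is None:
            return True  # whole-contig target: any alignment to it is on target
        if aln_start < t.end and aln_end > t.start:  # half-open interval overlap
            return True
    return False


whole = [Target("chr20"), Target("chr21")]
print(is_on_target("chr20", 5_000_000, 5_020_000, whole))  # True
print(is_on_target("chr1", 100, 200, whole))               # False

region = [Target("chr20", 1_000_000, 2_000_000)]
print(is_on_target("chr20", 5_000_000, 5_020_000, region))  # False
```

Only when the target is a sub-region of a chromosome can reads on that chromosome fall on both sides of the on/off-target split.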

> Moreover, can you explain how you define on- and off-target in case of multiple alignments to the target and off the target? If a read aligns to both the target and off the target, do you count it in both columns of Alignments.On-Target and Alignments.Off-Target?

This is determined by the decision derived from the TOML file used to configure the experiment. Reads are assigned to on-target and off-target following the logic defined there, so "on target" should equate to "stop_receiving" and "off target" to "unblock". You can choose how multi-mapping reads are handled in the readfish TOML (`multi_on` and `multi_off`); see https://looselab.github.io/readfish/toml.html#regions-sub-tables for more information.
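As a rough illustration of that logic, the sketch below maps a read's alignments to a decision name and then to a single stats column via the configured action. The key names mirror the readfish TOML keys (`single_on`, `single_off`, `multi_on`, `multi_off`, `no_map`). The interpretation used here is an assumption: "single"/"multi" means one versus several alignments, and "on" means at least one alignment hits a target. The "Neither" bucket for actions such as `proceed` is hypothetical. This is not readfish's implementation.

```python
# Hypothetical per-region settings, named after the readfish TOML keys.
EXAMPLE_REGION = {
    "single_on": "stop_receiving",
    "multi_on": "stop_receiving",
    "single_off": "unblock",
    "multi_off": "unblock",
    "no_map": "proceed",
}


def classify(hits_target):
    """Map one read's alignments to a decision name.

    hits_target: one bool per alignment, True if that alignment is in a target.
    """
    if not hits_target:
        return "no_map"
    count = "single" if len(hits_target) == 1 else "multi"
    where = "on" if any(hits_target) else "off"
    return f"{count}_{where}"


def stats_column(hits_target, region):
    """Return the one column a read is counted in, based on the configured action."""
    action = region[classify(hits_target)]
    if action == "stop_receiving":
        return "On-Target"
    if action == "unblock":
        return "Off-Target"
    return "Neither"  # e.g. "proceed"; hypothetical bucket for this sketch


# A read with one alignment in a target and one outside it is "multi_on".
print(stats_column([True, False], EXAMPLE_REGION))  # On-Target
# Changing multi_on moves the same read into the other column.
print(stats_column([True, False], {**EXAMPLE_REGION, "multi_on": "unblock"}))  # Off-Target
```

Under this reading, a read that aligns both to and away from the target is counted once, in whichever column its configured action points to.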

The idea behind `readfish stats` is to evaluate the experiment in the context of the original analysis setup.

I hope that helps.

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 months ago

This issue was closed because there has been no response for 5 days after becoming stale.