Sequencing summary files for human genome experiment

LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers

https://looselab.github.io/readfish/

GNU General Public License v3.0


Sequencing summary files for human genome experiment #322

Closed maximilianmordig closed 7 months ago

maximilianmordig commented 9 months ago

Thank you for your ReadFish paper; it was a nice read.

Looking at the ENA record (https://www.ebi.ac.uk/ena/browser/view/PRJEB36644), it seems only the fastq data is available. Could you share the sequencing summary files for the human-genome experiment in Figure 1 of the paper? I am interested in the read timings (channel, read_id, read start time/end time) after mux scan removal (similar to UNCALLED's mux scan removal). Which channels correspond to which condition (control, 50%, etc.)? Is the yield in Figure 1c the ratio of the number of reads or of the number of bases (of the readfish run and the control run)?

github-actions[bot] commented 9 months ago

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

mattloose commented 9 months ago

Hi,

Thanks for your interest.

I can get you those data, but they are on R9, which is now somewhat outdated, so I am not sure what real value the data would have for you. The behaviour on R10.4.1 and with the high-capture adapters is going to change those capture times.

The flowcell is divided into quadrants and you can infer the channels involved from this function: https://github.com/LooseLab/readfish/blob/caacb02b291ed1a715de1eec1c2951dd12bea70f/src/readfish/_utils.py#L330

Yield is always calculated in bases. It makes no sense to count the number of reads on a platform with variable read length. The yield calculations include all data from the flowcell (they are not filtered by pass/fail or any other metric) and so represent the true benefit over background. Flushing and reloading the flowcell will improve the performance further.
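To make the bases-versus-reads point concrete, here is a minimal sketch that sums read lengths over a set of channels from sequencing-summary rows and divides one group's total by the control group's. It assumes the Guppy sequencing_summary.txt columns `channel` and `sequence_length_template`; the helper names, and the choice of a plain ratio of summed bases with no per-channel normalisation, are illustrative assumptions rather than the analysis behind Figure 1c. In line with the point above, no pass/fail filtering is applied.

```python
import csv
from typing import Iterable, Mapping


def yield_in_bases(rows: Iterable[Mapping[str, str]], channels: set[int]) -> int:
    """Sum sequence_length_template over every read on the given channels.

    No pass/fail filtering is applied: every read counts toward the yield.
    """
    return sum(
        int(row["sequence_length_template"])
        for row in rows
        if int(row["channel"]) in channels
    )


def yield_ratio(rows: list[Mapping[str, str]], test: set[int], control: set[int]) -> float:
    """Bases sequenced on `test` channels divided by bases on `control` channels.

    `rows` is a list (not a one-shot iterator) because it is scanned twice.
    """
    control_bases = yield_in_bases(rows, control)
    if control_bases == 0:
        raise ValueError("control channels produced no bases")
    return yield_in_bases(rows, test) / control_bases


def load_summary(path: str) -> list[dict[str, str]]:
    """Read a tab-separated sequencing_summary.txt into a list of row dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    # Toy rows: channel 1 stands in for "test", channel 2 for "control".
    toy = [
        {"channel": "1", "sequence_length_template": "20000"},
        {"channel": "2", "sequence_length_template": "1000"},
        {"channel": "2", "sequence_length_template": "1000"},
        {"channel": "2", "sequence_length_template": "1000"},
    ]
    print(yield_ratio(toy, test={1}, control={2}))  # 20000 / 3000
```

In the toy rows, channel 1 has one 20 kb read and channel 2 has three 1 kb reads, so counting reads would favour channel 2 three to one, while counting bases favours channel 1 by a factor of about 6.7.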

If you want the R9 sequencing summary files I will see if I can dig them out for you.

maximilianmordig commented 9 months ago

Hi, thank you for your fast response. I would be interested in the data to look more closely at the read timings and to replicate the figures before building on them, so R9.4 is fine. Also, does that mean that with layout = generate_flowcell(512, split=4), layout[0] are the control channels, layout[1] is 50%, layout[2] is 25% and layout[3] is 12.5%?

Thank you in advance.

mattloose commented 9 months ago

> Also, that means layout = generate_flowcell(512, split=4) and layout[0] are the control channels, layout[1] is 50%, layout[2] is 25% and layout[3] is 12.5%?

This is correct!

I'll dig out the timings etc.
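Given the confirmed ordering (layout[0] control, layout[1] 50%, layout[2] 25%, layout[3] 12.5%), the sketch below turns such a layout into a channel-to-condition lookup and attaches each read's condition and end time (start_time + duration, both in seconds in a Guppy sequencing summary) to its summary row. In practice `layout` would come from readfish's `generate_flowcell(512, split=4)`; here it is passed in as a plain list of channel lists so the example runs without readfish installed. The helper names and the "unassigned" label are hypothetical.

```python
from typing import Iterable, Mapping, Sequence

# Condition order confirmed in this thread for generate_flowcell(512, split=4).
CONDITIONS = ("control", "50%", "25%", "12.5%")


def channel_to_condition(layout: Sequence[Sequence[int]]) -> dict[int, str]:
    """Map every channel number to the condition of the quadrant containing it."""
    if len(layout) != len(CONDITIONS):
        raise ValueError(f"expected {len(CONDITIONS)} channel groups, got {len(layout)}")
    lookup: dict[int, str] = {}
    for condition, channels in zip(CONDITIONS, layout):
        for channel in channels:
            lookup[int(channel)] = condition
    return lookup


def annotate_reads(rows: Iterable[Mapping[str, str]], lookup: Mapping[int, str]) -> list[dict]:
    """Return read_id, channel, condition, start and end time for each summary row.

    Reads on channels missing from the layout are labelled "unassigned".
    """
    records = []
    for row in rows:
        channel = int(row["channel"])
        start = float(row["start_time"])
        records.append(
            {
                "read_id": row["read_id"],
                "channel": channel,
                "condition": lookup.get(channel, "unassigned"),
                "start_time": start,
                "end_time": start + float(row["duration"]),
            }
        )
    return records
```

With readfish installed, the real lookup would presumably be built as `channel_to_condition(generate_flowcell(512, split=4))`, importing `generate_flowcell` from the `_utils.py` module linked above; reads can then be grouped by `condition` to compare timings across the four quadrants.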

maximilianmordig commented 8 months ago

Hi @mattloose, did you manage to find the files?

Adoni5 commented 8 months ago

Hi @maximilianmordig, Matt's out of the country right now, but he'll get back to you! I'd look for them myself, but it'll be a lot more efficient if Matt does, as he knows where they are.

github-actions[bot] commented 7 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 7 months ago

This issue was closed because there has been no response for 5 days after becoming stale.