esteinig / nanoq

Minimal but speedy quality control for nanopore reads in Rust :bear:
MIT License

quality report vs length report #45

Open Hedi65 opened 2 months ago

Hedi65 commented 2 months ago

Dear community

I generated full reports for read length and quality using the following commands:

nanoq -i my_file.fastq -s -Q report_quality.txt
nanoq -i my_file.fastq -s -L report_length.txt

I can see that in both reports the number of lines matches the number of reads in my file, which I take to mean that each line holds the average quality (or the length) of one read.

Each report is sorted in ascending order. Can I assume that, for instance, the Q value on line 1 of report_quality.txt and the L value on line 1 of report_length.txt come from the same read? In other words, do the values on matching lines belong to the same read? If not, how can I get the corresponding Q and L values for each read? (Could the read ID be kept in a separate column of the reports, so that it is clear which value belongs to which read?)

Thanks for your reply.

esteinig commented 2 months ago

@Hedi65 The read lengths and qualities in the output files are those of the filtered reads. They do not follow the input order of the reads, and the two files do not correspond to each other line by line because of unstable sorting. These options were meant for plotting filtered distributions, where order does not matter.

I do think an ordered output, where each line in the qualities and lengths output files corresponds to the output read order, is a better way to implement this. I have scheduled it for the next release and will include an optional column with the read identifier - thanks for raising this!
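
Until such an ordered output is available, one possible workaround is to compute the per-read values directly from the FASTQ file (for example, from the filtered reads that nanoq writes out), so that the ID, length and quality stay on the same line. The following is a minimal sketch and not part of nanoq: the function names `mean_quality`, `read_stats` and `write_table` are hypothetical, it assumes plain four-line FASTQ records with Phred+33 qualities, and it averages quality through the mean error probability, which is a common convention for nanopore reads but may not match nanoq's reported values exactly.

```python
import math


def mean_quality(qual, offset=33):
    """Mean Phred quality of a read, averaged over error probabilities.

    Assumption: Phred+33 encoding; may differ from nanoq's own calculation.
    """
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def read_stats(handle):
    """Yield (read_id, length, mean_quality) per record, in file order.

    Assumption: every record spans exactly four lines (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        read_id = header[1:].split()[0]  # drop '@' and any description
        yield read_id, len(seq), mean_quality(qual)


def write_table(fastq_path, out_path):
    """Write a tab-separated table with one line per read."""
    with open(fastq_path) as fq, open(out_path, "w") as out:
        out.write("read_id\tlength\tmean_quality\n")
        for read_id, length, quality in read_stats(fq):
            out.write(f"{read_id}\t{length}\t{quality:.2f}\n")
```

Calling `write_table("my_file.fastq", "report_per_read.tsv")` would then produce one table in which the length and quality on each line are guaranteed to come from the read named in the first column; the output file name here is only an example.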