Open smanfri opened 1 year ago
Hi, thank you for identifying this discrepancy. Although I can't promise to fix this right now, it would help if you could post here some of the values you are finding in FastQC alongside what you are seeing in the supplementary table. Thank you for your help.
Hi, thank you for the response. In the attached file, I compared the total reads reported in Supplementary Table 2 with the value reported by FastQC (version 0.11.8). Note that:
Thank you for your attention,
Sara

Total-reads_Supplementary-table2_VS_FastQC.xlsx
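For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch. It assumes FastQC's text report (`fastqc_data.txt`, found inside the FastQC output archive) contains a tab-separated `Total Sequences` line, as it does in version 0.11.8. The helper names and the sample IDs in the usage note are hypothetical and are not taken from the attached spreadsheet.

```python
def fastqc_total_sequences(fastqc_data_path):
    """Return the 'Total Sequences' value from a FastQC fastqc_data.txt file."""
    with open(fastqc_data_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # The Basic Statistics module lists this as "Total Sequences<TAB>N".
            if fields[0] == "Total Sequences" and len(fields) > 1:
                return int(fields[1])
    raise ValueError(f"No 'Total Sequences' line found in {fastqc_data_path}")


def find_mismatches(reported, observed):
    """Return {sample: (reported, observed)} for samples whose counts differ.

    Both arguments are dicts mapping a sample ID to a read count.
    Samples missing from either dict are skipped.
    """
    return {
        sample: (reported[sample], observed[sample])
        for sample in reported
        if sample in observed and reported[sample] != observed[sample]
    }
```

For example, `find_mismatches({"sampleA": 100, "sampleB": 50}, {"sampleA": 100, "sampleB": 48})` would flag only the hypothetical `sampleB`, with the table value first and the FastQC value second.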
Good morning,
I'm a Computer Science student at Università degli Studi di Milano, and for my thesis I am assessing several pipelines for the analysis of SARS-CoV-2 samples. To select the best pipeline for our requirements, I'm using the benchmark datasets available here. I found Supplementary_table2 in your paper (Xiaoli L, Hagey JV, Park DJ, Gulvik CA, Young EL, Alikhan N-F, Lawsin A, Hassell N, Knipe K, Oakeson KF, Retchless AC, Shakya M, Lo C-C, Chain P, Page AJ, Metcalf BJ, Su M, Rowell J, Vidyaprakash E, Paden CR, Huang AD, Roellig D, Patel K, Winglee K, Weigand MR, Katz LS. 2022. Benchmark datasets for SARS-CoV-2 surveillance bioinformatics. PeerJ 10:e13821 http://doi.org/10.7717/peerj.13821) and I would also like to use the data it contains for my evaluations (not only the in.tsv file available for each dataset). I'm writing here because I can't understand how the 'Total reads' column is calculated. In particular, I computed this value with FastQC (the 'Total Sequences' field) and I also counted the reads in the original .FASTQ file, but the numbers don't match the ones published in Supplementary_table2.
Do you know why the numbers are different? Is it possible that Supplementary_table2 is outdated with respect to the current version of the dataset? If so, which version of the dataset corresponds to Supplementary_table2 and was used in your paper?
Thank you very much for your time :)
Best regards, Sara Manfredi
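
For reference, the direct read count mentioned above can be reproduced with a short Python sketch like the following. It assumes each FASTQ record occupies exactly four lines (no wrapped sequence or quality lines) and that compressed files end in `.gz`. The function name is hypothetical, and this is not necessarily how the 'Total reads' column in the paper was computed.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    # Open gzip-compressed files transparently, based on the file suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    # A line count that is not a multiple of four suggests a truncated file
    # or a FASTQ variant that this simple approach does not handle.
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Under the four-line assumption, the result should agree with FastQC's 'Total Sequences' for the same file, so it serves as a quick sanity check before comparing against the supplementary table.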