As @rcedgar put it today, we need both real reads and mock reads for validation, since each approach has its own strengths and weaknesses. @ababaian mentioned that he has human cell line samples with coronavirus DNA that can be used for developing some of the "real reads" based validation datasets. @ababaian, can you flesh out what you were describing? I'm sure I didn't get it 100%.
Create a 'benchmark' set of human RNA-seq data (use real data from tissue) with a dilution series of 'spike-in' real SARS-CoV-2 sequences at varying dilutions: 1e6, 1e5, 1e4 ... 1 viral genome copy per library.
Implement a read-naming convention/flags so that human and CoV sequences can be rapidly separated and quantified from a BAM/SAM file using 'grep'.
Must be delivered with a reproducible script that rapidly creates new benchmark datasets with minimal additional software requirements.
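A rough sketch of how the dilution series, naming convention and grep-based counting could fit together, using only the Python standard library. Everything here is hypothetical and for discussion only: the `HS_`/`COV_` prefixes, the `COV_1e4_000042` name layout, and all function names are placeholders, the short sequences stand in for real tissue and SARS-CoV-2 reads, and converting "viral genome copies per library" into a number of spiked reads is *not* modeled (the demo simply assumes one read per copy). Names avoid whitespace because many aligners truncate read names at the first whitespace character.

```python
"""Hypothetical sketch: build a tagged spike-in read set and count reads by origin."""
import random
from collections import Counter

HUMAN_PREFIX = "HS"   # hypothetical tag for human (host) reads
VIRAL_PREFIX = "COV"  # hypothetical tag for SARS-CoV-2 spike-in reads


def dilution_levels(max_exp=6):
    """Viral genome copies per library, from 10**max_exp down to 1."""
    return [(f"1e{k}", 10 ** k) for k in range(max_exp, -1, -1)]


def tag_read_name(prefix, index, level=None):
    """Build a whitespace-free read name such as 'COV_1e4_000042'."""
    parts = [prefix] if level is None else [prefix, level]
    return "_".join(parts + [f"{index:06d}"])


def spike_in(human_reads, viral_reads, n_viral, level, seed=0):
    """Tag every human read, add n_viral reads drawn (with replacement)
    from viral_reads, then shuffle; the fixed seed keeps output reproducible."""
    rng = random.Random(seed)
    records = [(tag_read_name(HUMAN_PREFIX, i), s) for i, s in enumerate(human_reads)]
    for i in range(n_viral):
        records.append((tag_read_name(VIRAL_PREFIX, i, level), rng.choice(viral_reads)))
    rng.shuffle(records)
    return records


def to_fastq(records):
    """Serialize (name, sequence) pairs as FASTQ with placeholder 'I' qualities."""
    return "".join(f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n" for name, seq in records)


def count_origins(sam_lines):
    """Count primary records (mapped or unmapped) per read-name prefix in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x900:  # skip secondary (0x100) / supplementary (0x800)
            continue
        counts[fields[0].split("_", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    human = ["ACGTACGTAC", "TTGACCATGA"]  # stand-ins for real human reads
    viral = ["GGCCTTAAGC"]                # stand-in for real SARS-CoV-2 reads
    for label, copies in dilution_levels(2):
        # Hypothetical: one spiked read per viral genome copy.
        recs = spike_in(human, viral, n_viral=copies, level=label)
        print(label, len(recs))
```

On the command line the same split only needs standard tools, e.g. `samtools view -F 0x900 sample.bam | grep -c '^COV_'` to count primary viral records in a BAM, or `grep -c '^COV_' sample.sam` on plain SAM (the latter also counts secondary/supplementary lines). Putting the dilution level into the name lets one pooled alignment be split back by level with a pattern such as `'^COV_1e4_'`.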