Use human cell line samples with coronavirus DNA for developing benchmarking datasets

taltman commented 4 years ago

As @rcedgar put it today, we need both real reads and mock reads for validation, as each approach has strengths and weaknesses. @ababaian mentioned that he has human cell line samples with coronavirus DNA that can be used for developing some of the "real reads" based validation datasets. @ababaian, can you flesh out what you were describing? I'm sure I didn't get it 100%.

ababaian commented 4 years ago

From way back at #17

Objective

Create a 'benchmark' set of human RNA-seq data (use real data from tissue) with a dilution series of 'spike-in' real SARS-CoV-2 sequences at varying dilutions: 1e6, 1e5, 1e4 ... 1 viral genome copy per library.
Implement a naming convention/flags in the reads such that we can rapidly separate and quantify human/CoV sequences from a BAM/SAM file using 'grep'.
Must be delivered with a reproducible script to create new benchmark datasets rapidly with minimal additional software requirements.
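
A rough sketch of how the dilution series and the read-naming convention could fit together. Everything here is an assumption for illustration: the `hs:` / `cov:` read-name prefixes, the function names, and the 10-fold step are hypothetical, not an agreed convention for this project.

```python
def dilution_series(max_copies=10**6, factor=10):
    """Viral genome copies per library: 1e6, 1e5, ..., 1 (assumed 10-fold steps)."""
    copies = []
    n = max_copies
    while n >= 1:
        copies.append(n)
        n //= factor
    return copies


def tag_read(name, origin):
    """Prefix a read name with its origin, e.g. 'cov:read1' (hypothetical convention)."""
    return f"{origin}:{name}"


def count_by_origin(sam_lines):
    """Count alignment records per origin prefix, skipping SAM header lines."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not a read
        qname = line.split("\t", 1)[0]     # first SAM column is the read name
        origin = qname.split(":", 1)[0]    # text before the first ':' is the tag
        counts[origin] = counts.get(origin, 0) + 1
    return counts


if __name__ == "__main__":
    print(dilution_series())
    sam = [
        "@HD\tVN:1.6",
        tag_read("r1", "hs") + "\t0\tchr1",
        tag_read("r2", "cov") + "\t0\tNC_045512.2",
    ]
    print(count_by_origin(sam))
```

With a prefix like this, the same split should work on a text SAM file with something like `grep -c '^cov:' reads.sam`; a BAM file would need to be converted to text first (e.g. with `samtools view`).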

ababaian commented 4 years ago

Not really a thing anymore. We might have to swing back to this for the paper and can re-open then.

ababaian / serratus

Use human cell line samples with coronavirus DNA for developing benchmarking datasets #95
