CDCgov / datasets-sars-cov-2

Benchmark datasets for WGS analysis of SARS-CoV-2. (https://peerj.com/articles/13821/)
Apache License 2.0

Consensus query #2

Closed: iqbal-lab closed this issue 2 years ago

iqbal-lab commented 2 years ago

Apologies if this is documented somewhere and I missed it. I just wanted to know what the process was for producing and QC-ing or curating the truth consensuses.

Thanks so much for collecting these; it is a huge contribution.

jvhagey commented 2 years ago

Hi @iqbal-lab, thanks for showing interest in the dataset. The full documentation will be in the forthcoming manuscript, but I wrote a methods.md that hopefully answers some of your questions. We will add to methods.md over time as we get more questions and comments about the dataset, so that folks have a single place to go to get their questions answered. Let us know if this answers your question(s).

iqbal-lab commented 2 years ago

Thank you! It does answer my question. These are super valuable datasets, and I can see myself and others using them to test our pipelines. It would be good to have manually curated truth assemblies/consensuses, which could then be used to evaluate the results of any new pipeline. This is horribly expensive and painful to do, but very valuable afterwards. If and when we start to use these data, we will inevitably have to do this, so we could submit the curated consensuses back if that is something you would value.

lskatz commented 2 years ago

Has this issue been resolved, @jvhagey or @iqbal-lab?

lskatz commented 2 years ago

Closing this out following an offline conversation with @jvhagey.