artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

Test Datasets #80

Open kek12e opened 3 years ago

kek12e commented 3 years ago

Hello,

I was wondering if you knew of any good publicly available datasets for the V3 ARTIC tiling amplicon sequencing of hCoV-19. Ideally, I would love to have a test dataset showing each of the variants of concern (UK, South Africa, and Brazil) along with the Wuhan strain.

I've tried looking through GISAID and SRA, but as far as I can tell GISAID only supplies preassembled genomes in a strange format, and I need the raw FAST5 or FASTQ files. SRA, meanwhile, is very challenging to search for exactly the type of library/sequence set you need with enough metadata to inform the analysis.

I apologize if there are datasets somewhere already, but I'm somewhat frantically trying to figure out how to do this ARTIC analysis before teaching a course on it that begins on Monday. We sequenced synthetic saliva that contained RNA for the Wuhan strain and three variants of concern, but for some reason the variant calling is coming up with nothing, and I'm struggling to figure out whether the problem is our data or something going wrong in the pipeline.

I would sincerely appreciate any help or a pointer to appropriate public datasets that have done the ARTIC V3 tiling amplicon approach and have worked with the standard SOP bioinformatics pipeline.

nickloman commented 3 years ago

We could certainly post something up like this - I think it would be a nice resource - would you want FAST5 (i.e. for nanopolish) or FASTQ?

kek12e commented 3 years ago

Is it greedy to ask for both? But fast5 if I have to choose. Thank you so much!