vibrio case study - GitHub issues

tavareshugo commented 9 months ago

ERR1485225
ERR1485227
ERR1485229
ERR1485231
ERR1485233
ERR1485235
ERR1485237
ERR1485301
ERR1485299

The last two are water samples. Here is the run selector link, if useful.

Also, here is a V. parahaemolyticus sample:

SRR26899141

We will also downsample one of the above samples so that the quality of its assembly is affected.

tavareshugo commented 9 months ago

Renamed the samples:

ERR1485225 --> isolate01
ERR1485227 --> isolate02
ERR1485229 --> isolate03
ERR1485231 --> isolate04
ERR1485233 --> isolate05
ERR1485235 --> isolate06
ERR1485237 --> isolate07
SRR26899141 --> isolate08
ERR1485299 --> isolate09
ERR1485301 --> isolate10
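The renaming above could be applied to the downloaded files with a short bash loop. This is a minimal sketch, not the script mentioned later in this thread. It assumes bash 4+ (for associative arrays) and assumes the raw files follow the `data/reads/<accession>_<accession>_<1|2>.fastq.gz` pattern seen in the seqtk commands in this thread. `rename_reads` is a hypothetical helper name. isolate02 and isolate08 are left out because their final files are produced by seqtk rather than by a plain rename.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy raw downloads to their course names.
# Assumes files are named <dir>/<acc>_<acc>_<1|2>.fastq.gz (as in this thread).
rename_reads() {
  local dir="$1"
  # accession -> course name; isolate02 and isolate08 are omitted because
  # their final files are generated with seqtk (downsampling/trimming)
  declare -A names=(
    [ERR1485225]=isolate01
    [ERR1485229]=isolate03
    [ERR1485231]=isolate04
    [ERR1485233]=isolate05
    [ERR1485235]=isolate06
    [ERR1485237]=isolate07
    [ERR1485299]=isolate09
    [ERR1485301]=isolate10
  )
  local acc mate src
  for acc in "${!names[@]}"; do
    for mate in 1 2; do
      src="${dir}/${acc}_${acc}_${mate}.fastq.gz"
      if [[ -f "$src" ]]; then
        # copy rather than move, so the original download is kept
        cp "$src" "${dir}/${names[$acc]}_${mate}.fastq.gz"
      else
        echo "missing: $src" >&2
      fi
    done
  done
}
```

Usage: `rename_reads data/reads`. Copying instead of moving keeps the accession-named files around, which makes it easy to re-run the seqtk steps from the originals.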

tavareshugo commented 9 months ago

Because SRR26899141 had a different read length and a larger library size, we ran:

# trim to 100bp and subsample to ~1.3M read pairs to match the other libraries
seqtk trimfq -q 0 -l 100 data/reads/SRR26899141_SRR26899141_1.fastq.gz | seqtk sample -s 1 - 1304923 | gzip > data/reads/isolate08_1.fastq.gz
seqtk trimfq -q 0 -l 100 data/reads/SRR26899141_SRR26899141_2.fastq.gz | seqtk sample -s 1 - 1304923 | gzip > data/reads/isolate08_2.fastq.gz
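Because the `_1` and `_2` files are trimmed and subsampled separately, it may be worth confirming afterwards that they still form matching pairs and have the intended read count and length. Passing the same seed (`-s 1`) to `seqtk sample` for both mates is what keeps the pairs in sync. Below is a minimal bash sketch of such a check. `check_pairs` is a hypothetical helper, and it assumes standard 4-line FASTQ records with the read ID taken from the first word of each header.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<read pairs> <max read length> <mates in sync>"
# for a gzipped paired-end FASTQ library (4 lines per record assumed).
check_pairs() {
  local r1="$1" r2="$2"
  local n1 maxlen sync
  # number of records in the first mate file = lines / 4
  n1=$(( $(gzip -dc "$r1" | wc -l) / 4 ))
  # longest sequence line across both mates (line 2 of each record)
  maxlen=$(gzip -dc "$r1" "$r2" | awk 'NR % 4 == 2 && length($0) > m { m = length($0) } END { print m + 0 }')
  # compare read IDs in order: first word of each header, with any
  # trailing /1 or /2 mate suffix removed; differing counts also fail here
  if cmp -s \
      <(gzip -dc "$r1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }') \
      <(gzip -dc "$r2" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'); then
    sync=yes
  else
    sync=no
  fi
  echo "$n1 $maxlen $sync"
}
```

For example, `check_pairs data/reads/isolate08_1.fastq.gz data/reads/isolate08_2.fastq.gz` prints the pair count, the longest read and whether the mate IDs agree, which can then be compared with the 100 bp and 1304923-read targets above.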

Also, we downsampled one of the samples (ERR1485227, i.e. isolate02) to give learners something else to discuss in the QC step:

# downsample to generate a lower coverage sample
seqtk sample -s 1 data/reads/ERR1485227_ERR1485227_1.fastq.gz 337259 | gzip > data/reads/isolate02_1.fastq.gz
seqtk sample -s 1 data/reads/ERR1485227_ERR1485227_2.fastq.gz 337259 | gzip > data/reads/isolate02_2.fastq.gz
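To put "lower coverage" in rough numbers, expected sequencing depth can be approximated as (read pairs × 2 × read length) / genome size. The sketch below does this with bash integer arithmetic. The 100 bp read length and the 4,000,000 bp genome size are assumptions for illustration, not values stated in this thread (100 bp is used because isolate08 was trimmed to that length to match the other libraries). `approx_depth` is a hypothetical helper, and real depth after QC and assembly will differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: approximate depth = pairs * 2 * read_len / genome_size
# (integer division, so the result is rounded down).
approx_depth() {
  local pairs="$1" read_len="$2" genome_size="$3"
  echo $(( pairs * 2 * read_len / genome_size ))
}

# Assumed values for illustration: 100 bp reads, 4 Mb genome.
approx_depth 337259 100 4000000    # downsampled isolate02, prints 16
approx_depth 1304923 100 4000000   # a ~1.3M-pair library for comparison, prints 65
```

Whatever genome size is assumed, the ratio 337259 / 1304923 means the downsampled isolate02 ends up with roughly a quarter of the depth of a ~1.3M-pair library.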

tavareshugo commented 9 months ago

This impromptu exercise received positive feedback in our survey, so we should incorporate it into the course.

tavareshugo commented 2 months ago

This is now included in the Dropbox tar file.

For reference, I've also kept the script used to create the data.

cambiotraining / bacterial-genomics

vibrio case study #25