tavareshugo closed this 2 months ago
Renamed the samples:
- ERR1485225 --> isolate01
- ERR1485227 --> isolate02
- ERR1485229 --> isolate03
- ERR1485231 --> isolate04
- ERR1485233 --> isolate05
- ERR1485235 --> isolate06
- ERR1485237 --> isolate07
- SRR26899141 --> isolate08
- ERR1485299 --> isolate09
- ERR1485301 --> isolate10
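For anyone reproducing the data, here is a minimal sketch of the plain renames as a bash function. It is hypothetical: `rename_reads` is a made-up name, and it assumes the raw files sit in one directory and follow the `<run>_<run>_{1,2}.fastq.gz` naming seen in the seqtk commands in this thread. isolate02 and isolate08 are left out because they were produced with seqtk rather than renamed, and files are copied rather than moved so the SRA-named originals are kept.

```bash
# Hypothetical helper: copy SRA-named paired FASTQ files to their isolate names.
# Assumes files are named <run>_<run>_1.fastq.gz / <run>_<run>_2.fastq.gz in "$1".
# isolate02 and isolate08 are skipped because they were generated with seqtk.
rename_reads() {
  local dir="$1" run new mate
  while read -r run new; do
    for mate in 1 2; do
      # cp (not mv) so the original SRA-named files stay in place
      cp "${dir}/${run}_${run}_${mate}.fastq.gz" "${dir}/${new}_${mate}.fastq.gz" || return 1
    done
  done <<'EOF'
ERR1485225 isolate01
ERR1485229 isolate03
ERR1485231 isolate04
ERR1485233 isolate05
ERR1485235 isolate06
ERR1485237 isolate07
ERR1485299 isolate09
ERR1485301 isolate10
EOF
}
```

Usage: `rename_reads data/reads`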
Because SRR26899141 had a different read length and a larger library size, we trimmed and subsampled it:
```bash
# trim to 100bp and ~1.3M reads to match other libraries
seqtk trimfq -q 0 -l 100 data/reads/SRR26899141_SRR26899141_1.fastq.gz | seqtk sample -s 1 - 1304923 | gzip > data/reads/isolate08_1.fastq.gz
seqtk trimfq -q 0 -l 100 data/reads/SRR26899141_SRR26899141_2.fastq.gz | seqtk sample -s 1 - 1304923 | gzip > data/reads/isolate08_2.fastq.gz
```
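A quick sanity check on the output, as a sketch using only gzip and awk (`fastq_stats` is a hypothetical helper, not part of seqtk). It prints the number of reads and the longest read length in a gzipped FASTQ, assuming standard 4-line records, which makes it easy to confirm that isolate08 ended up with the intended 1304923 reads per mate and reads trimmed to 100 bp.

```bash
# Hypothetical helper: print "<number of reads> <longest read length>"
# for a gzipped FASTQ file with 4-line records.
fastq_stats() {
  gzip -dc "$1" | awk '
    BEGIN { n = 0; max = 0 }
    NR % 4 == 2 {                      # sequence line of each record
      n++
      if (length($0) > max) max = length($0)
    }
    END { print n, max }'
}
```

Usage: `fastq_stats data/reads/isolate08_1.fastq.gz`, then repeat for `_2` to check both mates agree.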
Also, we downsampled one of the samples to give something else to discuss in the QC step:
```bash
# downsample to generate a lower coverage sample
seqtk sample -s 1 data/reads/ERR1485227_ERR1485227_1.fastq.gz 337259 | gzip > data/reads/isolate02_1.fastq.gz
seqtk sample -s 1 data/reads/ERR1485227_ERR1485227_2.fastq.gz 337259 | gzip > data/reads/isolate02_2.fastq.gz
```
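Running `seqtk sample` separately on each mate with the same seed (`-s 1`) and the same read count is what keeps the pairs matched, since both files list reads in the same order. A minimal sketch to confirm this, using hypothetical `read_ids`/`check_pairs` helpers (bash-only because of process substitution): it compares read IDs between the two mate files, taking the ID as the header up to the first whitespace with any trailing `/1` or `/2` removed. That header handling is an assumption about the naming style.

```bash
# Hypothetical helper: print one read ID per record from a gzipped FASTQ.
# ID = header line cut at the first space/tab, with a trailing /1 or /2 stripped.
read_ids() {
  gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/[ \t].*/, ""); sub(/\/[12]$/, ""); print }'
}

# Hypothetical helper: succeed only if both mate files list the same IDs in the same order.
check_pairs() {
  if cmp -s <(read_ids "$1") <(read_ids "$2"); then
    echo "in sync"
  else
    echo "out of sync"
    return 1
  fi
}
```

Usage: `check_pairs data/reads/isolate02_1.fastq.gz data/reads/isolate02_2.fastq.gz`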
This impromptu exercise received positive feedback in our survey, so we should incorporate it into the course.
This is now included in the Dropbox tar file.
For reference, I've also kept the script used to create the data.
Some SRA numbers from this publication:
The last two are water samples. Here is the run selector link, if useful.
Also, here is a V. parahaemolyticus sample:
We will also downsample one of the above samples to lower its coverage, so that the quality of its assembly is affected.
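When picking how many reads to keep for a downsampled sample, a common back-of-the-envelope estimate is coverage ≈ reads per mate × 2 × read length / genome size. A sketch of the inverse calculation, using a hypothetical `reads_for_coverage` helper: it assumes every read has the same length and counts both mates; the genome size is a parameter you must supply, and the value in the usage line is only a placeholder.

```bash
# Hypothetical helper: reads per mate file needed for a target coverage.
#   reads_per_mate = coverage * genome_size / (2 * read_length), rounded to nearest
# Assumes fixed-length paired-end reads.
reads_for_coverage() {
  local coverage="$1" genome_size="$2" read_length="$3"
  awk -v c="$coverage" -v g="$genome_size" -v l="$read_length" \
    'BEGIN { printf "%d\n", (c * g) / (2 * l) + 0.5 }'
}
```

Usage: `reads_for_coverage 10 5000000 100` prints `250000`; the result can be passed as the read count to `seqtk sample`.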