Closed by kmhandley 3 months ago
PS a subset of a pair of Waiwera fastq files from the brackish zone could work well.
The Waiwera sequences did not contain any appreciable level of human reads (yay!). Trying a different approach: simulating HiSeq reads from a human genome and adding them to the existing mock metagenomes as a separate read set.
The simulated "contamination" worked. The new example consists of one paired-end library named human_microb_reads.R{1,2}.fastq.gz, with about 9k reads per file from a human genome.
I ran this, but when I compared the reads before and after filtering, the files had the same line count, i.e., no reads were filtered out.
Did we have an animal to filter out in the first place? If not, the exercise doesn't make a lot of sense.
I strongly recommend running the script on only a single test fastq pair that actually contains host/eukaryotic reads, and adding a wc -l check line to confirm that reads are removed.
for i in {1..4}; do wc -l sample${i}_R1.fastq; wc -l sample${i}_R1_hostFilt.fastq; done
2199968 sample1_R1.fastq
2199968 sample1_R1_hostFilt.fastq
2199964 sample2_R1.fastq
2199964 sample2_R1_hostFilt.fastq
2199968 sample3_R1.fastq
2199968 sample3_R1_hostFilt.fastq
2199988 sample4_R1.fastq
2199988 sample4_R1_hostFilt.fastq
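A possible form for the suggested check line, as a minimal sketch only: it converts line counts to read counts (assuming standard 4-line FASTQ records) and reports how many reads were removed per file. The helper names count_reads and report_filtered are hypothetical, and the file names simply follow the sampleN_R1.fastq / sampleN_R1_hostFilt.fastq pattern used above.

```shell
#!/usr/bin/env bash

# Count reads in a FASTQ file (plain or gzipped).
# Assumes well-formed records of exactly 4 lines each.
count_reads() {
    local f="$1" lines
    if [[ "$f" == *.gz ]]; then
        lines=$(gzip -cd -- "$f" | wc -l)
    else
        lines=$(wc -l < "$f")
    fi
    echo $(( lines / 4 ))
}

# Print read counts before and after filtering, plus the difference.
# $1 = unfiltered file, $2 = host-filtered file
report_filtered() {
    local before after
    before=$(count_reads "$1")
    after=$(count_reads "$2")
    echo "$1: ${before} reads before, ${after} after, $(( before - after )) removed"
}

# Loop over the four samples named as in this thread; missing files are skipped.
for i in {1..4}; do
    if [[ -f "sample${i}_R1.fastq" && -f "sample${i}_R1_hostFilt.fastq" ]]; then
        report_filtered "sample${i}_R1.fastq" "sample${i}_R1_hostFilt.fastq"
    fi
done
```

With the counts reported above (for example 2199968 lines, i.e. 549992 reads, in both sample1 files), every sample would show 0 removed, which makes it obvious at a glance that the filter did nothing on this data set.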