claraqin / neonMicrobe

Processing NEON soil microbe marker gene sequence data into ASV tables.
GNU Lesser General Public License v3.0

Make remove_unmatched_files() robust to differences in file naming convention #23

Closed claraqin closed 3 years ago

claraqin commented 4 years ago

The remove_unmatched_files() function discards from the 16S processing pipeline any R1 fastq files that do not appear to have a matching R2 fastq file, and vice versa. It currently performs this matching using the filenames only. This causes problems when the file naming conventions differ slightly between the R1 and R2 fastq files of the same sequencing run. In run BFDG8, for example, the R2 filenames carry an extra _BFDG8 token before the read suffix, so no R1/R2 pair matches and every file is discarded:

> head(fnFs)
[1] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA12_16S_R1.fastq"
[2] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA3_16S_R1.fastq" 
[3] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA4_16S_R1.fastq" 
[4] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA5_16S_R1.fastq" 
[5] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA9_16S_R1.fastq" 
[6] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellB1_16S_R1.fastq" 
> length(fnFs)
[1] 142
> head(fnRs)
[1] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA10_16S_BFDG8_R2.fastq"
[2] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA11_16S_BFDG8_R2.fastq"
[3] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA12_16S_BFDG8_R2.fastq"
[4] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA2_16S_BFDG8_R2.fastq" 
[5] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA3_16S_BFDG8_R2.fastq" 
[6] "/hb/home/claraqin/projects/NEON_soil_microbe_processing/NEON/raw_sequence/16S/0_raw/runBFDG8_BMI_Plate13WellA4_16S_BFDG8_R2.fastq" 
> length(fnRs)
[1] 188
> matched_fn <- remove_unmatched_files(fnFs, fnRs)
> matched_fn
$R1
character(0)

$R2
character(0)
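
One possible way to make the filename comparison tolerant of this particular mismatch is to reduce each path to a normalized key before comparing. The sketch below is only an illustration under stated assumptions, not the package's actual implementation: the helper names `sample_key()` and `match_by_key()` are hypothetical, and it assumes that filenames begin with `run<ID>_`, end with `_R1.fastq` or `_R2.fastq` (optionally gzipped), and that the only inconsistency is a repeated `_<ID>` token just before the read suffix.

```r
# Hypothetical helper (not part of neonMicrobe): reduce a fastq path to a
# key that should be identical for the R1 and R2 files of one sample.
sample_key <- function(fn) {
  key <- basename(fn)
  # Drop the read-direction suffix and extension, e.g. "_R1.fastq".
  key <- sub("_R[12]\\.fastq(\\.gz)?$", "", key)
  # Assumption: the name starts with "run<ID>_". If the same <ID> is
  # repeated as a trailing "_<ID>" token, drop that trailing copy.
  key <- sub("^run([A-Za-z0-9]+)_(.*)_\\1$", "run\\1_\\2", key, perl = TRUE)
  key
}

# Hypothetical replacement for the filename-only comparison: keep only
# files whose keys occur in both vectors, returned in paired order.
match_by_key <- function(fnFs, fnRs) {
  keyF <- sample_key(fnFs)
  keyR <- sample_key(fnRs)
  common <- intersect(keyF, keyR)
  list(R1 = fnFs[match(common, keyF)],
       R2 = fnRs[match(common, keyR)])
}

# Usage with a few of the BFDG8 filenames shown above (directories omitted).
fnFs_demo <- c("runBFDG8_BMI_Plate13WellA12_16S_R1.fastq",
               "runBFDG8_BMI_Plate13WellA3_16S_R1.fastq",
               "runBFDG8_BMI_Plate13WellA9_16S_R1.fastq")
fnRs_demo <- c("runBFDG8_BMI_Plate13WellA10_16S_BFDG8_R2.fastq",
               "runBFDG8_BMI_Plate13WellA12_16S_BFDG8_R2.fastq",
               "runBFDG8_BMI_Plate13WellA3_16S_BFDG8_R2.fastq")
matched_demo <- match_by_key(fnFs_demo, fnRs_demo)
```

With these inputs, the WellA12 and WellA3 files are kept as pairs, while WellA9 (R1 only) and WellA10 (R2 only) are dropped. A key built from filename patterns still breaks whenever a new naming variant appears, so matching against the sequence metadata is likely the more robust long-term fix.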
claraqin commented 3 years ago

This is now being handled by the meta_reliant_utils branch.