faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Combined/merged sequence runs - best point to merge? #305

Open eeeaston opened 1 year ago

eeeaston commented 1 year ago

I have multiple sequence runs for the same sample, each from a different library prep. I usually cat them before trimming, but when I ran the script with the illumiprocessor.conf file, it errored, so it seems that more than two tags and duplicate tag maps are not supported. At what step in the workflow would you suggest merging the outputs?

brantfaircloth commented 1 year ago

I would trim them separately, then cat together the corresponding resulting files before assembly (be sure to create a directory structure and filenames similar to what you would normally get from illumiprocessor).
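The cat step described above could be sketched in Python roughly as follows. This is only an illustration, not part of phyluce or illumiprocessor: the function name `merge_reads`, its parameters, and the assumption that each trimmed directory holds exactly one file per read type are all hypothetical. It relies on the fact that gzip files concatenated byte for byte form a valid multi-member gzip file, which is what `cat a.gz b.gz > c.gz` produces. The runs are appended in the same order for READ1 and READ2, so paired reads stay in sync.

```python
import shutil
from pathlib import Path

# Read types as they appear in illumiprocessor-style output file names.
SUFFIXES = ("READ1", "READ2", "READ-singleton")


def merge_reads(trimmed_dirs, dest_dir, new_name):
    """Concatenate per-run trimmed reads into one file per read type.

    trimmed_dirs: directories (hypothetical layout) that each contain one
                  *-READ1.fastq.gz, *-READ2.fastq.gz and
                  *-READ-singleton.fastq.gz file.
    dest_dir:     directory that receives the merged files.
    new_name:     sample name used as the prefix of the merged files.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for suffix in SUFFIXES:
        sources = []
        # Same directory order for every read type keeps pairs aligned.
        for directory in trimmed_dirs:
            matches = sorted(Path(directory).glob(f"*-{suffix}.fastq.gz"))
            if len(matches) != 1:
                raise FileNotFoundError(
                    f"expected one *-{suffix}.fastq.gz in {directory}, "
                    f"found {len(matches)}"
                )
            sources.append(matches[0])
        dest = dest_dir / f"{new_name}-{suffix}.fastq.gz"
        # Raw byte append, equivalent to `cat run1.gz run2.gz > dest`.
        with open(dest, "wb") as out:
            for src in sources:
                with open(src, "rb") as fh:
                    shutil.copyfileobj(fh, out)
        merged[suffix] = dest
    return merged
```

Copying raw bytes avoids decompressing and recompressing the reads. The destination directory should be separate from the per-run input directories.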

eeeaston commented 1 year ago

Thanks. Could you confirm which folders and files the package accesses in downstream steps, so that I know which ones need to be retained? Basically, could I just cat the files and place the concatenated results in NewParentDirectory/clean-fastq/sampleDIR/split-adapter-quality-trimmed, with file names of NEWNAME-READ1-fastq.gz, NEWNAME-READ2-fastq.gz, and NEWNAME-READ-singleton.gz, and leave out the stats and raw-reads directories and the adapters.fasta file?

brantfaircloth commented 1 year ago

Yep, that's pretty much it! You want the directory to look like:

uce-tutorial
└──  clean-fastq
   └──  alligator_mississippiensis
      └──  split-adapter-quality-trimmed
         ├── alligator_mississippiensis-READ1.fastq.gz
         ├── alligator_mississippiensis-READ2.fastq.gz
         └── alligator_mississippiensis-READ-singleton.fastq.gz
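As a quick sanity check before assembly, the layout above can be verified with a few lines of Python. This is a minimal sketch, not a phyluce utility: the helper names `expected_files` and `missing_files` are hypothetical, and the only thing taken from the thread is the directory and file naming pattern shown in the tree.

```python
from pathlib import Path

# Read types that appear in the file names of the tree above.
READ_SUFFIXES = ("READ1", "READ2", "READ-singleton")


def expected_files(root, sample):
    """Return the three read files expected for `sample` under `root`."""
    trimmed = (
        Path(root) / "clean-fastq" / sample / "split-adapter-quality-trimmed"
    )
    return [trimmed / f"{sample}-{s}.fastq.gz" for s in READ_SUFFIXES]


def missing_files(root, sample):
    """Return the expected read files that do not exist on disk."""
    return [p for p in expected_files(root, sample) if not p.is_file()]


if __name__ == "__main__":
    # Example call using the names from the tree above.
    for path in missing_files("uce-tutorial", "alligator_mississippiensis"):
        print(f"missing: {path}")
```

Note that the sample directory name and the file name prefix are the same string in the tree, which is why a single `sample` argument is used for both.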