WFSRDataScience / PIMENTA

PIMENTA for rapid identification of species using MinION sequencing
https://pimenta.readthedocs.io

How to only run `RunModules='Clustering,Consensus'`? #2

Open ocstringham opened 6 months ago

ocstringham commented 6 months ago

I'm trying to run PIMENTA from Clustering to Consensus (i.e. `RunModules='Clustering,Consensus'`), but I can't figure out where in settingsfile.txt to point to my fastq files, which I have already quality-filtered and length-filtered. Where can I specify this? Thank you!

MycoMap commented 5 months ago

I am interested in the same, if possible.

valerievandervorst commented 5 months ago

Hi,

For now, PIMENTA only recognizes file paths of fastq files generated by Guppy. As a workaround, you can change line 58 in `Run_DNA-metabarcoding-MinION_without_settings.sh` so that it points to your folder with fastq files. That line currently reads: `Guppy_demultiplexed="${FAST5Folder}/Guppy_demultiplexed"`

The fastq files inside that folder have to be split into one folder per sample, each with a folder name starting with “barcode”. You can specify the sample names by including a sample description file.
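In case it helps, here is a minimal sketch of how pre-filtered fastq files could be sorted into per-sample folders whose names start with "barcode". The function name `arrange_into_barcode_dirs`, the two-digit numbering, and the assumption of one `.fastq` file per sample are all hypothetical; only the "barcode" prefix comes from the description above. The tab-separated mapping it prints is just for your own reference and is not claimed to be the format of the sample description file.

```shell
# Hypothetical helper: copy each FASTQ file in $1 into its own
# "barcodeNN" folder under $2 (assumes one .fastq file per sample).
arrange_into_barcode_dirs() {
    src_dir="$1"
    dest_dir="$2"
    n=0
    mkdir -p "$dest_dir"
    for fq in "$src_dir"/*.fastq; do
        # Skip the literal pattern when no .fastq files are present.
        [ -e "$fq" ] || continue
        n=$((n + 1))
        barcode=$(printf 'barcode%02d' "$n")
        mkdir -p "$dest_dir/$barcode"
        cp "$fq" "$dest_dir/$barcode/"
        # Print "folder<TAB>original file name" so the mapping is not lost.
        printf '%s\t%s\n' "$barcode" "$(basename "$fq" .fastq)"
    done
}
```

It could be called as `arrange_into_barcode_dirs ./filtered_fastq ./Guppy_demultiplexed`, where both paths are placeholders for your own folders.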

Then you will have to remove lines 97 up to and including 102 (the Prinseq commands) in `scripts/Pipeline_HPC.sh` and change line 104 from: `cat $MIDFolderName/$MID.$SampleName.QC.fastq | awk 'NR%4==1{printf ">%s\n", substr($0,2)}NR%4==2{print}' > $MIDFolderName/$MID.$SampleName.QC.fasta`

To: `cat $MIDFolderName/$MID.$SampleName.adapter_trim.fastq | awk 'NR%4==1{printf ">%s\n", substr($0,2)}NR%4==2{print}' > $MIDFolderName/$MID.$SampleName.QC.fasta`
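To make clear what that awk command does, here is a small standalone sketch that wraps the same awk program in a function. The name `fastq_to_fasta` is hypothetical and not part of PIMENTA. It assumes plain four-line FASTQ records (header, sequence, `+`, quality), with no line-wrapped sequences.

```shell
# Hypothetical wrapper around the awk program used in Pipeline_HPC.sh.
# Line 1 of every 4-line record: drop the leading "@" and print it as a ">" header.
# Line 2 of every record: print the sequence unchanged.
# Lines 3 and 4 ("+" and the quality string) are discarded.
fastq_to_fasta() {
    awk 'NR%4==1{printf ">%s\n", substr($0,2)} NR%4==2{print}' "$1"
}
```

For example, `fastq_to_fasta reads.fastq > reads.fasta` would write one header line and one sequence line per read, where `reads.fastq` is a placeholder file name.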

When running the pipeline, include `QCtrimming` in `RunModules`. With the Prinseq lines removed, it won’t run the QC/length filtering, but it will put the fastq files in the places where PIMENTA can recognize them and convert fastq to fasta.
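Combined with the modules from the original question, the setting might then look like the line below. This is only a sketch: it reuses the `RunModules='...'` syntax from the issue title and assumes that the modules are listed in the order they run.

```shell
# Assumed settings entry: QCtrimming added in front of the two modules
# from the original question.
RunModules='QCtrimming,Clustering,Consensus'
```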

Let me know if this works for you and if you have more questions.