Closed nbat64 closed 1 year ago
Hello, there is a real data example here: http://htmlpreview.github.io/?https://github.com/AntonelliLab/seqcap_processor/blob/master/docs/documentation/tutorial.html that shows how the data should be set up. I would bet that you have to unzip your fastq files first, and don't forget to name them xx_R1.fastq and xx_R2.fastq for paired-end data.
Regards,
Mathias
Hello @mlaize, yes, but the tutorial is for the previous version of the pipeline, when cleaning was done with Trimmomatic, not fastp, so the sample_annotation_file is different. I tried it like this:
name-something,name-something_R1.fastq.gz
name-something,name-something_R2.fastq.gz
The clean_reads.py script starts with this message:
Genus-species-NB01-107: Counting all reads (forward + reverse) belonging to this sample...
4968081
##################################################
Processing Genus-species-NB01-107...
But it gets stuck at this step: fastp does not seem to produce any output, and there is no error message. I have also run fastp outside of clean_reads, but that then seems to cause a problem for assemble_reads with SPAdes, which I think looks for the stats produced by clean_reads.py.
Thanks, regards
@nbat64, which version of SECAPR are you running (what does secapr -v give you as output)? You are correct that with the latest version, which implements fastp for cleaning and trimming, it is no longer necessary to unzip the fastq files. I haven't had time in several months to update the pipeline, so it is possible that there are some bugs. The last time I ran it, my adapter.txt file looked like this:
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
[names]
T_pyra1:_
T_pyra3:_
T_pella5:_
T_pella9:_
[barcodes]
i7-T_pyra1:ATTGAGGA
i7-T_pyra3:ATGCCTAA
i7-T_pella5:GTCTGTCA
i7-T_pella9:ACATTGGC
My fastq-samples were in one folder and were named like this:
T_pella5_R1.fastq
T_pella5_R2.fastq
T_pella9_R1.fastq
T_pella9_R2.fastq
T_pyra1_R1.fastq
T_pyra1_R2.fastq
T_pyra3_R1.fastq
T_pyra3_R2.fastq
I ran this command to clean:
secapr clean_reads --input pipeline_exercise/fastq_raw/ --config pipeline_exercise/adapter_info.txt --output pipeline_exercise/cleaned_trimmed_reads --index single
Let me know if that helps. I'll be more responsive now that I have some more time to work on SECAPR.
I realize now that those were unzipped fastq files, but in theory it should also work for zipped ones (let me know if it doesn't).
Ignore the things I wrote above; that was for the old version. The latest development version takes as input, under the --sample_annotation_file flag in secapr clean_reads, a text file that looks like this:
T_pella5,RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986
T_pella9,RAPiD-Genomics_F226_GOT_130407_P001_WA02_i5-539_i7-27_S1987
T_pyra1,RAPiD-Genomics_F226_GOT_130407_P001_WA03_i5-539_i7-82_S1988
T_pyra3,RAPiD-Genomics_F226_GOT_130407_P001_WA04_i5-539_i7-7_S1989
The term before the comma is the name you want to assign to the given sample for all downstream operations. The term after the comma should be a string in the filename of the raw fastq file (zipped or unzipped) that uniquely identifies the respective sample.
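The rule described above can be checked before starting a run. The following is only a minimal sketch, not part of SECAPR: the function names are hypothetical, and it assumes that an identifier "matches" a file when it occurs as a substring of the filename, and that paired-end data should give exactly two matching files per sample. How secapr clean_reads matches files internally may differ.

```python
def parse_annotation(text):
    """Parse 'sample_name,identifier' lines into (name, identifier) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "," not in line:
            raise ValueError(f"line has no comma: {line!r}")
        name, identifier = line.split(",", 1)
        pairs.append((name.strip(), identifier.strip()))
    return pairs


def check_annotation(pairs, filenames, expected=2):
    """Return {sample name: problem}; an empty dict means no problem was found.

    Assumption (not taken from SECAPR): an identifier matches a file if it
    is a substring of the filename, and each sample has `expected` files.
    """
    problems = {}
    seen = set()
    for name, identifier in pairs:
        if name in seen:
            problems[name] = "sample name listed more than once"
            continue
        seen.add(name)
        hits = [f for f in filenames if identifier in f]
        if len(hits) != expected:
            problems[name] = (
                f"identifier matches {len(hits)} file(s), expected {expected}"
            )
    return problems


if __name__ == "__main__":
    annotation = (
        "T_pella5,RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986\n"
        "T_pella9,RAPiD-Genomics_F226_GOT_130407_P001_WA02_i5-539_i7-27_S1987\n"
    )
    files = [
        "RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986_L001_R1_001.fastq.gz",
        "RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986_L001_R2_001.fastq.gz",
        "RAPiD-Genomics_F226_GOT_130407_P001_WA02_i5-539_i7-27_S1987_L001_R1_001.fastq.gz",
        "RAPiD-Genomics_F226_GOT_130407_P001_WA02_i5-539_i7-27_S1987_L001_R2_001.fastq.gz",
    ]
    print(check_annotation(parse_annotation(annotation), files))  # prints {}
```

Under these assumptions, an annotation file with one line per read file (the same sample name twice, each line pointing at a single R1 or R2 file) would be reported, since each identifier matches only one file and the name is repeated.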
For the --input flag I provided a folder with the zipped fastq files, with the filenames looking like this:
RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986_L001_R1_001.fastq.gz
RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986_L001_R2_001.fastq.gz
RAPiD-Genomics_F226_GOT_130407_P001_WA02_i5-539_i7-27_S1987_L001_R1_001.fastq.gz
RAPiD-Genomics_F226_GOT_130407_P001_WA02_i5-539_i7-27_S1987_L001_R2_001.fastq.gz
RAPiD-Genomics_F226_GOT_130407_P001_WA03_i5-539_i7-82_S1988_L001_R1_001.fastq.gz
RAPiD-Genomics_F226_GOT_130407_P001_WA03_i5-539_i7-82_S1988_L001_R2_001.fastq.gz
RAPiD-Genomics_F226_GOT_130407_P001_WA04_i5-539_i7-7_S1989_L001_R1_001.fastq.gz
RAPiD-Genomics_F226_GOT_130407_P001_WA04_i5-539_i7-7_S1989_L001_R2_001.fastq.gz
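For filenames like the ones above, the identifier column of the sample annotation file can be derived rather than typed by hand. This is a rough sketch only: the helper name and the regular expression are assumptions that fit the `_L001_R1_001.fastq.gz` style suffix shown here, and other naming schemes would need a different pattern. The sample names still have to be chosen by you.

```python
import re

# Assumed suffix: lane, read direction, chunk number, then .fastq or .fastq.gz
SUFFIX = re.compile(r"_L\d{3}_R[12]_\d{3}\.fastq(?:\.gz)?$")


def derive_identifiers(filenames):
    """Return the sorted unique filename prefixes left after stripping the suffix."""
    identifiers = set()
    for filename in filenames:
        stripped, n_subs = SUFFIX.subn("", filename)
        if n_subs:  # ignore files that do not follow the assumed naming scheme
            identifiers.add(stripped)
    return sorted(identifiers)


if __name__ == "__main__":
    files = [
        "RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986_L001_R1_001.fastq.gz",
        "RAPiD-Genomics_F226_GOT_130407_P001_WA01_i5-539_i7-59_S1986_L001_R2_001.fastq.gz",
    ]
    for identifier in derive_identifiers(files):
        # SAMPLE_NAME is a placeholder to be replaced with the name you want
        print(f"SAMPLE_NAME,{identifier}")
```

Each printed line is a template for one row of the annotation file; the R1 and R2 files of a sample collapse into a single identifier because only the shared prefix is kept.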
Let me know in case that doesn't work for you or in case you have any other questions.
Hello,
I am testing the secapr pipeline. However, I have an issue right at the beginning, with the read cleaning with fastp. My job ends without producing any output, and the only error message concerns the argument for the --read_min flag. I have installed the pipeline in a mamba env, and my inputs are raw reads in fastq.gz format.
secapr clean_reads --input $folder/raw/fastq/ --output $folder/cleaned/ --sample_annotation_file test.txt
Do you have an example for the sample_annotation_file?
Thank you in advance for your help.
Regards