DMU-lilab / pTrimmer

Used to trim off the primer sequence from multiplex amplicon sequencing
GNU General Public License v3.0

Segmentation Fault #9

Closed by donutbrew 4 years ago

donutbrew commented 4 years ago

Hi - after building pTrimmer, I can't seem to get it to work--it drops with a Segmentation Fault right away.

Here's my command, using fastq files generated by a MiSeq:

$ pTrimmer -l -t pair -a /data/primers/200220.txt -f Sample1_S1_L001_R1_001.fastq.gz -d S1.trim_R1.fastq -r Sample1_S1_L001_R2_001.fastq.gz -e S1.trim_R2.fastq -q 25
[*] Processing the [thread: 1] ...
[*] Processing the [thread: 2] ...
Segmentation fault (core dumped)

I've also tried with non-gzipped fastqs, but I get the same result. I'm sorry I can't provide too many details, but this is all I have. Let me know how I can help.

XLZH commented 4 years ago

Hi @donutbrew

I am sorry for the trouble caused by pTrimmer!

Could you please provide some fastq reads (and the corresponding primer file) to reproduce the problem? Thanks ~

Best wishes,
Xiaolong

donutbrew commented 4 years ago

Primer file: https://gist.github.com/donutbrew/41d1acce6876cf6cf15382f959d7f1ad

Truncated reads (I get the same result with the full-length files): https://gist.github.com/donutbrew/311f25df80fcf059146cf3b02610b591

I built it with gcc/9.2.0 if that turns out to be important...

XLZH commented 4 years ago

I found that the reads in your fastq file are 301 bp long, which is what causes the core dump!

The easiest fix is to change the FQLINE value in fastq.h from 256 to 512 and recompile the code.
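
To choose a safe value before recompiling, it can help to measure the longest read in the input first. The following is a minimal Python sketch, not part of pTrimmer. It assumes that FQLINE is the size of the buffer holding one FASTQ line (the thread implies this but does not spell it out) and that two extra bytes are needed for the newline and the terminating NUL. The helper names `max_read_length`, `open_fastq` and `fits_in_buffer` are hypothetical.

```python
import gzip
import io


def max_read_length(handle):
    """Return the longest sequence line in a 4-line-per-record FASTQ stream."""
    longest = 0
    for index, line in enumerate(handle):
        # In a 4-line FASTQ record the sequence is the second line (index 1).
        if index % 4 == 1:
            longest = max(longest, len(line.rstrip("\r\n")))
    return longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fits_in_buffer(read_length, buffer_size):
    # Assumption: the buffer must hold the read, a newline and a NUL byte.
    return read_length + 2 <= buffer_size


# Demo with one synthetic 301-bp record, the length reported in this issue.
demo = io.StringIO("@r1\n" + "A" * 301 + "\n+\n" + "I" * 301 + "\n")
longest = max_read_length(demo)
print(longest, fits_in_buffer(longest, 256), fits_in_buffer(longest, 512))
# prints: 301 False True
```

On real data, `max_read_length(open_fastq("Sample1_S1_L001_R1_001.fastq.gz"))` would report the longest read in the R1 file from the original command.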

However, after testing with your fastq and primer files, pTrimmer gives a very low read-trimming ratio (6.5%):

$ ./pTrimmer-1.3.3 -t pair -a CDC_SC2_200710.txt -f sample1_R1.fastq -d Trim_R1.fq -r sample1_R2.fastq -e Trim_R2.fq -q 25

[*] Processing the [thread: 1] ...
[*] Processing the [thread: 2] ...
Total time consume: 0.0(s)

----------------- Summary ------------------------
Total reads processed: 200
Reads have bad primer: 180
Reads have bad quality: 7
Reads successfully trimed and have good quality: 6.50 %
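
The 6.50 % figure can be reproduced from the counts in the summary. This short sketch assumes that the "bad primer" and "bad quality" categories do not overlap, which the output does not state explicitly. The variable names are illustrative only.

```python
total_reads = 200
bad_primer = 180
bad_quality = 7

# Assumption: a read is counted in at most one of the two failure categories.
good_reads = total_reads - bad_primer - bad_quality
ratio = 100.0 * good_reads / total_reads

print(f"{good_reads} reads, {ratio:.2f} %")
# prints: 13 reads, 6.50 %
```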

Then I checked your fastq reads against the corresponding primers and found that most of your reads do not start with a primer sequence, as in the following examples:

@M04500:102:000000000-J6F5V:1:1101:19169:2004 1:N:0:1
    TACTACCACACAACTGATCCTAGTTTTCTGGGTAGGTACATGTCAGCATTAAATCACACTAAAAAGTGGAAATACCCA ... (your read)
TGGCTACTACCGAAGAGCTACC (primer sequence)

@M04500:102:000000000-J6F5V:1:1101:10091:1954 1:N:0:1
        GTAGTGGAAAATCCTACCATACAGAAAGACGTTCTTGAGTGTAACTGTCTCTTATACACATCTCCGAGCCCACGAG ... (your read)
CTGAAGAAGTAGTGGAAAATCCTACCA (primer sequence)

As we know, reads from target/amplicon sequencing always start at the first base of the primer sequence, as in the following examples:

@M03970:332:000000000-J2CK5:1:1101:16151:2887 1:N:0:15
GTCCAGCTTTGTGCCAGGAGCCTCGCAGGGGTTGATGGGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTTGAGCTTCTCAAAAGTCTAGA ... (read)
GTCCAGCTTTGTGCCAGGAG (primer sequence)

@M03970:332:000000000-J2CK5:1:1101:21721:3033 2:N:0:15
AGCCCGAACGCAAAGTGTCCCCGGAGCCCAGCAGCTACCTGCTCCCTGGACGGTGGCTCTAGACTTTTGAGAAGCTCAAAACTTTTAGCGCCAGTCTTGAGCACATG ... (read)
AGCCCGAACGCAAAGTGT (primer sequence)

Therefore, I think there are two possible explanations for your reads:

(1) your sequencing strategy is not target/amplicon sequencing
(2) part of the primer sequence was mis-trimmed at the beginning of your reads

I suggest you check your reads against the corresponding primers to find out why the reads start differently from the primer sequences.
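
One way to run that check is to ask, for each read, how many leading primer bases are missing. The sketch below is a simplified illustration in Python, not pTrimmer's own matching logic: it uses exact string matching with no mismatch tolerance, and the function name `primer_offset` and the `min_overlap` parameter are hypothetical. The sequences are the read prefixes and primers quoted above.

```python
def primer_offset(read, primer, min_overlap=10):
    """Return k such that the read starts with primer[k:], or None.

    k == 0 means the read starts at the first base of the primer.
    k > 0 means the first k primer bases are missing from the read.
    At least min_overlap primer bases must still match exactly.
    """
    for k in range(0, len(primer) - min_overlap + 1):
        if read.startswith(primer[k:]):
            return k
    return None


# A read that begins partway into its primer (second example above).
fragmented_read = "GTAGTGGAAAATCCTACCATACAGAAAGACGTTCTTGAGTG"
fragmented_primer = "CTGAAGAAGTAGTGGAAAATCCTACCA"

# A read that begins at the first primer base (third example above).
amplicon_read = "GTCCAGCTTTGTGCCAGGAGCCTCGCAGGGGTTGATGGG"
amplicon_primer = "GTCCAGCTTTGTGCCAGGAG"

print(primer_offset(fragmented_read, fragmented_primer))  # prints: 8
print(primer_offset(amplicon_read, amplicon_primer))      # prints: 0
```

Tallying these offsets over all reads shows whether most of them start at base 0, as expected for intact amplicons, or are scattered, as expected when fragments start at arbitrary positions.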

XLZH commented 4 years ago

@donutbrew If you still have questions about the use of pTrimmer, please feel free to contact me.

donutbrew commented 4 years ago

@XLZH Thanks for the solution. I appreciate the good comments in the source--too bad I didn't read them!

And yeah, I passed you reads from amplicons that had been fragmented prior to library prep, so what you saw makes sense.