Erlor opened 4 years ago
Can you paste the output of `od -c < your.fastq | head -n 100`
(or `zcat your.fastq.gz | od -c | head -n 100`
in case your FASTQ file is compressed)?
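If `od` isn't handy, the same question (does each line end in `\r\n` or only in `\n`?) can be answered with a short Python sketch. This is not part of fastp; the function names `count_line_endings`, `line_ending_report` and `open_maybe_gzip` are hypothetical, and only the first `max_lines` lines are inspected, similar in spirit to piping through `head`.

```python
import gzip


def open_maybe_gzip(path):
    """Open a FASTQ file in binary mode, decompressing it if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def count_line_endings(lines, max_lines=100):
    """Count CRLF- vs LF-terminated lines among the first max_lines lines.

    A final line with no newline at all is counted in neither bucket.
    """
    crlf = lf = 0
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        if line.endswith(b"\r\n"):
            crlf += 1
        elif line.endswith(b"\n"):
            lf += 1
    return {"crlf": crlf, "lf": lf}


def line_ending_report(path, max_lines=100):
    """Report line-ending counts for a plain or gzipped FASTQ file."""
    with open_maybe_gzip(path) as fh:
        # Iterating a binary file yields lines split on b"\n", newline included.
        return count_line_endings(fh, max_lines)
```

For example, `line_ending_report("your.fastq.gz")` on a file that went through unix2dos should report (nearly) every line as CRLF.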
The output was quite lengthy. I also checked whether the problem can be reproduced another way, and it can: run any FASTQ file through unix2dos (which basically adds a \r to every line ending).
0000000 @ S R R 8 5 6 1 4 1 3 . 1 1 /
0000020 1 \r \n A T G G C G G C G G C G G
0000040 G C C T G G C G G A A C T G C T
0000060 G G G C G G A A G C C C G A C G
0000100 C A G G T G T G C A T C G C G G
0000120 C T G A A A T C G G C A T G G A
0000140 A C A T A A C C T T G G T T A A
0000160 C C T \r \n + \r \n C C D A C B 6 ;
0000200 ; 7 ; ; ; 1 ; 6 ; ; 6 ; ; 6 ; 6
0000220 ; 2 : : 2 2 2 * 2 2 . 2 . 2 ; C
0000240 9 ; ; ; ; 0 0 / + . . - - 4 4 B
0000260 C C C C B ? > > C D > C C C A C
0000300 D F F > > ; : : : 8 2 8 2 8 2 1
0000320 ) 1 ) 1 3 3 7 . \r \n @ S R R 8 5
0000340 6 1 4 1 3 . 2 2 / 1 \r \n G A C
0000360 T G A A G C A G G G C A G C T C
0000400 T A C T T T G A G C G G T G C A
0000420 G G C G T A T T G T G G A T G A
0000440 A G C G A G G C T G G C A C A T
0000460 G A A C A G T T A C C G G G G T
0000500 G A T A G T G A T A G A C G A C
0000520 C T C G G G T A T G T C A A A C
0000540 G C G A C A A C G C G G A A A C
0000560 G G G G T G T G T T G T T C G A
0000600 G G C G T G A T C G C A C A T C
0000620 G T T A T G A G C G C G G C A G
0000640 T C T G G T C G A T C A C C A G
0000660 C A A T C A G T C C G T T C A G
0000700 C A C G T G G G G G T A G C A T
0000720 C T T C G T G G A T G A A A C G
0000740 G A T G G C G G T G G C A G C A
0000760 G C C G A T C G T C T G A T C C
0001000 A G T A C G G G G T A T A T G T
0001020 T C G A G C T A A A A A G G A G
0001040 A A A G T T A C C G G A A A A A
0001060 G A C T G G C A A G G C A G T A
0001100 A C C A G C G T G G \r \n + \r \n >
0001120 @ @ D C C = @ @ ; ; ; 1 5 4 ; ;
0001140 ; ; @ ; ; : 9 * : B B ; ; 7 ; ;
0001160 < / / * - . . ; ; 8 ; ; ; 6 ; @
0001200 @ A > C > ? ; A = @ : ; 7 ; ; 6
0001220 6 ; ; ; > C ; 6 ; 7 ; D 0 ; 5 ;
0001240 , 5 4 C C C D C C C C ? ; ; A @
0001260 ; ; 6 ; : : : / 9 B @ : 2 2 2 :
0001300 / 9 A : : : : : < : : : B 4 : 9
0001320 / 2 : : : * 2 : 9 : @ > : 8 5 /
0001340 / - / ' - - - - - 3 9 9 9 ? : :
0001360 : : 3 : 1 . / / - 2 7 7 < ? 9 >
0001400 ? ? ? @ @ @ ? ; : : : : : : : 4
0001420 9 ? 8 8 2 : @ > ? ? B ; ? ? = A
0001440 A @ A : 8 : : 8 8 8 8 * 8 8 8 8
0001460 7 : = C 3 3 : 1 1 2 - - - - 4 -
0001500 - - ' - 8 7 2 8 B < > ? ; ? ; ;
0001520 ; ; ; C ? C 9 9 0 0 0 > 8 < 7 0
0001540 0 * 0 / / / / / / / ( 8 7 1 1 1
0001560 1 1 ) 1 1 1 1 1 1 1 3 3 3 0 ; ;
0001600 @ @ @ D 1 : : = B B = A = A C C
0001620 C 3 : 0 0 0 0 , / 0 5 : * 0 ; :
0001640 < = 7 < 9 ? ? < 7 7 7 ) \r \n @ S
0001660 R R 8 5 6 1 4 1 3 . 3 3 / 1 \r
0001700 \n T C C C T T C A T A C T G C A
0001720 C G T A G A G C T G C C G C A G
0001740 T T C A T C G G C A T A A G C C
0001760 T G C A G G A A T T C C G G G G
0002000 T A A A G A C G T C A C G A C G
0002020 G T G C T G C A A C C A G C G C
0002040 T G C T G C T C G G C T T C G T
0002060 C C A G C G T G C C G G G G A A
0002100 A T T A C G C G C C C G A T A G
0002120 T T G A A C A G C A G T T T C T
0002140 C G A T G C G T T T A T C G G C
0002160 A A A C G T G A G G T C C A G C
0002200 G C G G G C A G A T T G C G T G
0002220 C T C G G T C T C C A G C A G G
0002240 A T T T T C A T C G C C G C G C
0002260 G G T C C G C A T C G C T G A A
0002300 G A A A C C A T T G T A G A G C
0002320 T G G G T G T C G A C G T T T T
0002340 C C G A C G G C A C A A A A G G
0002360 C T C C G C C T C G G C A A A A
0002400 G T C G C C A C G A C T T G C G
0002420 C G T A C C G T G C G G T A T C
0002440 G C G C A \r \n + \r \n 0 6 6 , 6 0
0002460 6 ; ; ; ; > D C B C C B ; ; ; :
0002500 : : : : 4 : : : : : 4 : @ = = ?
0002520 @ ? C D G A D C D @ > < A A 7 ;
0002540 7 < 7 < 7 ; ; ; / ; B B ; B B ;
0002560 > ? ? ? > C D B @ ; ; ; ; ; > D
0002600 D C F A D D C C C @ A @ > < A A
0002620 ; ; 7 ; @ = ; ; > B ? C C C C C
0002640 C E @ B B B . : : / : 5 : : : =
0002660 : : : / = B @ @ @ C 5 : : 5 : :
0002700 : : : : B B 7 B B < 8 : : @ @ :
0002720 : : / : 8 8 8 2 8 C C > C C C B
0002740 @ @ 7 ; ; 7 ; ; ; ; @ @ @ : ? >
0002760 > > D A C C C D C C C C C @ C C
0003000 B C < @ : : : : 5 : 8 8 8 * 8 =
0003020 < < < B 5 ; ; ; : 8 = 8 8 4 ; :
0003040 : B B > ? ? ? ? ; ? 9 9 3 9 3 7
0003060 7 * 0 0 0 0 0 0 8 8 : : - 0 : :
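The unix2dos reproduction can also be scripted, which makes it easy to build a small CRLF test file from a known-good FASTQ. A minimal Python sketch that roughly mimics unix2dos (the names `to_crlf` and `make_crlf_copy` are hypothetical; it reads the whole file into memory, so it is meant for small test inputs, not full runs):

```python
def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF, roughly what unix2dos does.

    Existing CRLF pairs are normalized first so they do not become CR CR LF,
    which makes the conversion safe to run twice.
    """
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def make_crlf_copy(src_path: str, dst_path: str) -> None:
    """Write a CRLF copy of an uncompressed FASTQ file for reproducing the bug."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(to_crlf(src.read()))
```

Running fastp on the copy and on the original should then show whether the failure depends only on the line endings.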
My best guess is that the following lines should match each other, since they seem to be meant to do the same job but differ:
Right now I don't have the headspace to dig deeper, but it looks like the latter is the older version and the former was changed to fix https://github.com/OpenGene/fastp/issues/133 in https://github.com/OpenGene/fastp/commit/e01e9402c3d5afded49b21c8303be51d7cbb2d27.
Maybe this gives @sfchen an idea of what's happening, or helps someone else who has time to tackle this.
Hi, I have a similar error.
My FASTQ file is in Windows format, with \r\n line endings. I removed every `mBuf[end-1]=='\r'` or `mBuf[end]=='\r'` check in `getLine()`, and it works well.
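One way to think about a fix is that the carriage return should be handled in exactly one place. Below is a Python sketch of that idea (not fastp's C++ `getLine()`; `iter_lines` is a hypothetical name): it splits on `\n` only and then drops at most one trailing `\r`, so CRLF and LF input yield identical lines.

```python
def iter_lines(buf: bytes):
    """Yield lines from buf, splitting on LF and stripping one trailing CR.

    Because the CR is removed only after the LF split, a CR never acts as a
    line terminator of its own, so no spurious empty lines are produced.
    """
    start = 0
    n = len(buf)
    while start < n:
        end = buf.find(b"\n", start)
        if end == -1:
            end = n  # last line has no trailing newline
        line = buf[start:end]
        if line.endswith(b"\r"):
            line = line[:-1]
        yield line
        start = end + 1
```

With this design the reader does not need separate checks for `\r` at `end` and `end-1`; both LF and CRLF files go through the same code path.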
Hello, I have been running fastp and it worked previously, but recently I ran a sample where roughly 100K of 1M reads were trimmed before fastp stopped with the message:
I dug a bit into the fastqreader code, and there seems to be an off-by-one error here: the separator line ends up empty, and the separator instead occupies the quality line.
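To spot the misalignment described above without reading the C++ code, each four-line record can be checked structurally: the header starts with `@`, the separator starts with `+`, and sequence and quality have equal length. A hedged Python sketch (`check_record` and `check_fastq_lines` are hypothetical helpers, and they assume the plain four-line FASTQ layout with no wrapped sequences):

```python
def check_record(record):
    """Return a list of problems in one FASTQ record given as 4 bytes lines.

    An empty list means the record looks structurally valid.
    """
    if len(record) != 4:
        return [f"expected 4 lines, got {len(record)}"]
    header, seq, sep, qual = record
    problems = []
    if not header.startswith(b"@"):
        problems.append("header does not start with '@'")
    if not sep.startswith(b"+"):
        problems.append("separator does not start with '+'")
    if len(seq) != len(qual):
        problems.append(
            f"sequence length {len(seq)} != quality length {len(qual)}"
        )
    # A leftover '\r' means line endings were only partially stripped.
    named = (("header", header), ("sequence", seq),
             ("separator", sep), ("quality", qual))
    for name, line in named:
        if b"\r" in line:
            problems.append(f"stray carriage return in {name} line")
    return problems


def check_fastq_lines(lines):
    """Group already-split lines into records; map record index to problems."""
    report = {}
    for start in range(0, len(lines), 4):
        problems = check_record(lines[start:start + 4])
        if problems:
            report[start // 4] = problems
    return report
```

A record shifted as described, e.g. `[b"@SRR8561413.1/1", b"ATGG", b"", b"+"]`, trips both the separator check and the length check.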
I found that the file has ^M$ at the end of each line, indicating that it was saved on a Windows machine. Looking at the getLine() function in the fastqreader, it seems you tried to handle this, but the problem persists.
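Until the reader handles CRLF input, a workaround is to convert the file back to LF endings before running fastp (dos2unix does this for plain-text files). A minimal Python sketch that streams the conversion and also handles `.gz` files (the names `strip_cr_line` and `normalize_fastq` are hypothetical; only lines ending in `\r\n` are rewritten, everything else is copied untouched):

```python
import gzip


def _open(path, mode):
    """Open plain or gzip-compressed files based on the .gz suffix."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def strip_cr_line(line: bytes) -> bytes:
    """Turn a CRLF-terminated line into an LF-terminated one; leave others as-is."""
    if line.endswith(b"\r\n"):
        return line[:-2] + b"\n"
    return line


def normalize_fastq(src_path: str, dst_path: str) -> int:
    """Stream src_path to dst_path with CRLF converted to LF.

    Returns the number of lines that were rewritten. Works line by line,
    so large or gzipped files are never loaded into memory at once.
    """
    rewritten = 0
    with _open(src_path, "rb") as src, _open(dst_path, "wb") as dst:
        for line in src:
            fixed = strip_cr_line(line)
            if fixed != line:
                rewritten += 1
            dst.write(fixed)
    return rewritten
```

For example, `normalize_fastq("sample.fastq.gz", "sample.lf.fastq.gz")` returns how many lines it rewrote; a non-zero count confirms the input had CRLF endings.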