Closed EorgeKit closed 7 months ago
Hi @EorgeKit ,
Thanks for your interest in Atria.
The workDir contains a space, so you need to quote every $workDir in the script: -o $workDir/atria_trimming needs to be -o "$workDir/atria_trimming".
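To see why the quotes matter, here is a minimal sketch that does not call Atria at all. The count_args helper is hypothetical and only reports how many arguments the shell hands to a command; the path is the one from the script.

```bash
# Hypothetical helper: print the number of arguments it receives.
count_args() { echo "$#"; }

workDir="/media/geokit/Extreme SSD/Eugene"

# Unquoted: the shell splits the value at the space, so two arguments arrive.
unquoted=$(count_args $workDir/atria_trimming)

# Quoted: the whole path is passed as a single argument.
quoted=$(count_args "$workDir/atria_trimming")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the unquoted form, a command given -o receives an extra, unexpected argument, which is a likely cause of a "too many arguments" complaint.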
Thanks a lot. I have fixed that, but now I am getting a new error that I think may have to do with my FASTQ files. The trouble is that I have a hard time pinpointing the exact source of the problem, since the error message does not say where it originates. It also happens for two paired-end FASTQ samples that were sequenced together, but not for another sample that was sequenced by Macrogen. Any advice?
code:
#!/usr/bin/env bash
#Data locations
workDir="/media/geokit/Extreme SSD/Eugene"
echo "work directory is $workDir"
ls "$workDir"
##Data1
Read1_1='/media/geokit/Extreme SSD/Eugene/S1_S1_L001_R1_001.fastq.gz'
Read1_2='/media/geokit/Extreme SSD/Eugene/S1_S1_L001_R2_001.fastq.gz'
##Data2
Read2_1='/media/geokit/Extreme SSD/Eugene/S2_S1_L001_R1_001.fastq.gz'
Read2_2='/media/geokit/Extreme SSD/Eugene/S2_S1_L001_R2_001.fastq.gz'
##Data3
Read3_1="$workDir/D_1.fastq.gz"
Read3_2="$workDir/D_2.fastq.gz"
#Atria analysis
##Data2
atria \
-r "$Read2_1" \
-R "$Read2_2" \
-o "$workDir/atria_trimming" \
--detect-adapter --no-quality-trim \
-t 3 --no-tail-n-trim --max-n=-1 --length-range 100:500
error:
work directory is /media/geokit/Extreme SSD/Eugene
atria_trimming D_2.fastq.gz S1_S1_L001_R1_001.fastq S2_S1_L001_R1_001.atria.log
atria_trimming.sh prtn-g.fasta S1_S1_L001_R2_001.fastq S2_S1_L001_R1_001.fastq.gz
D_1.fastq.gz prtn-m.fasta S2_S1_L001_R1_001.atria.fastq.gz S2_S1_L001_R2_001.fastq.gz
pigz 2.6
TaskFailedException
') at index 321 to BioSequences.DNAAlphabet{4}(). Is the input file valid? Does the disk have bad sections? The error is found in the following context:
G
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] throw_encode_error(A::BioSequences.DNAAlphabet{4}, src::Vector{UInt8}, soff::UInt64)
@ BioSequences ~/.julia/packages/BioSequences/QcYXq/src/longsequences/copying.jl:211
[3] encode_chunk
@ ~/.julia/packages/BioSequences/QcYXq/src/longsequences/copying.jl:225 [inlined]
[4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, doff::Int64, src::Vector{UInt8}, soff::UInt64, N::UInt64, #unused#::BioSequences.AsciiAlphabet)
@ BioSequences ~/.julia/packages/BioSequences/QcYXq/src/longsequences/copying.jl:355
[5] copyto!
@ ~/.julia/packages/BioSequences/QcYXq/src/longsequences/copying.jl:292 [inlined]
[6] safe_copyto!
@ ~/atria/Atria/src/FqRecords/copy.jl:31 [inlined]
[7] #StringChunk2FqRecord!#40
@ ~/atria/Atria/src/FqRecords/thread_input.jl:916 [inlined]
[8] (::Atria.FqRecords.var"#27#31"{Int64, NTuple{30, Vector{Atria.FqRecords.FqRecord}}, Int64})()
@ Atria.FqRecords ./threadingconstructs.jl:258
You need to check whether one of the FASTQ files contains an empty line followed by a line with a single G. If so, the FASTQ file is not valid.
Eric
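A rough way to run that check, sketched on a made-up gzipped file so it can be tried without the real data. The record contents and the temporary file are assumptions for illustration only; for real data, replace the temporary file with the path to the FASTQ file.

```bash
# Made-up four-line record followed by a stray blank line and a lone G.
tmp=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n\nG\n' | gzip > "$tmp"

# Line numbers of blank lines (a valid FASTQ should have none).
blank=$(gzip -dc < "$tmp" | grep -n '^$' | cut -d: -f1)

# A well-formed FASTQ has a line count that is a multiple of 4.
total=$(gzip -dc < "$tmp" | wc -l | tr -d ' ')

echo "blank lines at: ${blank:-none}; total lines: $total (remainder mod 4: $((total % 4)))"
rm -f "$tmp"
```

A non-empty list of blank lines or a non-zero remainder points to a malformed file. Note that this check does not catch every problem, such as unusual line endings.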
Noted. I extracted roughly the first 10 reads from the file. Upon inspection, there is no empty line whatsoever, yet it still throws the same error. Here is the file: subset.txt
Can you run the following and show me the output?
Read2_1='/media/geokit/Extreme SSD/Eugene/S2_S1_L001_R1_001.fastq.gz'
Read2_2='/media/geokit/Extreme SSD/Eugene/S2_S1_L001_R2_001.fastq.gz'
zcat "$Read2_1" | grep -E -n "^G$"
zcat "$Read2_2" | grep -E -n "^G$"
No standard output
ls -lhtr S2_S1_L001_R*
-rwxr-xr-x 1 geokit geokit 25M Jan 14 2023 S2_S1_L001_R1_001.fastq.gz
-rwxr-xr-x 1 geokit geokit 25M Jan 14 2023 S2_S1_L001_R2_001.fastq.gz
zcat S2_S1_L001_R1_001.fastq.gz | grep -E -n "^G$"
zcat S2_S1_L001_R2_001.fastq.gz | grep -E -n "^G$"
It might be related to an unknown bug. Could you share the two gz files with me? Thanks.
Eric
Absolutely. Here are the files: Data: S2_S1_L001_R1_001.fastq.gz
I found out why. The line breaks in your files are \r\n, but a FASTQ file's line break is usually \n.
The error message was "Cannot encode byte 0x0d (char '\r') at index 321 to DNAAlphabet{4}.", but the characters before \r were cut off on screen because \r means 'move to the start of the line' in a Linux terminal.
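The effect can be reproduced without Atria. The sketch below rebuilds a message modelled on the one above, with a raw carriage return between the quotes, and then makes that byte visible. It assumes a cat that supports -v, as the GNU and BSD versions do.

```bash
# Rebuild a message with a raw carriage return between the quotes.
msg=$(printf "Cannot encode byte 0x0d (char '\r') at index 321 to DNAAlphabet{4}.")

# Printed as-is, most terminals jump back to column 0 at the \r, so the
# text after it overwrites the text before it.
printf '%s\n' "$msg"

# cat -v renders the carriage return as ^M, which keeps the whole message readable.
shown=$(printf '%s\n' "$msg" | cat -v)
printf '%s\n' "$shown"
```

Piping a confusing error log through cat -v in the same way is a quick test for hidden control characters.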
May I ask how you got those fastq.gz files? Did they come directly from a sequencer, or did someone process them before sending them to you?
For now, you can use zcat FASTQ | tr -d '\r' > NEW_FASTQ to remove '\r' from the files.
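A slightly fuller sketch of the same idea: count the affected lines first, then strip the carriage returns and re-compress so the result is still a .fastq.gz. The tiny input file and its names are made up for illustration, and gzip -dc is used as an equivalent of zcat.

```bash
# Build a tiny gzipped FASTQ with \r\n line endings (made-up record).
tmp=$(mktemp -d)
printf '@read1\r\nACGT\r\n+\r\nIIII\r\n' | gzip > "$tmp/crlf.fastq.gz"

# A literal carriage return to search for.
cr=$(printf '\r')

# Count lines that contain a carriage return; a clean file gives 0.
before=$(gzip -dc < "$tmp/crlf.fastq.gz" | grep -c "$cr")

# Strip every \r and compress the result again.
gzip -dc < "$tmp/crlf.fastq.gz" | tr -d '\r' | gzip > "$tmp/fixed.fastq.gz"

after=$(gzip -dc < "$tmp/fixed.fastq.gz" | grep -c "$cr")
printf 'lines with CR before: %s, after: %s\n' "$before" "$after"
rm -rf "$tmp"
```

Running the count on both R1 and R2 before trimming shows whether a file needs the conversion at all.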
Well, this is interesting, haha. Supposedly these are the files that came out after someone demultiplexed them, so the \r issue was possibly introduced at that point.
Noted, let me get to it as soon as possible.
So I implemented the suggestions and it finally worked. Thanks a lot for the assistance @cihga39871
Hello @cihga39871, I have been trying to use Atria for the first time, but every time I try to run it I get the "too many arguments" error, as follows: Running script:
Error: