BioJulia / FASTX.jl

Parse and process FASTA and FASTQ formatted files of biological sequences.
https://biojulia.dev
MIT License

ERROR: Error when parsing FASTX file. Saw unexpected byte 'C' on line 1 #107

Closed: ljournot closed this issue 1 year ago

ljournot commented 1 year ago

I am using FASTX.jl for the first time. I am trying to read a FASTA file, "Test_txt.txt", which contains:

>Acc1
GATC
>Acc2
TGAC

and I get the error message in the title. The problem is clearly in my file. However, the file can be read by other software that handles FASTA files (BioEdit, APE...). I got the same error with FASTA files downloaded from NCBI and with test files saved in MS-DOS or Unicode format. I looked at the throw_parser_error function in the FASTX.jl source, but it didn't help. I would appreciate it if you could tell me what the "byte 'C' on line 1" might be and how to get rid of it. Best wishes, Laurent

julia> using FASTX

julia> validate_fasta(IOBuffer(">Acc1\nGATC\n>Acc2\nTGAC")) === nothing
true

julia> reader = FASTAReader(IOBuffer(">Acc1\nGATC\n>Acc2\nTGAC"))
FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOBuffer}}(TranscodingStreams.NoopStream{IOBuffer}(<mode=idle>), 1, 1, nothing, FASTX.FASTA.Record:
  description: ""
     sequence: "", true)

julia> collect(reader)
2-element Vector{FASTX.FASTA.Record}:
 FASTX.FASTA.Record:
  description: "Acc1"
     sequence: "GATC"
 FASTX.FASTA.Record:
  description: "Acc2"
     sequence: "TGAC"

julia> validate_fasta(IOBuffer("C:/Users/Laurent/Desktop/Test_txt.txt")) === nothing
false

julia> reader = FASTAReader(IOBuffer("C:/Users/Laurent/Desktop/Test_txt.txt"))
FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOBuffer}}(TranscodingStreams.NoopStream{IOBuffer}(<mode=idle>), 1, 1, nothing, FASTX.FASTA.Record:
  description: ""
     sequence: "", true)

julia> collect(reader)
ERROR: Error when parsing FASTX file. Saw unexpected byte 'C' on line 1
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] throw_parser_error(data::Vector{UInt8}, p::Int64, line::Int64)
    @ FASTX C:\Users\Laurent\.julia\packages\FASTX\9Dngy\src\FASTX.jl:124
  [3] macro expansion
    @ C:\Users\Laurent\.julia\packages\FASTX\9Dngy\src\fasta\readrecord.jl:102 [inlined]
  [4] readrecord!(stream::TranscodingStreams.NoopStream{IOBuffer}, record::FASTX.FASTA.Record, state::Tuple{Int64, Int64})
    @ FASTX.FASTA C:\Users\Laurent\.julia\packages\Automa\5enCH\src\Stream.jl:124
  [5] _read!
    @ C:\Users\Laurent\.julia\packages\FASTX\9Dngy\src\fasta\reader.jl:104 [inlined]
  [6] iterate(rdr::FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOBuffer}}, state::Nothing)
    @ FASTX.FASTA C:\Users\Laurent\.julia\packages\FASTX\9Dngy\src\fasta\reader.jl:79
  [7] iterate
    @ C:\Users\Laurent\.julia\packages\FASTX\9Dngy\src\fasta\reader.jl:79 [inlined]
  [8] _collect(cont::UnitRange{Int64}, itr::FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOBuffer}}, #unused#::Base.HasEltype, isz::Base.SizeUnknown)
    @ Base .\array.jl:718
  [9] collect(itr::FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOBuffer}})
    @ Base .\array.jl:707
 [10] top-level scope
    @ REPL[14]:1


jakobnissen commented 1 year ago

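The problem is that you are not reading the file, but the literal text of its path. IOBuffer("C:/Users/Laurent/Desktop/Test_txt.txt") wraps the path string itself, so the first byte the parser sees is the 'C' of "C:", where it expects a '>' header line. Replace "IOBuffer" with "open".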
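A minimal sketch of the difference, using only Base Julia. It writes the two records from the issue to a temporary file (the variable names and the use of `tempname` are illustrative assumptions, standing in for "Test_txt.txt") and compares the first character seen through `IOBuffer(path)` with the first character seen through `open(path)`.

```julia
# Write the example records from the issue to a temporary file
# (hypothetical stand-in for C:/Users/Laurent/Desktop/Test_txt.txt).
path = tempname() * ".fasta"
write(path, ">Acc1\nGATC\n>Acc2\nTGAC\n")

# Mistake: IOBuffer(path) is an in-memory stream over the *path string*,
# so its first character is the first character of the path (e.g. 'C' on Windows).
wrong_first = read(IOBuffer(path), Char)

# Fix: open(path) returns a stream over the *file contents*,
# so its first character is the '>' of the first FASTA header.
right_first = open(io -> read(io, Char), path)

println("via IOBuffer: ", repr(wrong_first))
println("via open:     ", repr(right_first))

rm(path)  # clean up the temporary file
```

With FASTX.jl the same one-word change applies: FASTAReader(open(path)) reads the records from the file, whereas FASTAReader(IOBuffer(path)) parses the path text and fails on its first byte.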

ljournot commented 1 year ago

Dear Jakob, thanks a lot for your prompt answer, and sorry for what I now realize was a trivial question. I am relatively new to programming. Best wishes, Laurent

jakobnissen commented 1 year ago

Happy to help. For any other questions, you're welcome to open an issue, but I also recommend checking out the Julia language Slack or the Julia language Discourse, both of which have biology-related channels.