BrendelGroup / SRAssembler

Selective and Recursive local Assembler
GNU General Public License v3.0

Request: Support gzipped fastq files #31

Open aboyher opened 4 years ago

aboyher commented 4 years ago

Requesting support for gzipped fastq files.

ShaolinXU commented 4 years ago

Agree! It would be much easier to incorporate SRAssembler into a workflow if it supported gzipped fastq files.

vpbrendel commented 4 years ago

Let me understand: Are you suggesting that within SRAssembler we run "gunzip" if the read input ends in *.gz? Instead of doing this outside the SRAssembler call in the workflow?

ShaolinXU commented 4 years ago

It could be done that way, but it would be better to support gzipped fastq files without a separate unzip step. Does the vmatch indexing process need fasta files? If so, maybe we could use a tool like seqkit split2 to split the fastq directly into fasta files, which would be easier.

gwct commented 3 months ago

Hello, I'm just starting to use SRAssembler and wanted to add that I also think support for compressed fastq files would be nice. My C++ is rusty, but I know that in other languages it's not necessary to run gunzip explicitly before running the program; instead, the program can check for compression as it reads the file and then read it conditionally through a library (maybe zlib for C++?).

But given that it works as is, I think just a note in the documentation saying that the fastq files need to be uncompressed would be helpful.
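
To make the "check the compression as the file is read" idea concrete, here is a minimal sketch that uses only the C++ standard library. It tests for the two gzip magic bytes (0x1f 0x8b) at the start of a file. The function name `looks_gzipped` is hypothetical and not part of SRAssembler; where such a check would belong in the code base is an assumption.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not SRAssembler code): returns true when the file
// begins with the gzip magic bytes 0x1f 0x8b. Returns false for plain-text
// files, files shorter than two bytes, and files that cannot be opened.
bool looks_gzipped(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char magic[2] = {0, 0};
    in.read(reinterpret_cast<char*>(magic), 2);
    return in.gcount() == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

A caller could use this either to pick a decompressing reader or simply to stop early with a message such as "input appears to be gzip-compressed; please decompress it first". For the reading itself, zlib's `gzopen`/`gzgets` interface may remove the need for a separate check, since it reads both gzip and uncompressed files, but that would add a link-time dependency on zlib.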


To be thorough, and in case anyone else tries to search for the error text, this is the output and error I see when I try to run on compressed fastq files using a libraries file (-l). This is likely also related to #33. This is also the same error I get when one of those files listed does not exist.

singularity run -e -B $(pwd) sra.sif -q sfGFP.fa -t dna -p SRAssembler/demo/SRAssembler.conf -o srasm-out-1 -l libraries.txt -r srasm-preprocess -A 1 -k 15:10:45 -s mouse
[2024-04-17 14:49:44] [INFO] SRAssembler v1.0.0 command: SRAssembler -q sfGFP.fa -t dna -p SRAssembler/demo/SRAssembler.conf -o srasm-out-1 -l libraries.txt -r srasm-preprocess -A 1 -k 15:10:45 -s mouse
[2024-04-17 14:49:44] [INFO] Total processors: 1
[2024-04-17 14:49:44] [INFO] We have 4 libraries
[2024-04-17 14:49:44] [INFO] library 1: B6_AON_S1_L001_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L001_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L001_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44] [INFO] library 2: B6_AON_S1_L002_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L002_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L002_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44] [INFO] library 3: B6_AON_S1_L003_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L003_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L003_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44] [INFO] library 4: B6_AON_S1_L004_R1_001.fastqlibrary
[2024-04-17 14:49:44] [INFO] insert size: 350
[2024-04-17 14:49:44] [INFO] left read: fastq/B6_AON_S1_L004_R1_001.fastq.gz
[2024-04-17 14:49:44] [INFO] right read: fastq/B6_AON_S1_L004_R2_001.fastq.gz
[2024-04-17 14:49:44] [INFO] reversed: 0
[2024-04-17 14:49:44] [INFO] Paired-end: 1
[2024-04-17 14:49:44][DOING] Now pre-processing the reads files ...
[2024-04-17 14:49:44][DOING] Splitting read library 1 ...
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 1) > this->size() (which is 0)
Command terminated by signal 6
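
For anyone landing here from a search: the `what()` text above means `substr` was called with position 1 on an empty string. The sketch below shows one way that can arise and how a guard avoids it. `header_id` is a hypothetical helper, and it is only an assumption that the failing call happens while parsing a read header; the actual location in SRAssembler has not been checked.

```cpp
#include <string>

// Hypothetical helper (not SRAssembler code): extracts the read ID from a
// FASTQ header line by dropping the leading '@'. Returns false instead of
// throwing when the line is empty or does not start with '@', which is what
// a parser would see for a missing file or for non-text input.
bool header_id(const std::string& line, std::string& id) {
    if (line.empty() || line[0] != '@') {
        return false;
    }
    id = line.substr(1);  // safe here: line.size() >= 1
    return true;
}
```

Without the guard, `std::string().substr(1)` throws `std::out_of_range`, and an uncaught exception ends the program through `std::terminate`, which matches the "signal 6" abort in the log.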