Inputting multiple long-read files at once

bcgsc / RNA-Bloom

Reference-free transcriptome assembly for short and long reads


Inputting multiple long-read files at once #59

Open dvirdi01 opened 9 months ago

dvirdi01 commented 9 months ago

All the files I need to run this on are in one directory. Is there a way to pass the directory path, as in -long <path/to/directory>, rather than listing every file, as in -long <FILEA FILEB ....>?

Also, is there a way to run RNA-Bloom with Snakemake?

kmnip commented 9 months ago

The input argument cannot be a directory. If you have too many read files, you can list all the read file paths in a text file, one path per line, and pass the path to that text file with the @ prefix, e.g.

rnabloom -long @/path/to/list_file.txt ...

Example content of list_file.txt:

/path/to/read_file_01.fastq.gz
/path/to/read_file_02.fastq.gz
/path/to/read_file_03.fastq.gz
/path/to/read_file_04.fastq.gz
/path/to/read_file_05.fastq.gz
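If all the reads sit in one directory, a list file like the one above can be generated instead of typed by hand. The following is only a minimal sketch: the function name make_read_list and the directory name are hypothetical, it assumes the reads end in .fastq.gz, and it relies on a find that supports -maxdepth (GNU and BSD find both do).

```shell
# Hypothetical helper: write the absolute path of every *.fastq.gz file
# directly inside a directory to a list file, one path per line, sorted.
make_read_list() {
    reads_dir=$1
    list_file=$2

    # Stop early if the directory does not exist.
    [ -d "$reads_dir" ] || { echo "not a directory: $reads_dir" >&2; return 1; }

    # Resolve the directory to an absolute path so the list file
    # works regardless of where rnabloom is launched from.
    abs_dir=$(cd "$reads_dir" && pwd)

    # -maxdepth 1 keeps the search to the top level of the directory.
    find "$abs_dir" -maxdepth 1 -type f -name '*.fastq.gz' | sort > "$list_file"
}

# Example usage (paths are placeholders):
#   make_read_list /path/to/reads list_file.txt
#   rnabloom -long @list_file.txt ...
```

Change the -name pattern if your files use a different extension (for example '*.fastq').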

You can run RNA-Bloom in a single command; you don't need Snakemake.

kmnip commented 9 months ago

If RNA-Bloom is a step in your Snakemake workflow, then you can run RNA-Bloom as a shell command within a rule. FYI: https://snakemake.readthedocs.io/en/v3.12.0/snakefiles/rules.html

dvirdi01 commented 8 months ago

I ran rnabloom on each input file separately and it produced the transcripts for each of them. However, when I give it all the input files at once to make a combined transcriptome, it gives me the following error:

Exception in thread "Thread-837" java.lang.OutOfMemoryError: Java heap space
Line 3 of FASTQ record is expected to start with '+'
rnabloom.io.FileFormatException: Line 3 of FASTQ record is expected to start with '+'

This is the command I ran:

rnabloom -long sample1.fastq sample2.fastq sample3.fastq -t 48 -outdir /.../assembly