Open tsackton opened 9 years ago
Hi, this is due to the SRA file header. The --illumina-trinity option called by Trinity was meant to be used with Illumina FASTQ files and their typical headers. In this case, a quick workaround would be to run fastool on its own first, on the R1 and R2 datasets, with these options:
fastool --append /1 --to-fasta SRA_1.fastq > SRA_1_fixed.fastq
fastool --append /2 --to-fasta SRA_2.fastq > SRA_2_fixed.fastq
Then start Trinity with the "fixed" files; this should work.
Thank you very much for solving the problem I was having.
When processing SRA RNA-seq FASTQ files with Fastool as part of the Trinity package, Fastool appends a /H to the end of sequence IDs, which then causes errors downstream in Trinity.
Here are the first few lines of an SRA file: https://gist.github.com/tsackton/8c5508a4b60a1e33f6f2
When I run: fastool --to-fasta --illumina-trinity sra_test.fq > sra_test.1.fa, the output headers look like this:
If I remove everything after the first space in the sra example (with seqtk seq -C), the output is normal:
The /H files do not work with Trinity, while the normal files after seqtk seq -C processing do.
This was tested with the latest version of fastool, compiled on CentOS 6 with GCC 4.8.2.
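For anyone without seqtk at hand, the header cleanup described above (dropping everything after the first space in each read name) can be approximated with plain awk. This is only a rough sketch, not what fastool or seqtk do internally: the function name fix_sra_headers and the sample record are made up for illustration, and it assumes strict four-line FASTQ records with no wrapped sequence or quality lines. It also appends a mate suffix such as /1 or /2, mirroring the --append workaround.

```shell
# Hypothetical helper (not part of fastool, seqtk, or Trinity).
# Reads FASTQ on stdin; $1 is the suffix to append to each read name (e.g. /1).
# Assumes every record is exactly four lines.
fix_sra_headers() {
    awk -v suffix="$1" '
        NR % 4 == 1 {
            sub(/[ \t].*$/, "")   # drop everything from the first space or tab onward
            $0 = $0 suffix        # append the mate suffix to the bare read name
        }
        { print }                 # sequence, "+" and quality lines pass through unchanged
    '
}

# Made-up record, for illustration only
printf '@SRR000001.1 EXAMPLE:1 length=4\nACGT\n+\nIIII\n' | fix_sra_headers /1
```

In practice it would be run once per mate file, for example `fix_sra_headers /1 < SRA_1.fastq > SRA_1_fixed.fastq`, and the fixed files passed to Trinity.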