schorlton closed this issue 1 year ago
Hi @schorlton,
Are you seeing different FASTA header formats in the final output (i.e. rnabloom.transcripts.fa) of different assemblies?
Or do you mean that different output FASTA files from the same assembly have different FASTA header formats?
If it is the latter, then it is actually intentional.
Ka Ming
> Are you seeing different FASTA header formats in the final output (i.e. rnabloom.transcripts.fa) of different assemblies?
Yes, this. Different input reads lead to differently formatted FASTA headers. Sorry that wasn't clear. I like the
>3 l=228 c=1.1 s=8
header format because I use the coverage and length information. However, not all transcripts have this information in the header, e.g. if you run RNA-Bloom on the example read above, you'll only get a FASTA header with a sequence identifier and no coverage or length information.
Ah, ok. You see this header style in some outputs but not others because some assemblies may have ended at an earlier stage.
To resolve this issue, I will try to standardize the final output FASTA regardless of the assembly endpoint.
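Until the output is standardized, a downstream script can tolerate both header styles. The following is a minimal sketch, not part of RNA-Bloom: the function names `parse_header` and `length_and_coverage` are hypothetical, and it assumes that `l` is the length and `c` is the coverage, as the discussion above suggests. Fields that are absent come back as `None` rather than a guessed default, since the thread does not confirm what coverage an identifier-only header implies.

```python
import re

# Matches key=value pairs such as "l=228" or "c=1.1" after the sequence ID.
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_header(line):
    """Split a FASTA header line into (sequence ID, dict of key=value fields)."""
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    parts = line[1:].strip().split(None, 1)
    if not parts:
        raise ValueError("FASTA header has no sequence ID")
    seq_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # An identifier-only header such as ">s1" yields an empty dict.
    return seq_id, dict(_FIELD.findall(rest))


def length_and_coverage(line):
    """Return (sequence ID, length, coverage); missing values are None.

    Assumption: 'l' means length and 'c' means coverage.
    """
    seq_id, fields = parse_header(line)
    length = int(fields["l"]) if "l" in fields else None
    coverage = float(fields["c"]) if "c" in fields else None
    return seq_id, length, coverage
```

For example, `length_and_coverage(">3 l=228 c=1.1 s=8")` gives the ID, length and coverage, while `length_and_coverage(">s1")` gives the ID with both values set to `None`, so the caller can decide how to treat transcripts without that information.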
Please report
java -jar RNA-Bloom.jar -version
java -version
I'm running RNA-Bloom indiscriminately on input files to see if they assemble. I don't check the files beforehand, as I want to leave it to RNA-Bloom to decide whether it can assemble anything. Interestingly, RNA-Bloom produces different FASTA header formats for different outputs.
Sometimes I get:
>3 l=228 c=1.1 s=8
Other times I get:
>s1
Note that these are with different inputs. Is it possible to output the same header format each time? In the latter format, does coverage=1?
Thanks!!
RNA-Bloom v2.0.0
Command:
Sample input read to reproduce single-element header: