ablab / rnaquast

Quality assessment of de novo transcriptome assemblies from RNA-Seq data
http://cab.spbu.ru/software/rnaquast

Alignment of the reads #17

Open asan-emirsaleh opened 1 year ago

asan-emirsaleh commented 1 year ago

Hello! I have used QUAST several times and now I am trying to use rnaQUAST. Both tools are described as great and robust quality assessment tools. Some things are not clear enough for me to use rnaQUAST effectively. Since ordinary alignment procedures can take days to complete, it would be a good idea to prepare the alignment files before the pipeline starts and pass them as input. Are these right:

One more thing is not clear to me. What is the BLAST aligner used for? And what is the reason for building the BLAST databases? Is there an option to pass a prebuilt one?

Best regards, Asan

andrewprzh commented 1 year ago

Dear @asan-emirsaleh

Sorry for taking so long to respond; rnaQUAST is now only occasionally maintained, as some of the authors have left the lab.

rnaQUAST uses GMAP to map contigs to the genome, and BLAST to map contigs to the transcriptome. This allows it to detect misassemblies accurately, e.g. chimeric contigs reported by both methods. Unfortunately, there is no option to pass an existing BLAST database, since rnaQUAST creates the transcriptome FASTA itself based on the annotation. You can also send me the command line / log file to check.
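To illustrate what "creating the transcriptome FASTA from the annotation" means in general, here is a minimal Python sketch. It is not rnaQUAST's actual code: the function names (`reverse_complement`, `transcripts_from_gtf`) are hypothetical, the genome is assumed to be already loaded as a dict of chromosome name to sequence, and the annotation is assumed to be GTF with `exon` features carrying a `transcript_id` attribute.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcripts_from_gtf(genome, gtf_lines):
    """Build transcript sequences from exon records (hypothetical helper).

    genome    -- dict mapping chromosome name to its sequence string
    gtf_lines -- iterable of GTF lines (1-based, inclusive coordinates)
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        transcript_id = None
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "transcript_id":
                transcript_id = parts[1].strip('"')
        if transcript_id is not None:
            exons[transcript_id].append((chrom, start, end, strand))

    transcripts = {}
    for transcript_id, parts in exons.items():
        # Join exons in genomic order, then flip minus-strand transcripts.
        parts.sort(key=lambda p: p[1])
        seq = "".join(genome[c][s - 1:e] for c, s, e, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        transcripts[transcript_id] = seq
    return transcripts
```

The resulting sequences could be written out as FASTA and indexed with `makeblastdb -in transcripts.fa -dbtype nucl`, although, as noted above, rnaQUAST has no option to accept such a pre-built database.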

I also suggest not computing database coverage by reads, as it is implemented in a rather inefficient way. If you would like to obtain gene counts, I'd suggest using e.g. STAR + featureCounts.
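A rough sketch of the STAR + featureCounts route, written as a small Python wrapper. All paths, the thread count, and the helper names (`star_command`, `featurecounts_command`, `run_gene_counts`) are assumptions for illustration; it also assumes a STAR genome index has already been built and that the reads are single-end (paired-end data needs extra featureCounts options).

```python
import shutil
import subprocess


def star_command(genome_dir, reads, prefix, threads=4):
    """Command line for aligning reads with STAR (index must already exist)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


def featurecounts_command(gtf, bam, out, threads=4):
    """Command line for counting reads per gene with featureCounts."""
    return ["featureCounts", "-T", str(threads), "-a", gtf, "-o", out, bam]


def run_gene_counts(genome_dir, reads, gtf, prefix, threads=4):
    """Run both steps; raises if either tool is missing or exits with an error."""
    for tool in ("STAR", "featureCounts"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subprocess.run(star_command(genome_dir, reads, prefix, threads), check=True)
    # STAR names the coordinate-sorted BAM after the given prefix.
    bam = prefix + "Aligned.sortedByCoord.out.bam"
    counts = prefix + "counts.txt"
    subprocess.run(featurecounts_command(gtf, bam, counts, threads), check=True)
    return counts
```

For example, `run_gene_counts("star_index", ["reads.fastq"], "annotation.gtf", "sample1_")` would leave the counts table in `sample1_counts.txt`, given those hypothetical input files.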

Best, Andrey

asan-emirsaleh commented 1 year ago

Hi! Thank you for the response. As for BLAST, there is a reason for wanting to provide a pre-built BLAST database. In some cluster setups such as ours, the newest BLAST major release with a working makeblastdb is 2.9. Pinning BLAST to 2.9 forces a BUSCO downgrade in the conda environment. It is impossible to set both blast=2.9 and busco=5, because a version conflict error appears. Also, when using data deposited in open databases as the reference, the putative transcriptome is often already known.