Open asan-emirsaleh opened 1 year ago
Dear @asan-emirsaleh
Sorry for such a late response; rnaQUAST is now only occasionally maintained, as some of the authors have left the lab.
rnaQUAST uses GMAP to map contigs to the genome and BLAST to map contigs to the transcriptome. This allows it to detect misassemblies accurately, e.g. chimeric contigs reported by both methods. Unfortunately, there is no option to pass an existing BLAST database, since rnaQUAST creates the transcriptome FASTA itself based on the annotation. You can also send me the command line / log file to check.
I also suggest not computing database coverage by reads, as it was implemented in a quite inefficient way. If you would like to obtain gene counts, I'd suggest using e.g. STAR + featureCounts.
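For readers who want to try the STAR + featureCounts route, here is a minimal sketch that only assembles the two command lines; it does not run them. All paths, the thread count, and the helper names (`star_command`, `sorted_bam_path`, `featurecounts_command`) are hypothetical. It assumes paired-end, uncompressed reads, an already built STAR genome index, and a recent Subread release, where `--countReadPairs` is needed alongside `-p` to count fragments.

```python
import shlex


def star_command(genome_dir, left_reads, right_reads, out_prefix, threads=8):
    """Build a STAR command that aligns paired-end reads to an indexed genome."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(left_reads), str(right_reads),
        # Ask STAR for a coordinate-sorted BAM instead of the default SAM.
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", str(out_prefix),
    ]


def sorted_bam_path(out_prefix):
    """Name STAR gives the coordinate-sorted BAM for a given output prefix."""
    return f"{out_prefix}Aligned.sortedByCoord.out.bam"


def featurecounts_command(gtf, bam, counts_file, threads=8):
    """Build a featureCounts command that counts read pairs per gene."""
    return [
        "featureCounts",
        "-T", str(threads),
        # Paired-end mode; older Subread releases may not know --countReadPairs.
        "-p", "--countReadPairs",
        "-a", str(gtf),
        "-o", str(counts_file),
        str(bam),
    ]


# Hypothetical inputs, used only to show the resulting command lines.
prefix = "star_out/sample_"
align = star_command("star_index", "reads_1.fastq", "reads_2.fastq", prefix)
count = featurecounts_command("annotation.gtf", sorted_bam_path(prefix), "gene_counts.txt")
print(shlex.join(align))
print(shlex.join(count))
```

The printed lines can be reviewed and pasted into a job script, or each list can be passed to `subprocess.run(..., check=True)` once the paths are real.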
Best Andrey
Hi! Thank you for the response. As for BLAST, there is a reason for providing a pre-built BLAST database. On some cluster setups, such as ours, the newest major BLAST release with a working makeblastdb is 2.9. Pinning BLAST to 2.9 forces a BUSCO downgrade in the conda environment: it is impossible to set both blast=2.9 and busco=5, because a version conflict error appears. Also, when publicly deposited data is used as the reference, the putative transcriptome is often already known.
Hello! I have used QUAST several times and now I am trying to use rnaQUAST. Both tools are described as great and robust quality assessment tools. Some things are not clear enough to me to use rnaQUAST effectively. Since ordinary alignment procedures take days to complete, it would be a good idea to prepare the alignment files before the pipeline starts and pass them as input. Are these right:
- `-sam` is a parameter to pass the reads' alignment to the reference genome. Only the alignment data from the `SAM` file would be used, not the read data. `BAM` format is not accepted. For reproducibility purposes, the `STAR` aligner with default parameters is used.
- `--left_reads` and `--right_reads` parameters are used to pass the read data, so the reads would be aligned with the `STAR` aligner to the transcriptome being assessed. Currently there is no way to pass a previously prepared `SAM` file as input. The read data would also be aligned to the genome to compute mapping metrics. For this kind of analysis, the `-sam` parameter might be used to speed up the computation.
- `--reference` is used to pass the reference genome data. Currently there is no option to pass the predicted transcriptome sequences.
- `--gtf` parameter is used to pass the gene coordinates of predicted transcripts in the reference genome. Both `GTF` and `GFF` files are acceptable. This data would be used by gffutils to produce gene databases.
- `--gmap_index` is used to pass the index of the reference genome, which would be used to align the transcriptome being assessed to the reference genome.
- `-psl` is used to pass the `PSL` file produced by aligning the transcriptome being assessed to the reference genome with the `BLAT` aligner. There is no option to pass a prebuilt BLAT database.
- `-meta` option is used to assess some metrics dedicated to metatranscriptome assemblies. But this option is not documented in the manual page.

One more thing is not clear to me. What is the `BLAST` aligner used for? And what is the reason for building the BLAST databases? Is there an option to pass a prebuilt one?
Best regards, Asan
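
The idea of preparing alignments ahead of time, as described in the list above, can be sketched as a small helper that assembles an rnaQUAST command line from precomputed files. This is only an illustration of the question's reading of the options, which the thread does not fully confirm. The flags `--reference`, `--gtf`, `-sam`, `-psl` and `--gmap_index` are the ones named in the list; the script name `rnaQUAST.py`, the flags `--transcripts` and `--output_dir`, the helper name `rnaquast_command`, and all file names are assumptions to be checked against the manual.

```python
import shlex


def rnaquast_command(transcripts, reference, gtf, output_dir,
                     sam=None, psl=None, gmap_index=None):
    """Assemble an rnaQUAST command line, adding precomputed inputs if given."""
    # The list above states that BAM is not accepted, so reject it early.
    if sam is not None and str(sam).lower().endswith(".bam"):
        raise ValueError("expected a SAM file for -sam; convert the BAM first")

    cmd = [
        "rnaQUAST.py",                        # assumed script name
        "--transcripts", str(transcripts),    # assumed flag
        "--reference", str(reference),
        "--gtf", str(gtf),
        "--output_dir", str(output_dir),      # assumed flag
    ]
    # Optional precomputed inputs, appended only when provided.
    optional = [("-sam", sam), ("-psl", psl), ("--gmap_index", gmap_index)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Hypothetical file names, used only to show the resulting command line.
example = rnaquast_command(
    "assembly.fasta", "genome.fasta", "annotation.gtf", "rnaquast_out",
    sam="reads_to_genome.sam", gmap_index="gmap_index_dir",
)
print(shlex.join(example))
```

Building the argument list separately from running it makes it easy to inspect the exact command before submitting a multi-day job to a cluster.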