Hi @harrychown, what version of DIAMOND are you using? The corrected bitscore has been available since release 2.1.0: https://github.com/bbuchfink/diamond/discussions/646. The DIAMOND developer describes the inspiration for this method in this comment
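If it helps, here is a minimal sketch for checking whether a given DIAMOND version string is at or above 2.1.0. It assumes the version appears as `X.Y.Z` somewhere in the text (for example in the output of `diamond version`); the function names `parse_version` and `supports_corrected_bitscore` are made up for illustration and are not part of DIAMOND or prehgt.

```python
import re

# First release with corrected_bitscore, per the discussion linked above.
MIN_VERSION = (2, 1, 0)


def parse_version(text):
    """Extract the first X.Y.Z version number in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_corrected_bitscore(version_text):
    """Return True if the version in `version_text` is at least MIN_VERSION."""
    # Tuples compare element by element, so (2, 0, 13) < (2, 1, 0).
    return parse_version(version_text) >= MIN_VERSION
```

Comparing integer tuples rather than raw strings avoids the pitfall where "2.0.13" sorts after "2.0.2" alphabetically.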
In prehgt, we return the corrected bitscores using this command:
diamond blastp --db ${input_db} --query ${input_aa_rep_seq} --out ${prefix}_vs_clustered_nr.tsv \
--outfmt 6 qseqid qtitle sseqid stitle pident approx_pident length mismatch gapopen qstart qend qlen qcovhsp sstart send slen scovhsp evalue bitscore score corrected_bitscore \
--max-target-seqs 100 --threads $task.cpus --faster
where the arguments following --outfmt 6 specify which columns are returned.
https://github.com/Arcadia-Science/prehgt/blob/main/modules/blastp_against_clustered_nr.nf#L20C1-L22C60
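Since --outfmt 6 writes a tab-separated table without a header row, the column names have to be supplied when reading the output outside the pipeline, in the same order as in the command above. A minimal sketch using Python's standard csv module follows; the helper name `read_hits` is hypothetical, and only the two bitscore columns are converted to numbers here.

```python
import csv

# Column names in the order given after --outfmt 6 in the command above.
COLUMNS = [
    "qseqid", "qtitle", "sseqid", "stitle", "pident", "approx_pident",
    "length", "mismatch", "gapopen", "qstart", "qend", "qlen", "qcovhsp",
    "sstart", "send", "slen", "scovhsp", "evalue", "bitscore", "score",
    "corrected_bitscore",
]


def read_hits(path):
    """Yield one dict per hit from a header-less, tab-separated DIAMOND output file."""
    with open(path, newline="") as handle:
        # QUOTE_NONE keeps quote characters inside titles as literal text.
        reader = csv.DictReader(
            handle, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
        )
        for row in reader:
            row["bitscore"] = float(row["bitscore"])
            row["corrected_bitscore"] = float(row["corrected_bitscore"])
            yield row
```

Each yielded row can then be compared on `bitscore` versus `corrected_bitscore` directly, for example to see how much the correction changes the ranking of hits for a query.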
In diamond, corrected_bitscore appears twice in the current code base:
In the prehgt paper, I used the OrthoFinder citation conceptually in reference to Figure 1, which shows the length dependency of bitscores: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0721-2.
Hi @taylorreiter, thank you so much for your quick response! It appears that I am using an earlier DIAMOND version, 2.0.13, which is the cause of my troubles. I'll update and re-run. Thank you also for providing further information on the calculation of the bitscores, I really appreciate it. Best, Harry
Hi, I'm trying to run parts of your code outside of a Nextflow pipeline. When running DIAMOND I notice that there is no option for generating a corrected bitscore. Looking at the documentation, it seems that the bitscore is corrected based on gene-length. As a citation you used OrthoFinder and their methodology for generating a corrected bitscore, however I do
Do you have a method for generating the corrected bitscore?
Many thanks,
Harry