Arcadia-Science / prehgt

A pipeline for lightweight screening of Eukaryotic genomes and transcriptomes for recent HGT
MIT License

Missing corrected bitscores #53

Closed harrychown closed 1 year ago

harrychown commented 1 year ago

Hi, I'm trying to run parts of your code outside of a Nextflow pipeline. When running DIAMOND, I noticed that there is no option for generating a corrected bitscore. Looking at the documentation, it seems that the bitscore is corrected based on gene length. As a citation you used OrthoFinder and their methodology for generating a corrected bitscore, however I do

Do you have a method for generating the corrected bitscore?

Many thanks,

Harry

taylorreiter commented 1 year ago

Hi @harrychown, what version of DIAMOND are you using? The corrected bitscore has been available since release 2.1.0: https://github.com/bbuchfink/diamond/discussions/646. The DIAMOND developer describes the inspiration for this method in this comment
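If it helps, here is a minimal Python sketch for checking the installed version before running. It assumes that `diamond version` prints a line such as `diamond version 2.0.13` and that `diamond` is on your PATH; the function names are hypothetical and not part of prehgt or DIAMOND.

```python
import re
import subprocess

# Release in which corrected_bitscore became available, per the discussion linked above.
MIN_VERSION = (2, 1, 0)


def parse_diamond_version(text):
    """Extract (major, minor, patch) from text such as 'diamond version 2.0.13'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_corrected_bitscore(version_text):
    """Return True if the reported version is at least MIN_VERSION."""
    return parse_diamond_version(version_text) >= MIN_VERSION


def installed_diamond_version_text():
    """Ask the local DIAMOND binary for its version (assumes `diamond` is on PATH)."""
    result = subprocess.run(
        ["diamond", "version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `supports_corrected_bitscore(installed_diamond_version_text())` before the search gives an early, explicit signal instead of a failure on the unknown output field.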

In prehgt, we return the corrected bitscores using this command:

diamond blastp --db ${input_db} --query ${input_aa_rep_seq} --out ${prefix}_vs_clustered_nr.tsv \
    --outfmt 6 qseqid qtitle sseqid stitle pident approx_pident length mismatch gapopen qstart qend qlen qcovhsp sstart send slen scovhsp evalue bitscore score corrected_bitscore \
    --max-target-seqs 100 --threads $task.cpus --faster

where the arguments following --outfmt 6 specify which columns are returned. https://github.com/Arcadia-Science/prehgt/blob/main/modules/blastp_against_clustered_nr.nf#L20C1-L22C60
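Because `--outfmt 6` writes no header row by default, the column names have to be supplied again when reading the output. A minimal Python sketch, assuming the column order matches the command above; `read_hits` is a hypothetical helper, not something prehgt provides.

```python
import csv

# Column order copied from the --outfmt 6 arguments in the command above.
COLUMNS = (
    "qseqid qtitle sseqid stitle pident approx_pident length mismatch gapopen "
    "qstart qend qlen qcovhsp sstart send slen scovhsp evalue bitscore score "
    "corrected_bitscore"
).split()


def read_hits(handle):
    """Yield one dict per hit from a header-less, tab-separated DIAMOND output."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        hit = dict(zip(COLUMNS, row))
        # Convert only the two score columns discussed in this thread.
        hit["bitscore"] = float(hit["bitscore"])
        hit["corrected_bitscore"] = float(hit["corrected_bitscore"])
        yield hit
```

For example, `with open("sample_vs_clustered_nr.tsv") as fh: hits = list(read_hits(fh))`, where the file name is only a placeholder for whatever `--out` was set to.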

In diamond, corrected_bitscore appears twice in the current code base:

In the prehgt paper, I cited OrthoFinder conceptually, in reference to Figure 1, which shows the length dependency of bitscores: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0721-2.

harrychown commented 1 year ago

Hi @taylorreiter, thank you so much for your quick response! It appears that I am using an earlier DIAMOND version, 2.0.13, which is the cause of my troubles. I'll update and re-run. Thank you also for providing further information on the calculation of the bitscores, I really appreciate it. Best, Harry