Trinotate / Trinotate.github.io

web documentation for Trinotate

Potential issue with BLAST -max_target_seqs #9

Open danwiththeplan opened 5 years ago

danwiththeplan commented 5 years ago

Hi, this pre-print has been pointed out to me:

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty833/5106166

It implies that NCBI-BLAST searches using the -max_target_seqs parameter do not return the best hit. Does this have implications for Trinotate? Particularly this bit?

https://github.com/Trinotate/Trinotate.github.io/wiki/Software-installation-and-data-required#blast-commands

brianjohnhaas commented 5 years ago

Yes, it most certainly does have implications here. But because we use a reasonably stringent e-value threshold, I expect the consequences to be minimal.

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

danwiththeplan commented 5 years ago

@brianjohnhaas good to know.

My understanding of the manual (https://www.ncbi.nlm.nih.gov/books/NBK279684/) suggests using -max_hsps 1 will give you the best hit, but I'm not trusting anything about NCBI-BLAST without checking it, which I haven't done yet.

max_hsps | integer | none | Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.
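As a rough illustration of what that manual entry describes: the limit applies per query-subject pair, not per query. The Python sketch below mimics that filtering on made-up records. The function name `best_hsp_per_pair` and the sample tuples are hypothetical, and this is not how BLAST implements the option internally.

```python
from typing import Dict, List, Tuple

# Hypothetical HSP record: (query_id, subject_id, evalue, bitscore)
Hsp = Tuple[str, str, float, float]


def best_hsp_per_pair(hsps: List[Hsp]) -> List[Hsp]:
    """Keep only the lowest-e-value HSP for each (query, subject) pair."""
    best: Dict[Tuple[str, str], Hsp] = {}
    for hsp in hsps:
        key = (hsp[0], hsp[1])
        # Replace the stored HSP only if this one has a smaller e-value.
        if key not in best or hsp[2] < best[key][2]:
            best[key] = hsp
    return list(best.values())


# Made-up data: two HSPs against subject sA, one against subject sB.
hsps = [
    ("q1", "sA", 1e-50, 200.0),
    ("q1", "sA", 1e-10, 60.0),
    ("q1", "sB", 1e-80, 300.0),
]
kept = best_hsp_per_pair(hsps)
```

In this toy case both subjects survive for `q1`, so one HSP per pair is not the same thing as one hit per query.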

brianjohnhaas commented 5 years ago

It's unsettling. Everyone's free to use whatever parameters they want, though, and Trinotate should hopefully regurgitate the results accordingly. (just needs to be in outfmt 6)
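For anyone who wants to check their own results, here is a minimal Python sketch that reads tabular output and keeps the highest-bit-score row per query. It assumes the default 12-column `-outfmt 6` layout; the function name `top_hit_per_query` and the sample rows are made up for illustration. It can only re-rank hits that BLAST actually reported, so it cannot recover anything dropped during the search itself.

```python
import csv
import io
from typing import Dict, TextIO

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def top_hit_per_query(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Return, for each query ID, the row with the highest bit score."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Made-up rows: the second hit for q1 has the higher bit score.
sample = (
    "q1\tsA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150\n"
    "q1\tsB\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-55\t195\n"
    "q2\tsC\t85.0\t80\t12\t0\t1\t80\t5\t84\t1e-20\t90\n"
)
top = top_hit_per_query(io.StringIO(sample))
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` wrapper.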

danwiththeplan commented 5 years ago

Realistically this should result in thousands of papers being corrected or outright retracted. Try searching GitHub for where this line was used: thousands of results, all potentially affected, even if in a minor way. Unsettling indeed!

Thijmen18 commented 5 years ago

Hi,

Have you all seen the response also?

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty1026/5259186

brianjohnhaas commented 5 years ago

Looks like it's behind a paywall. For $45 USD, you can have 24 hour access to it.
