Closed armish closed 6 years ago
Dear @armish, I appreciate your attracting attention to the danger of incorrect use of max_target_seqs by mass-posting issues on GitHub and linking to the recent publication. I am sure this will bring the paper the attention it deserves.
However, you might want to slightly modify your bulk-mailing script and make it filter by the value supplied for the max_target_seqs parameter. You will discover that in all cases listed by you for our code, we set this value to be above two billion (the maximum 32-bit signed integer value). I hope you agree that, with the exception of some very unusual search objectives, a two billion limit for this parameter should be perfectly safe.
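For illustration, here is a minimal sketch of what such a value filter could look like. It is not taken from the actual script: the function name, the regular expression, and the cut-off of 1000 are all assumptions, and it only handles values written as literal numbers next to the parameter name.

```python
import re

# Matches e.g. "-max_target_seqs 5", "-max_target_seqs=5", "max_target_seqs: 5".
# Assumption: the value is a literal integer right after the parameter name.
MAX_TARGET_SEQS_RE = re.compile(r"max_target_seqs[\s=:]*(\d+)")

# Hypothetical cut-off: values at or above this are treated as intentional.
INTENTIONAL_THRESHOLD = 1000


def suspicious_values(source_text, threshold=INTENTIONAL_THRESHOLD):
    """Return the literal max_target_seqs values in source_text below threshold."""
    values = [int(m.group(1)) for m in MAX_TARGET_SEQS_RE.finditer(source_text)]
    return [v for v in values if v < threshold]
```

A file would then be reported only if `suspicious_values` returns a non-empty list. Values passed through variables or configuration files are not matched by this pattern and would need separate handling.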
Sorry about that @andreyto -- I wasn't sure about a good threshold to filter out potentially intentional cases, so I erred on the side of being overaggressive. I appreciate your understanding with this, and again my apologies for these multiple false-positive hits.
FWIW: the paper is not mine but I found it as an opportunity to broadcast to all relevant people for the sake of science (and for the joy of weekend hacking).
All the best, -- Arman
Hi there,
This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

Based on the recently published report, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, there is a strong chance that this parameter is misused in your repository.
If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.
Thank you! -- Arman (armish/blast-patrol)