andreyto / mr-mpi-blast

Parallel implementation of NCBI BLAST+ with MapReduce-MPI
http://andreyto.github.com/mgtaxa/

Confirm that use of BLAST's `-max_target_seqs` is intentional #4

armish closed this issue 6 years ago

armish commented 6 years ago

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

Based on the recently published report, "Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows", there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.

Thank you! -- Arman (armish/blast-patrol)

andreyto commented 6 years ago

Dear @armish, I appreciate your drawing attention to the danger of incorrect use of max_target_seqs by mass-posting issues on GitHub and linking to the recent publication. I am sure this will bring the paper the attention it deserves. However, you might want to slightly modify your bulk-mailing script and make it filter by the value supplied for the max_target_seqs parameter. You will discover that in all cases listed by you for our code, we set this value to be above two billion (the maximum 32-bit signed integer value). I hope you agree that, with the exception of some very unusual search objectives, a two billion limit for this parameter should be perfectly safe.
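A value-based filter of the kind suggested here could look roughly like the sketch below. It is not taken from the blast-patrol script: the function names, the regular expression, and the cutoff of 1000 are all hypothetical choices made for illustration, and the pattern only catches literal numbers written directly after the flag (values passed through variables or separate string arguments would need extra handling).

```python
import re

# Matches a literal integer following the flag, e.g. "-max_target_seqs 500"
# or "-max_target_seqs=500". This pattern is an assumption for illustration.
_FLAG_PATTERN = re.compile(r"-max_target_seqs[\s=]+(\d+)")

# Largest 32-bit signed integer, the kind of value mentioned in this thread.
INT32_MAX = 2**31 - 1

# Hypothetical cutoff: values below it are reported for human review.
SUSPICIOUS_BELOW = 1000


def max_target_seqs_values(text):
    """Return every literal -max_target_seqs value found in the text."""
    return [int(value) for value in _FLAG_PATTERN.findall(text)]


def is_suspicious(text, threshold=SUSPICIOUS_BELOW):
    """Flag the text only if some literal value is below the threshold."""
    return any(v < threshold for v in max_target_seqs_values(text))
```

With such a check, a file that contains `-max_target_seqs 1` would be reported, while one that sets the value to `2147483647` would be skipped.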

armish commented 6 years ago

Sorry about that @andreyto -- I wasn't sure about a good threshold to filter out potentially intentional cases, so I erred on the side of being overly aggressive. I appreciate your understanding, and again, my apologies for these multiple false-positive hits.

FWIW: the paper is not mine, but I saw it as an opportunity to broadcast it to all relevant people for the sake of science (and for the joy of weekend hacking).

All the best, -- Arman