AgResearch / data_prism

GNU General Public License v3.0

Confirm that use of BLAST's `-max_target_seqs` is intentional #2

Closed. armish closed this issue 6 years ago

armish commented 6 years ago

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's `-max_target_seqs` parameter:

Based on the recently published report, "Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows", there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.

Thank you! -- Arman (armish/blast-patrol)
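
A source comment of the kind suggested above might look like the following minimal sketch. The helper name `build_blast_command` and its arguments are hypothetical and not taken from this repository; only the `blastn` flags themselves are real command-line options.

```python
def build_blast_command(query_fasta, database, out_path, max_target_seqs=1):
    """Build a blastn argument list (hypothetical helper, for illustration only)."""
    # NOTE: -max_target_seqs is used intentionally here.
    # The report cited in this issue argues that this flag does not simply
    # keep the N best-scoring hits, so downstream code must not treat the
    # output as "the N best matches" without further checks.
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "6",  # tabular output
        "-max_target_seqs", str(max_target_seqs),
        "-out", out_path,
    ]
```

The list can be passed to `subprocess.run` as-is; keeping the note next to the flag means anyone editing the command sees the caveat.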

afmcc commented 6 years ago

Our use of this parameter in these examples was intentional and did not involve biologically interpreting the "top hit" returned. Rather, in this example, "top hits" from a random selection of sequences in a number (cumulative) of files are used to prepare a numeric profile vector for each file, with the semantic details of the hits discarded. (The numeric vectors are then input to unsupervised machine learning - for example clustered

I will annotate our code to this effect.
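
The profile-vector idea described above can be illustrated with a rough sketch. This is not the data_prism implementation: the function names are hypothetical, the input is assumed to be BLAST tabular output (`-outfmt 6`, query ID in column 1 and subject ID in column 2), and normalising counts to frequencies is an assumption made here for illustration. The clustering step is not shown.

```python
from collections import Counter


def top_hit_subjects(blast_tabular_lines):
    """Return the subject ID of the first reported hit for each query."""
    first_hit = {}
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        # Keep only the first hit listed for each query.
        first_hit.setdefault(query_id, subject_id)
    return list(first_hit.values())


def profile_vectors(hits_by_file):
    """Turn per-file hit lists into equal-length vectors of hit frequencies."""
    counts = {name: Counter(hits) for name, hits in hits_by_file.items()}
    # One vector position per distinct subject seen in any file.
    subjects = sorted(set().union(*counts.values()))
    vectors = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        vectors[name] = [counter[s] / total if total else 0.0 for s in subjects]
    return subjects, vectors


# Tiny made-up example: two files, tab-separated hit lines.
example = {
    "sample_a": top_hit_subjects(["q1\tsubjX\t99.0", "q1\tsubjY\t90.0", "q2\tsubjY\t98.0"]),
    "sample_b": top_hit_subjects(["q3\tsubjY\t97.0", "q4\tsubjY\t96.0"]),
}
subjects, vectors = profile_vectors(example)
```

Only the relative frequencies enter the vectors; the subject names are kept solely to fix the column order, in line with discarding the semantic details of the hits.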