drostlab / metablastr

Seamless Integration of BLAST Sequence Searches in R
https://drostlab.github.io/metablastr/
GNU General Public License v2.0

Feature Request: exclude sequences with Ns #3

Open avancise opened 5 years ago

avancise commented 5 years ago

In extract_random_seqs_from_genome(), it would be helpful to have an option that lets users exclude sequences containing too many Ns, for example any sequence with at least one N, or any sequence in which more than 10% of the bases are N. For me, it would be fine for this filtering step to happen after the sequences are drawn: if 100 sequences are drawn and 10 are excluded because they have too many Ns, 90 sequences are returned. It would also be great to have a short printout at the end that says how many sequences were drawn and how many were filtered out because of Ns.
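To make the request concrete, here is a rough sketch of the post-draw filtering in base R. It is not metablastr code: the helper name filter_seqs_by_n and its max_n_frac argument are hypothetical, and it assumes the drawn sequences are available as a plain character vector, which may not match what extract_random_seqs_from_genome() actually returns.

```r
# Hypothetical helper, not part of metablastr: drop sequences whose share of
# N bases exceeds `max_n_frac`, and report how many were removed.
# Assumes `seqs` is a plain character vector of nucleotide sequences.
filter_seqs_by_n <- function(seqs, max_n_frac = 0) {
  stopifnot(is.character(seqs), max_n_frac >= 0, max_n_frac <= 1)

  # Count N/n characters per sequence by deleting every other character.
  n_count <- nchar(gsub("[^Nn]", "", seqs))

  # Fraction of N per sequence; empty sequences are treated as having no Ns.
  n_frac <- ifelse(nchar(seqs) > 0, n_count / nchar(seqs), 0)

  keep <- n_frac <= max_n_frac

  # Short summary of how many sequences were drawn and how many were filtered.
  message(length(seqs), " sequences drawn; ", sum(!keep),
          " removed because more than ", 100 * max_n_frac,
          "% of bases were N; ", sum(keep), " retained.")

  seqs[keep]
}

seqs <- c("ACGTACGTAC", "ACGTNNNNAC", "NNNNNNNNNN", "ACGTACGTAN")
no_n    <- filter_seqs_by_n(seqs, max_n_frac = 0)    # exclude any sequence with an N
up_to10 <- filter_seqs_by_n(seqs, max_n_frac = 0.1)  # tolerate up to 10% N
```

With max_n_frac = 0 this corresponds to the "N > 0" case, and with max_n_frac = 0.1 to the "N > 10%" case. Filtering after the draw means the result can contain fewer sequences than were requested, which is the behaviour described above.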