fhcrc / deenurp

16S rRNA gene sequence curation and phylogenetic reference set creation
GNU General Public License v3.0

externally-defined criteria for keeping/dropping records #57

Open nhoffman opened 7 years ago

nhoffman commented 7 years ago

Let's add an option 'filter_outliers --filter-functions' that lets the user specify a file of Python code defining some optional functions. Each has the same signature, func(seq, info), where seq is a single sequence record and info is a dict of annotation for that sequence. Any of the following functions may be defined:

These override the other criteria applied before outlier detection, giving us finer-grained control over which sequences are considered for outlier detection.

These override the results of outlier detection.
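To make the proposal concrete, here is a rough sketch of how such a file could be loaded and applied. Everything named here is an assumption for illustration: the hook names (`keep_before`, `drop_before`, `keep_after`, `drop_after`), the helpers `load_hooks` and `decide`, the `is_type` annotation key, the use of a plain string in place of a real sequence record, and the rule that a keep hook wins over a drop hook. None of this exists in deenurp.

```python
import os
import tempfile

# Hypothetical hook names: the proposal does not settle on any.
HOOK_NAMES = ("keep_before", "drop_before", "keep_after", "drop_after")


def load_hooks(path):
    """Execute the user's file and return whichever hooks it defines."""
    namespace = {}
    with open(path) as handle:
        exec(compile(handle.read(), path, "exec"), namespace)
    return {name: namespace[name] for name in HOOK_NAMES
            if callable(namespace.get(name))}


def decide(hooks, stage, seq, info, default):
    """Return True to keep a record; a hook's verdict overrides `default`.

    `stage` is "before" or "after" (relative to outlier detection).
    Letting a keep hook win over a drop hook is an arbitrary choice.
    """
    keep = hooks.get("keep_" + stage)
    drop = hooks.get("drop_" + stage)
    if keep is not None and keep(seq, info):
        return True
    if drop is not None and drop(seq, info):
        return False
    return default


# Example user file defining only two of the optional hooks.
USER_CODE = '''
def drop_before(seq, info):
    return len(seq) < 10

def keep_after(seq, info):
    return info.get("is_type") == "yes"
'''

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write(USER_CODE)
try:
    hooks = load_hooks(tmp.name)
finally:
    os.remove(tmp.name)

# A record flagged as an outlier (default=False) is kept by keep_after.
rescued = decide(hooks, "after", "ACGT" * 300, {"is_type": "yes"}, default=False)
# A very short sequence is dropped before outlier detection.
short_kept = decide(hooks, "before", "ACGT", {}, default=True)
```

Since every hook is optional, a missing one simply leaves the existing decision (`default`) untouched, so an empty file would change nothing.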

Comments? Questions?