Robaina closed this issue 2 years ago.
Currently considering the following options for dealing with queries that have multiple placement locations:
Check whether all placement locations lie within a single predefined cluster. In that case, results won't be affected, since they are cluster-dependent (taxonomy assignment by gappa examine assign would probably have to be re-evaluated).
Test gappa's accumulate subcommand (https://github.com/lczech/gappa/wiki/Subcommand:-accumulate)
NOTE: these measures are independent of filtering placements by placement distance
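A minimal sketch of the first option, assuming each placement has already been reduced to the tree edge it sits on and that a hypothetical `edge_to_cluster` dictionary maps edge numbers to predefined cluster labels. A query whose placements all fall within one cluster keeps that cluster; otherwise it is flagged as ambiguous (`None`).

```python
from typing import Dict, Iterable, Optional


def single_cluster(
    placement_edges: Iterable[int], edge_to_cluster: Dict[int, str]
) -> Optional[str]:
    """Return the cluster shared by all placement edges, or None.

    None is returned when the edges span more than one cluster, when any
    edge has no cluster, or when there are no placements at all.
    """
    clusters = set()
    for edge in placement_edges:
        if edge not in edge_to_cluster:
            return None  # unclustered edge: cannot assign unambiguously
        clusters.add(edge_to_cluster[edge])
    return clusters.pop() if len(clusters) == 1 else None


# Hypothetical edge-to-cluster mapping, for illustration only.
edge_to_cluster = {0: "cluster_A", 1: "cluster_A", 2: "cluster_B"}
print(single_cluster([0, 1], edge_to_cluster))  # cluster_A
print(single_cluster([1, 2], edge_to_cluster))  # None
```

Returning `None` rather than picking the best-weighted cluster keeps the check conservative: ambiguous queries are surfaced instead of silently assigned.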
SOME NOTES:
Alright, so filtering by pendant length is already implemented... there are three options:
Tree diameter: computed as the maximum distance between any pair of leaves (tips) in the tree
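A minimal sketch of the tree diameter, assuming the tree is given as an undirected adjacency dictionary with branch lengths (a real pipeline would parse the Newick tree with a tree library instead). It computes path lengths from every leaf and keeps the largest leaf-to-leaf distance; this is quadratic in the number of leaves but easy to verify.

```python
from typing import Dict, List, Tuple

# node -> list of (neighbor, branch length); undirected, so each edge appears twice
Tree = Dict[str, List[Tuple[str, float]]]


def distances_from(tree: Tree, source: str) -> Dict[str, float]:
    """Path length from `source` to every node (iterative DFS; tree paths are unique)."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor, length in tree[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                stack.append(neighbor)
    return dist


def tree_diameter(tree: Tree) -> float:
    """Maximum path length between any two leaves (degree-1 nodes)."""
    leaves = [node for node, edges in tree.items() if len(edges) == 1]
    best = 0.0
    for leaf in leaves:
        dist = distances_from(tree, leaf)
        best = max(best, max(dist[other] for other in leaves))
    return best


# Toy unrooted tree with leaves A, B, C, D and internal nodes x, y.
tree = {
    "A": [("x", 1.0)],
    "B": [("x", 2.0)],
    "x": [("A", 1.0), ("B", 2.0), ("y", 0.5)],
    "y": [("x", 0.5), ("C", 3.0), ("D", 0.5)],
    "C": [("y", 3.0)],
    "D": [("y", 0.5)],
}
print(tree_diameter(tree))  # 5.5 (path B - x - y - C)
```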
For a description of distal and pendant length see: https://github.com/lczech/gappa/wiki/Subcommand%3A-assign
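A minimal sketch of filtering by pendant length, assuming placements are stored in the jplace JSON layout, where `fields` names the columns of each row in a query's `p` list. The threshold name `max_pendant_length` is hypothetical. Individual placement rows above the threshold are dropped, and queries left with no rows are removed entirely.

```python
import json
from typing import Any, Dict


def filter_by_pendant_length(jplace: Dict[str, Any], max_pendant_length: float) -> Dict[str, Any]:
    """Drop placement rows whose pendant_length exceeds the threshold.

    Queries left without any placement are removed. Remaining
    like_weight_ratio values are NOT renormalized.
    """
    col = jplace["fields"].index("pendant_length")
    kept = []
    for pquery in jplace["placements"]:
        rows = [row for row in pquery["p"] if row[col] <= max_pendant_length]
        if rows:
            kept.append({**pquery, "p": rows})  # new dict; input is left untouched
    return {**jplace, "placements": kept}


# Toy jplace-like document (tree string and values are made up).
doc = json.loads("""
{
  "tree": "((A:1{0},B:2{1}):0.5{2},C:3{3}){4};",
  "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length"],
  "placements": [
    {"p": [[0, -100.0, 0.7, 0.1, 0.2], [1, -101.0, 0.3, 0.2, 1.5]], "n": ["q1"]},
    {"p": [[3, -120.0, 1.0, 0.5, 2.0]], "n": ["q2"]}
  ],
  "version": 3,
  "metadata": {}
}
""")
filtered = filter_by_pendant_length(doc, max_pendant_length=1.0)
print([pq["n"] for pq in filtered["placements"]])  # [['q1']]
print(len(filtered["placements"][0]["p"]))  # 1
```

Whether to renormalize `like_weight_ratio` after dropping rows is a design choice; this sketch leaves the values as they were.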
ANOTHER NOTE
gappa version >= 0.8.0 is required for the newest code to work. The latest gappa version is available in conda: https://anaconda.org/bioconda/gappa.
However, the regular install command didn't fetch the latest version for me. Instead, I had to download the package file from here: https://anaconda.org/bioconda/gappa/files and install it directly (conda install "filename").
OK, this issue was fixed by PR #73.
Reopening this issue because we realized that filtering out placements assigned to more than one cluster should be the default behavior of labelplacements.py. Otherwise, cluster scores are not properly assigned in those cases (at the very least, this behavior should be imposed when cluster_scores are provided).
The next PR will include this filter by default
Implement a filter to remove placements that lie too far from the insertion point / closest reference sequences in the tree.