Robaina / MetaTag

metaTag: functional and taxonomical annotation of metagenomes through phylogenetic tree placement
https://robaina.github.io/MetaTag/
Apache License 2.0

Filter placements by placement distance + Deal with multiple placement locations for a single query #70

Closed Robaina closed 2 years ago

Robaina commented 2 years ago

Implement a filter to remove placements located too far from their insertion point or closest reference sequences in the tree.

Robaina commented 2 years ago

Currently considering these options to deal with multiple placement locations for a single query:

  1. Check whether all placement locations lie within a predefined cluster. In that case, results won't be affected since they are cluster-dependent (taxonomy assignment by gappa examine assign would probably have to be re-evaluated)

  2. Test gappa's subcommand accumulate (https://github.com/lczech/gappa/wiki/Subcommand:-accumulate)

NOTE: these measures are independent of filtering placements by placement distance
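A minimal sketch of the check in option 1, assuming each query's placement locations are available as jplace edge numbers and that a mapping from edge number to cluster label already exists. Both `edge_to_cluster` and the function names are hypothetical, not part of MetaTag:

```python
from typing import Dict, Iterable, Optional, Set


def placement_clusters(
    edge_nums: Iterable[int], edge_to_cluster: Dict[int, str]
) -> Set[Optional[str]]:
    """Collect the cluster of every placement location (None if unclustered)."""
    return {edge_to_cluster.get(edge) for edge in edge_nums}


def lies_in_single_cluster(
    edge_nums: Iterable[int], edge_to_cluster: Dict[int, str]
) -> bool:
    """True only if all placement locations fall inside one and the same cluster."""
    clusters = placement_clusters(edge_nums, edge_to_cluster)
    return len(clusters) == 1 and None not in clusters


# Hypothetical mapping: edges 0 and 1 belong to cluster "A", edge 2 to cluster "B"
edge_to_cluster = {0: "A", 1: "A", 2: "B"}

same_cluster = lies_in_single_cluster([0, 1], edge_to_cluster)   # both edges in "A"
split_query = lies_in_single_cluster([1, 2], edge_to_cluster)    # spans "A" and "B"
unclustered = lies_in_single_cluster([1, 7], edge_to_cluster)    # edge 7 has no cluster
```

Queries for which the check fails are the ones that would need special handling (or removal).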

Robaina commented 2 years ago

SOME NOTES:

  1. gappa examine assign takes pendant length into account when "--distant-label" is passed in version v0.8.0 (this was the default in previous versions). DISTANT is a new label that is assigned LWR (likelihood weight ratio) in proportion to the pendant length. We could therefore filter out placements based on the LWR assigned to DISTANT: a value above a given percentage threshold would imply that the query is not well defined, and the query would be removed from counting
  2. We could filter out placements based on pendant length directly in the .jplace file. We could set a threshold based on the ratio of pendant length to tree diameter (the maximum distance between two leaves)
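A rough sketch of the thresholding in note 1, assuming the per-query LWR values have already been read into a dictionary keyed by label. How those values are parsed from gappa's output is left out, and `query_lwr`, the function name, and the 0.5 default are hypothetical:

```python
from typing import Dict, List


def filter_by_distant_lwr(
    query_lwr: Dict[str, Dict[str, float]],
    max_distant_lwr: float = 0.5,
    distant_label: str = "DISTANT",
) -> List[str]:
    """Return the queries whose LWR assigned to the DISTANT label
    does not exceed the threshold."""
    kept = []
    for query, label_lwr in query_lwr.items():
        # A query without a DISTANT entry has no LWR assigned to it
        if label_lwr.get(distant_label, 0.0) <= max_distant_lwr:
            kept.append(query)
    return kept


# Hypothetical LWR values per query and label
query_lwr = {
    "q1": {"Bacteria;Proteobacteria": 0.9, "DISTANT": 0.1},
    "q2": {"Bacteria;Firmicutes": 0.2, "DISTANT": 0.8},
    "q3": {"Archaea": 1.0},
}
kept_queries = filter_by_distant_lwr(query_lwr, max_distant_lwr=0.5)
```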

Robaina commented 2 years ago

Alright, filtering by pendant length is now implemented. There are three options:

  1. Filter directly by maximum pendant length
  2. Filter by maximum pendant to distal length ratio
  3. Filter by maximum pendant to tree diameter ratio

Tree diameter: computed as the maximum distance between any two leaves (tips) in the tree
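To illustrate the definition, here is a small self-contained sketch that computes the diameter of a tree stored as an adjacency dictionary with branch lengths. This representation and the function names are assumptions for the example, not how MetaTag stores trees. It uses the standard two-pass search (find the farthest node from an arbitrary start, then the farthest node from that one), which is valid for non-negative branch lengths:

```python
from typing import Dict, List, Optional, Tuple

# node -> list of (neighbor, branch length); every edge is listed in both directions
Tree = Dict[str, List[Tuple[str, float]]]


def farthest_node(tree: Tree, start: str) -> Tuple[str, float]:
    """Return the node farthest from `start` and its path distance."""
    best_node, best_dist = start, 0.0
    stack: List[Tuple[str, Optional[str], float]] = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if dist > best_dist:
            best_node, best_dist = node, dist
        for neighbor, length in tree[node]:
            if neighbor != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((neighbor, node, dist + length))
    return best_node, best_dist


def tree_diameter(tree: Tree) -> float:
    """Maximum path distance between any two leaves of the tree."""
    start = next(iter(tree))
    end_a, _ = farthest_node(tree, start)
    _, diameter = farthest_node(tree, end_a)
    return diameter


# Toy tree: ((A:1.0,B:2.0):0.5,C:3.0);
toy_tree: Tree = {
    "root": [("n1", 0.5), ("C", 3.0)],
    "n1": [("root", 0.5), ("A", 1.0), ("B", 2.0)],
    "A": [("n1", 1.0)],
    "B": [("n1", 2.0)],
    "C": [("root", 3.0)],
}
diameter = tree_diameter(toy_tree)  # longest leaf-to-leaf path is B to C: 2.0 + 0.5 + 3.0
```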

For a description of distal and pendant length see: https://github.com/lczech/gappa/wiki/Subcommand%3A-assign
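The three options can be sketched as a single filter over a parsed .jplace file. This is an illustration only, not the code merged in MetaTag: the function and parameter names are hypothetical, and it relies solely on the standard jplace layout (a "fields" list naming the columns of each placement row in "p"):

```python
from typing import Any, Dict, List, Optional


def filter_jplace_by_pendant(
    jplace: Dict[str, Any],
    max_pendant: Optional[float] = None,
    max_pendant_distal_ratio: Optional[float] = None,
    max_pendant_diameter_ratio: Optional[float] = None,
    tree_diameter: Optional[float] = None,
) -> Dict[str, Any]:
    """Return a copy of the jplace dict without placement rows exceeding any given limit."""
    if max_pendant_diameter_ratio is not None and tree_diameter is None:
        raise ValueError("tree_diameter is required for the diameter ratio filter")
    fields: List[str] = jplace["fields"]
    i_pendant = fields.index("pendant_length")
    i_distal = fields.index("distal_length")

    def keep(row: List[float]) -> bool:
        pendant, distal = row[i_pendant], row[i_distal]
        if max_pendant is not None and pendant > max_pendant:
            return False
        # Ratios are compared in product form to avoid dividing by a zero distal length
        if max_pendant_distal_ratio is not None and pendant > max_pendant_distal_ratio * distal:
            return False
        if max_pendant_diameter_ratio is not None and pendant > max_pendant_diameter_ratio * tree_diameter:
            return False
        return True

    filtered = dict(jplace)
    filtered["placements"] = []
    for placement in jplace["placements"]:
        rows = [row for row in placement["p"] if keep(row)]
        if rows:  # queries left without any placement location are dropped entirely
            filtered["placements"].append({**placement, "p": rows})
    return filtered


# Toy jplace content with two queries
jplace = {
    "version": 3,
    "tree": "((A:1.0{0},B:2.0{1}):0.5{2},C:3.0{3});",
    "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length"],
    "placements": [
        {"p": [[0, -100.0, 0.7, 0.1, 0.05], [1, -101.0, 0.3, 0.2, 1.5]], "n": ["q1"]},
        {"p": [[3, -90.0, 1.0, 0.1, 2.0]], "n": ["q2"]},
    ],
}
by_length = filter_jplace_by_pendant(jplace, max_pendant=1.0)
by_diameter = filter_jplace_by_pendant(jplace, max_pendant_diameter_ratio=0.3, tree_diameter=5.5)
```

Note that after rows are removed, the LWR values of the remaining rows of a query no longer sum to one, so downstream steps may need to renormalize them.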

Robaina commented 2 years ago

ANOTHER NOTE

The newest code requires gappa version >= 0.8.0. The latest gappa version is available in conda: https://anaconda.org/bioconda/gappa.

However, I couldn't get the latest version with the install command. Instead, I had to download the package file from https://anaconda.org/bioconda/gappa/files and install it directly (conda install "filename").

Robaina commented 2 years ago

OK, this issue is fixed by PR #73.

Robaina commented 2 years ago

Reopening this issue because we realized that filtering out placements assigned to more than one cluster should be the default behavior of labelplacements.py; otherwise, cluster scores are not properly assigned in those cases. At the very least, this behavior should be enforced when cluster_scores are provided.

The next PR will include this filter by default.