bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License
89 stars 26 forks source link

add super LCA and majority LCA merging #59

Closed nbokulich closed 4 years ago

nbokulich commented 4 years ago

merge_taxa has two new modes:

  1. majority performs LCA merging, with priority to majority labels
  2. super performs LCA merging, with priority to majority labels. Labels that are substrings of other labels (including empty rank handles) are collapsed prior to majority/LCA.

In general also includes a decent code refactor and addition of several more tests.

mikerobeson commented 4 years ago

This looks really great @nbokulich! My only concerns are really the help text, of "super". Otherwise everything seems to run well locally, all tests pass.

nbokulich commented 4 years ago

Thanks @mikerobeson ! I fixed the description text with your recommendation.

Ultimately, I think usage will be best demonstrated with a detailed tutorial, on the to-do list. For now hopefully it is clear enough to guide users!

We may also want to consider what mode to use as the default for this method. For now I have kept as len (choose the taxonomy with the most elements) but super sort of has the best of all worlds (hence the name) so we may want to promote that.

mikerobeson commented 4 years ago

No problem @nbokulich, I am leaning towards the super approach as the default. They way I look at it, we are making use of the most information available. I agree about clarifying details more within a tutorial.