BojarLab / glycowork

Package for processing and analyzing glycans and their role in biology.
https://Bojarlab.github.io/glycowork
MIT License
55 stars 11 forks source link

Biodiversity functions #42

Closed Glycocalex closed 2 months ago

Glycocalex commented 3 months ago

Glycowork.stats:

Bribak commented 3 months ago

Thanks! We're almost there:D There are no merge conflicts but the build currently fails the tests because the test for get_biodiversity expects group1/group2 args. I get the advantage of having an ANOVA-like interface but, for the sake of convenience (as pairwise comparison is a more common usecase), I'd like to keep the binary group interface. Slightly hacky but one solution could be to assume, if group2 is [], that group1 contains an ANOVA-like input with more than two groups

Also a few notes that I'll do after merging (but feel free to take a stab if you'd like):

Glycocalex commented 2 months ago

So I think this pull request is automatically updating when I push to my branch. If so, things should be shaping up.

We have group1/group2 arguments, where the columns for each group are defined in a list. I also liked your suggestion about using an empty list for group2 to allow multiple group comparisons, so I've put another if-statement to facilitate that.

I've done the homogenisation points you mentioned, and tried to clean up a few unnecessary lines. But it's a pretty messy function so you might find other ways to condense it.

I haven't changed the alpha/beta flag to a shopping cart yet. Happy to have a crack at this if it's something that requires a bit of time. (Does this essentially mean we can do both instead of doing either one?)

Bribak commented 2 months ago

Thanks & yes, the PR is automatically updating. That fixes all of the previous points.

The tests currently fail because the function seems to modify the input dataframe (so when downstream tests try to use the test dataframe it is no longer formatted as expected). Specifically, the glycan column that is popped off at the beginning is not reattached if motifs=False (and .pop modifies the original dataframe)

At minimum, the glycan column needs to be re-added also when motifs=False. Ideally, however, I'd change the processing such that popping & re-adding is not required (check out the early chunk of get_differential_expression to see how this could be avoided), because otherwise several things would need to be adjusted (e.g., get_alphaN only needs to do df.shape[1]-1 if the glycan column is still in the dataframe)

If it works on your machine, try running nbdev_prepare locally in a command line window to check whether the tests run without errors (warnings are okay)

Glycocalex commented 2 months ago

Okay, took a little while to get this sorted because nbdev doesn't work on windows, but I've added the shopping cart and ensured the input df isn't altered.

My nbdev_prepare test returned a success with some warnings. I can send them to you if there are usually no warnings when you run prepare.