ggloor / ALDEx2_dev

ALDEx tool to examine compositional high-throughput sequence data with Welch's t-test
GNU Affero General Public License v3.0

Feature Request: Specify a "reference" condition #21

Closed jolespin closed 3 years ago

jolespin commented 3 years ago

It would be useful to be able to specify the reference condition so we can control the directionality of diff.btw. For example, if we were comparing treatment vs. control, we could specify reference='control' and the resulting statistics would use control as the reference.

If there was something like this:

rab.win.treatment  = 2
rab.win.control  = 5

Then diff.btw would be -3, since it would be using control as the reference. I think right now the conditions are taken in alphabetical order.
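
To make the sign convention concrete, here is a minimal Python sketch of the arithmetic in the example above. The helper `diff_relative_to` is a hypothetical name used only for illustration. It shows "other condition minus reference" on the two numbers given and is not a description of how ALDEx2 computes diff.btw internally.

```python
def diff_relative_to(rab_win, reference):
    """Return (other condition - reference) for a two-condition mapping.

    Hypothetical helper for illustration only; not part of ALDEx2.
    """
    if reference not in rab_win:
        raise ValueError(f"unknown reference condition: {reference!r}")
    others = [name for name in rab_win if name != reference]
    if len(others) != 1:
        raise ValueError("exactly two conditions are expected")
    return rab_win[others[0]] - rab_win[reference]


# Values taken from the example above.
rab_win = {"treatment": 2, "control": 5}
print(diff_relative_to(rab_win, reference="control"))    # -3
print(diff_relative_to(rab_win, reference="treatment"))  # 3
```

Flipping the reference only flips the sign, which is why a fixed reference would keep the orientation of volcano plots stable across analyses.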

The usage could be like this:

aldex(X, y, reference='control')

ggloor commented 3 years ago

Thank you for the suggestion. I will see what can be done to accommodate this, as I think it would help people with the interpretation of their experiments.

jolespin commented 3 years ago

Thanks for considering it. I wish I could help but I only know enough R to run tools like yours (I code in Python mainly). Having this type of reference would also make it easier to interpret volcano plots generated using ALDEx2 data since the reference could be fixed.

jolespin commented 3 years ago

Can you point me in the direction of the functions and scripts I would need to adapt? I'd like to hack this up as I'm using it for a few different projects and would find the interpretability extremely useful. If I'm able to do it correctly, I would, of course, submit a PR so other people could take advantage of this functionality as well.

ggloor commented 3 years ago

Sure thing. It is the aldex.effect function that you want to play with. Make sure you use the ALDEx_bioc repository as your starting point, as this is the production site.

ggloor commented 3 years ago

See the clr_effect.r file.

jolespin commented 3 years ago

Apologies, I tried digging into the code, but the pace of a few projects has picked up quite a bit. I'm not great with R, so making the proper edits was more difficult than I thought.

As I am using ALDEx2 for my own projects, I ended up writing a Python wrapper for ALDEx2: https://github.com/jolespin/soothsayer/blob/5b0ace5687866d81acdf8d7dfbecdb49f843d2f8/soothsayer/r_wrappers/packages/ALDEx2.py#L36

def run_aldex2(X:pd.DataFrame, y:pd.Series, reference_class, into=pd.DataFrame, aldex2_kws=dict(), random_state=0, show_console=False):
...
        return _run(X=X, y=y, reference_class=reference_class, kws=_aldex2_kws)

And then a wrapper around that wrapper: https://github.com/jolespin/soothsayer/blob/5b0ace5687866d81acdf8d7dfbecdb49f843d2f8/soothsayer/statistics/statistics.py#L185

def differential_abundance(X:pd.DataFrame, y:pd.Series, reference_class=None, design_matrix=None, method="ALDEx2", into=pd.DataFrame, algo_kws=dict(), random_state=0, show_console=False, **kwargs):

...
        return ALDEx2.run_aldex2(**kwargs)

My solution to the reference problem was to allow only 2 conditions, store the actual class names, and relabel the reference condition as "reference" and the treatment condition as "treatment" so that they follow the alphabetical ordering you set up. Once the analysis is done, I swap the original labels back in. This seems to work in the meantime.
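
The relabeling trick described above can be sketched roughly as follows. This is a standard-library illustration under stated assumptions, not the code from the linked wrapper: the function names `relabel_for_reference` and `restore_column_names` are hypothetical, and it assumes (as the thread suggests) that the conditions are ordered alphabetically and that per-condition output columns end in `.<condition>`, as in `rab.win.control`.

```python
def relabel_for_reference(labels, reference_class):
    """Map two class labels to "reference"/"treatment" so the reference sorts first.

    Hypothetical helper; returns the new labels and a mapping to undo the rename.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this workaround only supports exactly 2 conditions")
    if reference_class not in classes:
        raise ValueError(f"unknown reference class: {reference_class!r}")
    treatment_class = next(c for c in classes if c != reference_class)
    forward = {reference_class: "reference", treatment_class: "treatment"}
    backward = {new: old for old, new in forward.items()}
    return [forward[label] for label in labels], backward


def restore_column_names(columns, backward):
    """Swap the placeholder suffixes back to the original class names."""
    restored = []
    for name in columns:
        prefix, _, suffix = name.rpartition(".")
        if prefix and suffix in backward:
            restored.append(f"{prefix}.{backward[suffix]}")
        else:
            restored.append(name)  # e.g. diff.btw has no condition suffix
    return restored


# "mutant" would sort before "wildtype", so rename before running the analysis.
y = ["wildtype", "mutant", "wildtype", "mutant"]
y_tmp, backward = relabel_for_reference(y, reference_class="wildtype")
print(y_tmp)

# After the analysis, put the real names back on the per-condition columns.
columns = ["rab.win.reference", "rab.win.treatment", "diff.btw"]
print(restore_column_names(columns, backward))
```

Because "reference" sorts before "treatment", the intended reference always comes first under alphabetical ordering, whatever the real class names are.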

tsa4a12 commented 2 years ago

I also encountered the same issue, but in the aldex.glm module.

As I am comparing multiple groups of samples and am interested in pairwise comparisons, the number of times I need to run aldex.glm (e.g. n times for n groups) is much smaller than if I ran aldex.effect pair by pair (nC2 times for n groups). However, both the model.matrix and aldex.glm functions are really stubborn in their choice of reference factor, so I eventually came up with the same idea as @jolespin: I externally generate multiple model matrices with the reference group named 'A'.
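
A small Python sketch of the run counts and of the "rename the reference to 'A'" idea follows. The group names and the helper `labels_with_reference` are made up for illustration, and the sketch assumes that every other group name sorts after the placeholder 'A'; it only prepares labels and does not call ALDEx2.

```python
from itertools import combinations
from math import comb

# Hypothetical group names, for illustration only.
groups = ["gut", "oral", "skin", "soil"]

# Pair-by-pair comparisons need one run per unordered pair: nC2 = n(n-1)/2.
pairwise_runs = list(combinations(groups, 2))
print(len(pairwise_runs), comb(len(groups), 2))  # 6 6

# One run per reference group needs only n runs.
print(len(groups))  # 4


def labels_with_reference(labels, reference, placeholder="A"):
    """Rename the chosen reference group so that it sorts first.

    Hypothetical helper; assumes all other labels sort after the placeholder.
    """
    if reference not in labels:
        raise ValueError(f"unknown reference group: {reference!r}")
    if placeholder in labels:
        raise ValueError(f"placeholder {placeholder!r} is already a group name")
    return [placeholder if label == reference else label for label in labels]


sample_groups = ["gut", "oral", "gut", "skin", "soil", "oral"]
for reference in groups:
    # One relabeled vector per reference; each would feed its own model matrix.
    print(reference, labels_with_reference(sample_groups, reference))
```

The two counts are equal at n = 3 and diverge beyond that, since n(n-1)/2 grows quadratically while the per-reference count grows linearly.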

It sounds dumb, and it is a pain to toggle the conditions, so I absolutely look forward to a feature like this for both aldex.glm and aldex.effect! As a replacement for DESeq2, ALDEx2 gives me better clustering and resolution on my data, so hopefully the reference option will soon become as convenient as the contrast argument in DESeq2.