Closed jolespin closed 3 years ago
Thank you for the suggestion. I will see what can be done to accommodate this, as I think it would help people with the interpretation of their experiments.
Thanks for considering it. I wish I could help but I only know enough R to run tools like yours (I code in Python mainly). Having this type of reference would also make it easier to interpret volcano plots generated using ALDEx2 data since the reference could be fixed.
Can you point me in the direction of the functions and scripts I would need to adapt? I'd like to hack this up, as I'm using it for a few different projects and would find the interpretability extremely useful. If I'm able to do it correctly, I would, of course, submit a PR so other people could take advantage of this functionality as well.
Sure thing. It is the aldex.effect function that you want to play with. Make sure you use the ALDEx_bioc repository as your starting point, as this is the production site. See the clr_effect.r file.
Apologies, I tried digging into the code, but the pace of a few projects has picked up quite a bit. I'm not great with R, so making the proper edits was a bit more difficult than I thought.
As I am using ALDEx2 for my own projects, I ended up writing a Python wrapper for ALDEx2: https://github.com/jolespin/soothsayer/blob/5b0ace5687866d81acdf8d7dfbecdb49f843d2f8/soothsayer/r_wrappers/packages/ALDEx2.py#L36
def run_aldex2(X:pd.DataFrame, y:pd.Series, reference_class, into=pd.DataFrame, aldex2_kws=dict(), random_state=0, show_console=False):
    ...
    return _run(X=X, y=y, reference_class=reference_class, kws=_aldex2_kws)
and then a wrapper of a wrapper: https://github.com/jolespin/soothsayer/blob/5b0ace5687866d81acdf8d7dfbecdb49f843d2f8/soothsayer/statistics/statistics.py#L185
def differential_abundance(X:pd.DataFrame, y:pd.Series, reference_class=None, design_matrix=None, method="ALDEx2", into=pd.DataFrame, algo_kws=dict(), random_state=0, show_console=False, **kwargs):
    ...
    return ALDEx2.run_aldex2(**kwargs)
My workaround for the reference problem was to allow only 2 conditions, store the actual class names, relabel the reference condition as "reference" and the treatment condition as "treatment" so that they follow the alphabetical ordering you set up, and then restore the original labels once the analysis is done. This seems to work in the meantime.
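For anyone who wants to try the same relabeling trick, here is a minimal sketch of the idea in plain pandas. It is not the code from the wrapper linked above; the helper name `relabel_for_reference` and the example class names are made up for illustration, and it only relies on "reference" sorting before "treatment" alphabetically.

```python
import pandas as pd

def relabel_for_reference(y: pd.Series, reference_class):
    """Hypothetical helper: rename two classes so the reference sorts first.

    Returns the relabeled series and a mapping to restore the original names.
    """
    classes = list(pd.unique(y))
    if len(classes) != 2:
        raise ValueError("This workaround only supports exactly 2 conditions")
    if reference_class not in classes:
        raise ValueError(f"{reference_class!r} is not one of {classes}")
    treatment_class = next(c for c in classes if c != reference_class)
    # "reference" < "treatment" alphabetically, so the reference comes first
    forward = {reference_class: "reference", treatment_class: "treatment"}
    reverse = {new: old for old, new in forward.items()}
    return y.map(forward), reverse

# Example: "knockout" would sort before "wildtype", which is the wrong way round
y = pd.Series(
    ["wildtype", "knockout", "wildtype", "knockout"],
    index=["s1", "s2", "s3", "s4"],
)
y_relabeled, reverse = relabel_for_reference(y, reference_class="wildtype")

# ... run the analysis on y_relabeled here ...

# Afterwards, swap the placeholder labels back to the real class names
y_restored = y_relabeled.map(reverse)
```

The `reverse` mapping is what lets the results be reported with the real class names after the analysis finishes.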
I also encountered the same issue, but in aldex.glm.
As I am comparing multiple groups of samples and am interested in pairwise comparisons, the number of times I need to run aldex.glm (e.g., n times for n groups) is much lower than if I run aldex.effect pair by pair (nC2 times for n groups). However, both model.matrix and aldex.glm are really stubborn in their choice of reference level, so I eventually came up with the same idea as @jolespin: I externally generate multiple model matrices with the reference group named 'A'.
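A rough sketch of that bookkeeping in Python, in case it helps others: it counts the runs for both strategies and produces one relabeled condition vector per choice of reference, with the reference renamed to 'A'. The helper name and group names are hypothetical, and it assumes every other group name sorts after 'A' (sorting in R can depend on locale, so that is worth checking on your own labels).

```python
from math import comb

def relabel_reference_as_A(labels, reference):
    """Hypothetical helper: rename the chosen reference group to 'A'."""
    if reference not in labels:
        raise ValueError(f"{reference!r} not found in labels")
    return ["A" if g == reference else g for g in labels]

# Made-up condition labels for 5 groups, 2 samples each
labels = ["gut", "oral", "skin", "soil", "water"] * 2
groups = sorted(set(labels))
n_groups = len(groups)

# Pair-by-pair runs vs. one run per reference group
pairwise_runs = comb(n_groups, 2)   # nC2
per_reference_runs = n_groups       # n

# One relabeled vector per reference; each would feed its own model matrix
relabeled = {ref: relabel_reference_as_A(labels, ref) for ref in groups}
```

Each entry of `relabeled` can then be passed to R to build the model matrix for that reference; the original name of group 'A' is the dictionary key.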
It sounds dumb, and it is a pain to toggle the conditions, so I absolutely look forward to an option like this for both aldex.glm and aldex.effect!
As a replacement for DESeq2, I do see better clustering and resolution in my data using ALDEx2, so hopefully the reference options will soon become as convenient as the contrast argument in DESeq2.
It would be useful to be able to specify the reference condition so we can control the directionality of diff.btw. For example, if we were comparing treatment vs. control, we could specify reference='control' and the resulting statistics would use control as the reference.
If there was something like this:
Then diff.btw would be -3, since it would be using control as the reference. I think right now it's in alphabetical order.
The usage could be like this: