SegataLab / lefse

MIT License

reproducibility of results between github, galaxy instance and biobakery's LEfSe scripts #13

Open glucksfall opened 3 years ago

glucksfall commented 3 years ago

Dear developers,

Hope everything's ok.

I am writing because I am having trouble reproducing locally the results of the Galaxy instance at https://huttenhower.sph.harvard.edu/galaxy/ (my file is larger than the size that instance allows).

Firstly, I'm confused because the Galaxy instance offers a stricter all-against-all option (see image below), while lefse_run.py --help reports that the stricter option is one-against-one and lists no all-against-all option:

-y {0,1}        (for multiclass tasks) set whether the test is performed in a one-against-one ( 1 - more strict!) or in a one-against-all setting ( 0 - less strict) (default 0)

image

Secondly, after running LEfSe locally from the galaxy and biobakery scripts with python2.7 (Ubuntu 20.04 and rpy2-compatible R 3.6.3) and from GitHub with python3, I found that the LDA scores differ subtly, but enough to drop one or two features below the threshold, while the Galaxy instance reports them. I used abs(LDA) > 2. Do you have any idea what interferes with reproducibility? I see that you set a random number seed here https://github.com/biobakery/galaxy_lefse/blob/2ca4bf39cbbe588b979873b234636670565b4caf/lefse.py#L9, but many other factors could affect the results.
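To pin down which features fall on different sides of the abs(LDA) > 2 cutoff, the two result tables can be compared directly. The following is only a minimal sketch: it assumes each result file is tab-separated with five columns (feature, log of the highest class mean, class, LDA score, p-value) and that the class and LDA columns are empty for features not reported as discriminative. The function names `read_lda_scores` and `compare_runs` are hypothetical, and the two tables are made-up data, not real LEfSe output.

```python
import csv
import io


def read_lda_scores(handle):
    """Return {feature: LDA score} for rows that carry a numeric score."""
    scores = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Assumed layout: feature, log max class mean, class, LDA score, p-value.
        if len(row) < 4:
            continue
        try:
            scores[row[0]] = float(row[3])
        except ValueError:
            # Empty score column: feature was not reported as discriminative.
            continue
    return scores


def compare_runs(scores_a, scores_b, threshold=2.0):
    """List features that pass abs(LDA) > threshold in only one of the runs."""
    flagged = []
    for feature in sorted(set(scores_a) | set(scores_b)):
        a = scores_a.get(feature)
        b = scores_b.get(feature)
        passes_a = a is not None and abs(a) > threshold
        passes_b = b is not None and abs(b) > threshold
        if passes_a != passes_b:
            flagged.append((feature, a, b))
    return flagged


# Made-up tables standing in for a Galaxy run and a local run.
galaxy_table = (
    "Bacteroides\t4.1\tcase\t3.52\t0.001\n"
    "Roseburia\t3.0\tcontrol\t2.03\t0.04\n"
    "Blautia\t2.5\t\t\t-\n"
)
local_table = (
    "Bacteroides\t4.1\tcase\t3.50\t0.001\n"
    "Roseburia\t3.0\t\t\t-\n"
    "Blautia\t2.5\t\t\t-\n"
)

galaxy_scores = read_lda_scores(io.StringIO(galaxy_table))
local_scores = read_lda_scores(io.StringIO(local_table))
flagged = compare_runs(galaxy_scores, local_scores)

for feature, a, b in flagged:
    print(feature, a, b)
```

With real files, the `io.StringIO` objects would be replaced by opened result files. Features that appear in the flagged list with a score just above 2 in one run are the ones most sensitive to small numerical differences between environments.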

Finally, I don't know whether you maintain the LEfSe versions at https://toolshed.g2.bx.psu.edu/. However, I installed both available versions on a local Galaxy server and couldn't run Format Data for LEfSe because my local instance uses python3 instead of python2. biobakery's LEfSe for Galaxy also needs python2 to run properly.

If you need more details or a better explanation, please don't hesitate to ask.

Best regards

lauramason326 commented 2 years ago

Hi - I am having a similar issue comparing the GUI version of LEfSe at https://huttenhower.sph.harvard.edu/galaxy/ with Lefse-1.1.2 on the command line. Like @glucksfall, I get similar LDA scores and mostly the same OTUs, but there are some differences. Do you know why this might be? Thanks, Laura