MannLabs / alphadia

modular & open DIA search
https://alphadia.readthedocs.io
Apache License 2.0

CHORE updated configuration of directLFQ #128

Closed by ammarcsj 5 months ago

ammarcsj commented 5 months ago

I added a few calls to the directlfq config to speed things up and to handle the read-only bug with pandas, which occurs on some Linux + pandas combinations.

GeorgWa commented 5 months ago

Nice, thank you!

What is the effect of check_wether_to_copy_numpy_arrays_derived_from_pandas()? Is there a downside or cost to assuming read-only pandas columns?

ammarcsj commented 5 months ago

Always assuming read-only columns would be more memory-heavy: I currently operate on the whole quantitative matrix, so I would have to create a new normalized matrix of the same size. With this new check, I only do this when the bug appears.
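The "copy only when the bug appears" idea can be sketched as follows. This is a minimal illustration, not the directlfq implementation: the helper name `ensure_writable` is hypothetical, and the read-only state is simulated with `setflags` rather than produced by pandas.

```python
import numpy as np


def ensure_writable(arr: np.ndarray) -> np.ndarray:
    """Return arr itself if it is writable, otherwise a writable copy.

    Hypothetical helper for illustration; not part of directlfq or alphadia.
    """
    if arr.flags.writeable:
        return arr  # no extra memory is allocated in the common case
    return arr.copy()  # pay for a second matrix only when it is required


# Simulate the read-only case. In practice the array would come from
# df.to_numpy() on an affected Linux + pandas combination.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
matrix.setflags(write=False)

normalized = ensure_writable(matrix)
normalized -= normalized.mean(axis=0)  # the in-place edit is now safe
```

When the input is already writable, the same object is returned, so the memory cost is only paid on affected systems.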

GeorgWa commented 5 months ago

Ah okay, makes sense. Would it be possible to pass a numpy matrix instead of a pandas DataFrame, or is there functionality that relies on pandas?

ammarcsj commented 5 months ago

I use pandas to read in the whole matrix and then call .to_numpy() to convert it to numpy. This numpy array is sometimes read-only. I cannot replace the reading-in with pandas. But maybe I'm overlooking something?

GeorgWa commented 5 months ago

I think the challenge is that you have to keep both the pandas and the numpy representation. I guess it would be fine to call .to_numpy().copy() if we could throw away the pandas version, but we still have to keep it for the other columns.

I'm just wondering how you deal with it, as I'm having similar challenges. I copy all the time, which is even more inefficient, I guess.

ammarcsj commented 5 months ago

Indeed, I did the same thing as you. The only difference is that I have .copy(GLOBAL_FLAG), and the flag will be false if the read-only problem does not exist. It would be interesting to quantify the cost of the .copy.

Another problem: if you have a really huge array in memory, won't the .copy() still double the memory use, even if you throw away the original df right away?
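A rough way to put numbers on both questions is sketched below. The matrix size is an arbitrary stand-in for the quantitative matrix, and the timing will vary by machine, so this is only an assumption-laden illustration of how one might measure it.

```python
import timeit

import numpy as np

# Arbitrary stand-in for the quantitative matrix (about 4 MB of float64).
matrix = np.ones((1000, 500), dtype=np.float64)

# Average wall-clock time of one full copy, measured over 20 repetitions.
seconds_per_copy = timeit.timeit(matrix.copy, number=20) / 20

duplicate = matrix.copy()

# The copy owns its own buffer, so both buffers exist at the same time here.
bytes_while_both_alive = matrix.nbytes + duplicate.nbytes

# Only after the last reference to the original is dropped can its buffer be freed.
del matrix

print(f"copy time: {seconds_per_copy:.6f} s")
print(f"bytes held while both arrays were alive: {bytes_while_both_alive}")
```

Because the copy has to be complete before the original can be released, the two buffers coexist for a moment, so peak usage is about twice the matrix size even if the original is discarded immediately afterwards.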