BojarLab / glycowork

Package for processing and analyzing glycans and their role in biology.
https://Bojarlab.github.io/glycowork
MIT License

Deprecated pandas-related code in get_differential_expression() + unintuitive error #50

Closed mattias-erhardsson closed 2 weeks ago

mattias-erhardsson commented 2 weeks ago

Running the latest version of pandas (2.2.2) with the dev branch of glycowork (commit 6f81f00), the following code produces two warnings:

import pandas as pd
from glycowork.motif.analysis import get_differential_expression

data = {
    'Glycan': ['Gal(b1-3)GalNAc', 'GalOS(b1-3)GalNAc', 'Gal(b1-3)[Fuc(a1-?)]GalNAc'],
    'Sample1': [1.1, 0.2, 0.3],
    'Sample2': [1.2, 0.1, 0.2],
    'Sample3': [0.1, 1.8, 1.9],
    'Sample4': [0.2, 1.1, 1.2]
}
differential_glycomics_df = pd.DataFrame(data)

group1 = ['Sample1', 'Sample2']
group2 = ['Sample3', 'Sample4']

get_differential_expression(df = differential_glycomics_df,
                            group1 = group1,
                            group2 = group2,
                            motifs = True,
                            feature_set = ['exhaustive'],
                            paired = False,
                            min_samples = 0.1)

The first is a FutureWarning that seems to come from deprecated pandas behavior. The second is a divide-by-zero RuntimeWarning that is hard to interpret.

C:\Users\xerhma\AppData\Local\Programs\Python\Python312\Lib\site-packages\glycowork\glycan_data\stats.py:696: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set pd.set_option('future.no_silent_downcasting', True)
  row = row.fillna(nan_placeholder)

C:\Users\xerhma\AppData\Local\Programs\Python\Python312\Lib\site-packages\scipy\stats\_morestats.py:3345: RuntimeWarning: divide by zero encountered in scalar divide
  W = numer / denom

Bribak commented 2 weeks ago

Thanks! Oddly, I can't reproduce the first deprecation warning, even after upgrading to pandas 2.2.2 (though I do know about it). So I can't really do a thorough fix, but I've fixed the line that was highlighted in the warning. I'm leaving this issue open until you confirm that you don't see the warning anymore.
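For readers hitting the same FutureWarning, here is a minimal sketch of one way to avoid it: convert the object-dtype row to a numeric dtype first, then fill. This is only an illustration and not necessarily what the fix commit does. The helper name `fill_row` and the placeholder value `0.0` are assumptions made up for the example; only `nan_placeholder` appears in the warning itself.

```python
import pandas as pd


def fill_row(row: pd.Series, nan_placeholder: float = 0.0) -> pd.Series:
    """Fill missing values without relying on silent downcasting.

    Hypothetical helper for illustration; not part of glycowork.
    """
    # Let pandas infer a proper numeric dtype up front (object -> float64),
    # so the later fillna() has nothing left to downcast.
    numeric_row = row.infer_objects()
    return numeric_row.fillna(nan_placeholder)


# An object-dtype row with a missing value, similar in spirit to the issue.
row = pd.Series([1.1, None, 0.3], dtype=object)
filled = fill_row(row)
print(filled.dtype, filled.tolist())
```

Since the warning concerns object-dtype data, a row that already has a float dtype would not be expected to trigger it, which may be part of why it is hard to reproduce on some inputs.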

For the RuntimeWarning: Levene's test (for homogeneity of variances) really does not like having fewer than three samples per group :D The calculations get wonky in that case, so I've now disabled the test when a group has fewer than three samples.
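To see where the division by zero comes from, here is a small pure-Python sketch of the median-centered Levene statistic (the Brown-Forsythe variant). With two samples in a group, both values sit equally far from their median, so the within-group spread of the absolute deviations is zero, and so is the denominator of W. The function names `levene_terms` and `safe_levene_w` and the `min_size` parameter are made up for this illustration; this is not glycowork's or SciPy's actual code.

```python
from statistics import median


def levene_terms(*groups):
    """Return (numerator, denominator) of the median-centered Levene W."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from its own group's median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group spread of the deviations.
    numer = (n_total - k) * sum(
        len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means)
    )
    # Within-group spread of the deviations.
    denom = (k - 1) * sum(
        (v - m) ** 2 for zi, m in zip(z, z_means) for v in zi
    )
    return numer, denom


def safe_levene_w(*groups, min_size=3):
    """Return W, or None when the test is skipped (hypothetical guard)."""
    if any(len(g) < min_size for g in groups):
        return None  # too few samples per group for a meaningful test
    numer, denom = levene_terms(*groups)
    return numer / denom if denom > 0 else None


# Two samples per group: the denominator collapses to exactly zero.
numer2, denom2 = levene_terms([1.0, 2.0], [4.0, 8.0])
print(numer2, denom2)

# Three samples per group: the statistic is well defined.
w3 = safe_levene_w([1.0, 2.0, 4.0], [4.0, 8.0, 16.0])
print(w3)
```

The example values are chosen to be exactly representable in floating point, so the zero denominator shows up exactly. With arbitrary measurements, rounding can instead leave a tiny nonzero denominator and an enormous, meaningless W, which is one more reason to skip the test for such small groups.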

Both are in 8107a64

mattias-erhardsson commented 2 weeks ago

Both are fixed now, thank you =). I guess there would be few real-world examples where a user might attempt differential glycomics with just 2 samples in each group, but it could happen.