BojarLab / glycowork

Package for processing and analyzing glycans and their role in biology.
https://Bojarlab.github.io/glycowork
MIT License

get_pca() errors and unable to input groups as df #48

Closed mattias-erhardsson closed 5 months ago

mattias-erhardsson commented 5 months ago

Description

get_pca() emits a warning indicating that part of the code relies on deprecated pandas functionality.

get_pca() also does not seem to accept a groups DataFrame, as described in the documentation, for assigning colors and shapes. It seems to be limited to a list, which only controls color.

Environment

Code that triggers the deprecation warning

import pandas as pd
from glycowork.motif.analysis import get_pca
# Sample data
data = {
    'Glycan': ['Gal(b1-3)GalNAc', 'GalOS(b1-3)GalNAc', 'Gal(b1-3)[Fuc(a1-?)]GalNAc', 'Fuc(a1-2)Gal(b1-3)GalNAc', 'Fuc(a1-?)[HexNAc(?1-?)]GalNAc'],
    'Sample1': [0.5, 0.3, 0.2, 0.7, 0.9],
    'Sample2': [0.4, 0.2, 0.3, 0.6, 0.8],
    'Sample3': [0.3, 0.1, 0.4, 0.5, 0.7]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Perform PCA
get_pca(df, groups=None, motifs=True, feature_set=['known', 'exhaustive'], pc_x=1, pc_y=2, color=None, shape=None, filepath='', custom_motifs=[], transform=None, rarity_filter=0.05)

Output for the deprecation warning

C:\Users\xerhma\AppData\Local\Programs\Python\Python312\Lib\site-packages\glycowork\motif\annotate.py:406: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do frame.T.groupby(...) without axis instead.
  out_matrix = out_matrix.groupby(by = out_matrix.columns, axis = 1).sum()

Code for bugged groups input

# Sample data
data = {
    'Glycan': ['Gal(b1-3)GalNAc', 'GalOS(b1-3)GalNAc', 'Gal(b1-3)[Fuc(a1-?)]GalNAc', 'Fuc(a1-2)Gal(b1-3)GalNAc', 'Fuc(a1-?)[HexNAc(?1-?)]GalNAc'],
    'Sample1': [0.5, 0.3, 0.2, 0.7, 0.9],
    'Sample2': [0.4, 0.2, 0.3, 0.6, 0.8],
    'Sample3': [0.3, 0.1, 0.4, 0.5, 0.7]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Sample groups data
groups_data = {
    'id': ['Sample1', 'Sample2', 'Sample3'],
    'Treatment': ['Control', 'Treatment', 'Control']
}

# Create groups DataFrame
groups_df = pd.DataFrame(groups_data)

# Perform PCA
get_pca(df, groups=groups_df, motifs=True, feature_set=['known', 'exhaustive'], pc_x=1, pc_y=2, color='Treatment', shape=None, filepath='', custom_motifs=[], transform=None, rarity_filter=0.05)

Error for bugged groups input

C:\Users\xerhma\AppData\Local\Programs\Python\Python312\Lib\site-packages\glycowork\motif\annotate.py:406: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do frame.T.groupby(...) without axis instead.
  out_matrix = out_matrix.groupby(by = out_matrix.columns, axis = 1).sum()

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_30572\2247117151.py in ?()
     18 # Create groups DataFrame
     19 groups_df = pd.DataFrame(groups_data)
     20
     21 # Perform PCA
---> 22 get_pca(df, groups=groups_df, motifs=True, feature_set=['known', 'exhaustive'], pc_x=1, pc_y=2, color='Treatment', shape=None, filepath='', custom_motifs=[], transform=None, rarity_filter=0.05)

~\AppData\Local\Programs\Python\Python312\Lib\site-packages\glycowork\motif\analysis.py in ?(df, groups, motifs, feature_set, pc_x, pc_y, color, shape, filepath, custom_motifs, transform, rarity_filter)
    497     # get pca
    498     if motifs:
    499         # Motif extraction and quantification
    500         df = quantify_motifs(df.iloc[:, 1:], df.iloc[:, 0].values.tolist(), feature_set, custom_motifs = custom_motifs, remove_redundant = False).T.reset_index()
--> 501     X = np.array(df.iloc[:, 1:len(groups)+1].T) if groups and isinstance(groups, list) else np.array(df.iloc[:, 1:].T)
    502     scaler = StandardScaler()
    503     X_std = scaler.fit_transform(X)
    504     pca = PCA()

~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\generic.py in ?(self)
   1575     @final
   1576     def __nonzero__(self) -> NoReturn:
-> 1577         raise ValueError(
   1578             f"The truth value of a {type(self).__name__} is ambiguous. "
   1579             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Code for successful drawing using list as groups input

# Sample data
data = {
    'Glycan': ['Gal(b1-3)GalNAc', 'GalOS(b1-3)GalNAc', 'Gal(b1-3)[Fuc(a1-?)]GalNAc', 'Fuc(a1-2)Gal(b1-3)GalNAc', 'Fuc(a1-?)[HexNAc(?1-?)]GalNAc'],
    'Sample1': [0.5, 0.3, 0.2, 0.7, 0.9],
    'Sample2': [0.4, 0.2, 0.3, 0.6, 0.8],
    'Sample3': [0.3, 0.1, 0.4, 0.5, 0.7]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Perform PCA, passing the group labels as a list (one label per sample column)
get_pca(df, groups=['Control', 'Treatment', 'Control'], motifs=True, feature_set=['known', 'exhaustive'], pc_x=1, pc_y=2, shape=None, filepath='', custom_motifs=[], transform=None, rarity_filter=0.05)

Bribak commented 5 months ago

Thanks! I think the deprecation warning was introduced in a recent pandas release, because I don't get it locally or in Colab. Either way, it should be fixed now!
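
For anyone hitting the same FutureWarning elsewhere, here is a minimal sketch of the rewrite that the pandas message suggests. It is not necessarily the exact change made in glycowork; the toy out_matrix and its column labels are made up for illustration.

```python
import pandas as pd

# Toy matrix with a duplicated column label (illustrative data, not glycowork output)
out_matrix = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["Fuc", "Gal", "Fuc"])

# Deprecated form that emits the FutureWarning on recent pandas 2.x releases:
# out_matrix.groupby(by=out_matrix.columns, axis=1).sum()

# Equivalent without axis=1: transpose, group on the index labels
# (the former column labels), sum, then transpose back
summed = out_matrix.T.groupby(level=0).sum().T

print(summed)
```

Columns that share a label are summed into a single column, which is what the axis=1 version did.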

For the bug: I swapped the order of the type checks, and your example now works fine; the truthiness of groups is only evaluated once it is confirmed to be a list.
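
To illustrate why the order of the checks matters, here is a small sketch. The helper uses_list_groups is hypothetical and is not part of glycowork; it only mirrors the idea of testing the type before the truthiness.

```python
import pandas as pd

def uses_list_groups(groups):
    """Hypothetical helper: True only for a non-empty list of group labels."""
    # isinstance is evaluated first; `and` short-circuits, so a DataFrame
    # is never asked for its truth value
    return isinstance(groups, list) and bool(groups)

groups_df = pd.DataFrame({
    "id": ["Sample1", "Sample2", "Sample3"],
    "Treatment": ["Control", "Treatment", "Control"],
})

# Original order: the truthiness of the DataFrame is evaluated first and raises
try:
    if groups_df and isinstance(groups_df, list):
        pass
    raised = False
except ValueError:
    raised = True

print(raised)                                                  # ValueError was raised
print(uses_list_groups(groups_df))                             # DataFrame: safe, not a list
print(uses_list_groups(["Control", "Treatment", "Control"]))   # non-empty list
```

With the type check first, None, an empty list, and a DataFrame all fall through to the non-list branch without raising.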

Both are fixed in a6d21ae