BojarLab / glycowork

Package for processing and analyzing glycans and their role in biology.
https://Bojarlab.github.io/glycowork
MIT License

terminal1 combined with terminal2 doesn't work as feature set #54

Closed mattias-erhardsson closed 3 weeks ago

mattias-erhardsson commented 3 weeks ago

Similar to issue #53. When terminal1 is combined with terminal2 in feature_set, a similar error occurs. It seems to be specific to this combination: the other combinations I tried worked. I assume there is a similar problem with the logic, but I couldn't make sense of the code when I looked at it, so I can't provide a code solution.

Example code below:

# Setup
from glycowork.motif.analysis import get_heatmap
import pandas as pd
data = {
    'Glycan': ['Gal(b1-3)GalNAc', 'GalOS(b1-3)GalNAc', 'Gal(b1-3)[Fuc(a1-?)]GalNAc', 'GlcNAc(b1-2)Man(a1-3)Man', 'Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc', 'Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man', 'Neu5Ac(a2-3)Gal(b1-3)GalNAc'],
    'Sample1': [1.1, 0.2, 0.3, 0.5, 0.7, 1.0, 0.6],
    'Sample2': [1.2, 0.1, 0.2, 0.4, 0.8, 0.9, 0.5],
    'Sample3': [0.1, 1.8, 1.9, 0.3, 0.6, 0.8, 1.2],
    'Sample4': [0.2, 1.1, 1.2, 0.2, 0.5, 0.7, 1.1],
    'Sample5': [1.3, 0.3, 0.4, 0.6, 0.9, 1.1, 0.7],
    'Sample6': [1.4, 0.4, 0.5, 0.7, 1.0, 1.2, 0.8],
    'Sample7': [0.3, 1.9, 2.0, 0.4, 0.7, 0.9, 1.3],
    'Sample8': [0.4, 1.2, 1.3, 0.3, 0.6, 0.8, 1.2]
}
data = pd.DataFrame(data)
# This works
get_heatmap(data,
           motifs = True,
           feature_set=[
               'terminal1',
               #'terminal2',
               #'terminal3'
                       ])
# This works
get_heatmap(data,
           motifs = True,
           feature_set=[
               #'terminal1',
               'terminal2',
               #'terminal3'
                       ])
# This fails
get_heatmap(data,
           motifs = True,
           feature_set=[
               'terminal1',
               'terminal2',
               #'terminal3'
                       ])
# This works
get_heatmap(data,
           motifs = True,
           feature_set=[
               'terminal1',
               #'terminal2',
               'terminal3'
                       ])
# This works
get_heatmap(data,
           motifs = True,
           feature_set=[
               #'terminal1',
               'terminal2',
               'terminal3'
                       ])
# This fails
get_heatmap(data,
           motifs = True,
           feature_set=[
               'terminal1',
               'terminal2',
               'terminal3'
                       ])
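
The six manual calls above could also be run as a loop over every non-empty combination of feature sets. This is only a minimal sketch: `find_failing_combinations` is a hypothetical helper name, not part of glycowork, and it assumes the wrapped call raises an exception on failure.

```python
from itertools import combinations


def find_failing_combinations(func, options):
    """Call func(list(combo)) for every non-empty combination of options.

    Returns a list of (combo, exception_name) pairs for the calls that raised.
    """
    failures = []
    for size in range(1, len(options) + 1):
        for combo in combinations(options, size):
            try:
                func(list(combo))
            except Exception as exc:  # broad on purpose: we only record what failed
                failures.append((combo, type(exc).__name__))
    return failures


# Usage with the example data above (requires glycowork; left commented out
# so the helper itself runs without it):
# failures = find_failing_combinations(
#     lambda fs: get_heatmap(data, motifs=True, feature_set=fs),
#     ['terminal1', 'terminal2', 'terminal3'],
# )
# print(failures)
```

With the behavior reported in this issue, the returned list would be expected to contain only the combinations that include both terminal1 and terminal2.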
mattias-erhardsson commented 3 weeks ago

Done with commit #db6c19d12088e44a68df6c6335b10e6972ac2c91

Bribak commented 3 weeks ago

Thanks! So the logic actually worked fine; the issue was with the grouping of features, where nested lists were not handled properly for the terminal1+terminal2 combination.

Fixed with f6ff830 (I've tried all combinations with that example and everything worked)
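
For readers unfamiliar with this class of bug, here is a generic illustration of what handling nested lists can look like. It is not the code from f6ff830 and does not reflect glycowork internals; `flatten` is a hypothetical helper, shown only to make the idea concrete.

```python
def flatten(items):
    """Recursively flatten arbitrarily nested lists/tuples into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # Nested group: descend and splice its elements in, preserving order
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


# Hypothetical grouped feature names, with one level of extra nesting
grouped = ['a', ['b', ['c', 'd']], 'e']
print(flatten(grouped))
```

Code that assumes a single level of nesting can break when one grouping step happens to produce a list inside a list, which is the kind of situation a recursive flatten guards against.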