Closed fubin1999 closed 5 months ago
Here is the dataset I used. The first three samples are group1 and the remaining three are group2.
prepared.csv
Thanks for bringing this up! I think the issue might be that the edge attribute is 'diffs' but 'diff' is indexed here (we rarely build N-glycan biosynthetic networks, which is why this didn't come up, I suppose). I'll have a look at whether this fixes the issue.
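To illustrate the kind of naming mismatch described above, here is a minimal sketch using a plain Python dict as a stand-in for one edge's attributes. Only the key names 'diffs' and 'diff' come from this thread; the dict, the value, and the variable names are assumptions for illustration and are not glycowork's actual code.

```python
# Hypothetical edge attributes: the value is stored under 'diffs'.
# Only the two key names are taken from the thread; everything else is made up.
edge_attributes = {"diffs": 0.42}

# Reading the attribute under the wrong name ('diff') fails.
try:
    value = edge_attributes["diff"]
    lookup_error = None
except KeyError as exc:
    value = None
    lookup_error = exc

# Reading it under the name it was stored with succeeds.
correct_value = edge_attributes["diffs"]

print(type(lookup_error).__name__)
print(correct_value)
```

With a plain dict the mismatch surfaces as a KeyError; how it surfaces inside the library depends on how the attribute is read there, which this sketch does not try to reproduce.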
Thanks for your fast response. BTW, I find this function really useful and would like to know how it works. I noticed that it’s a new function of glycowork but couldn’t find any paper describing it. Will there be an upcoming preprint?
This has been solved in 3b7fd64. The diffs/diff thing was indeed an issue. But I then encountered another issue: capacity bottlenecks with N-glycans, where there are many unobserved intermediate structures, resulting in everything being filtered out as low-variance. The minimum default capacity is now class-specific to avoid this.
Since we already pushed 1.3 to PyPI yesterday, this will not be part of 1.3 (it's currently on the dev branch). Might be 1.3.1 but it's honestly more likely that it'll come with 1.4 (autumn-ish?)
Glad that you find this functionality useful! It's still in a sort of beta (some fine-tuning + expansion is expected/required). It will be part of a preprint at some point (maybe/hopefully with 1.4?), but it's always a question of prioritization/time constraints:-)
Hi! I encountered an IndexError using get_differential_biosynthesis on my own dataset.
prepared_sub is a subset of my dataset containing only 6 samples for debugging. I've formatted this DataFrame with IUPAC strings as the first "glycan" column, and the following columns as samples.
Full traceback:
More information for reference:
glycowork version: v1.2.0
I did a little debugging with PyCharm and found that abundances[k] with k being 62 (line 646 in biosynthesis.py) triggered the error. Note that prepared_sub had only 62 rows.
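A minimal sketch of why k = 62 fails against 62 rows, assuming abundances behaves like a zero-indexed sequence with one entry per row. The plain list below is a hypothetical stand-in, not the actual object in biosynthesis.py.

```python
# Hypothetical stand-in: one abundance entry per row of prepared_sub (62 rows).
abundances = [0.0] * 62

k = 62  # the index observed in the debugger

# Valid positions run from 0 to len(abundances) - 1, so 62 is one past the end.
last_valid_index = len(abundances) - 1

try:
    abundances[k]
    error = None
except IndexError as exc:
    error = exc

print(last_valid_index)
print(type(error).__name__)
```

Under that assumption, an index equal to the row count is always out of range, which would be consistent with something producing one more index than there are rows.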