YosefLab / Cassiopeia

A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction
https://cassiopeia-lineage.readthedocs.io/en/latest/
MIT License

Unable to read pickled files due to renaming of module? #203

Closed ekmolloy closed 1 year ago

ekmolloy commented 1 year ago

Thanks for the well-documented software package!

I just wanted to check in about reading the data sets published on Zenodo (https://zenodo.org/record/3706351) with pickle. I am getting the error message below, potentially because the module was renamed from Cassiopeia to cassiopeia at some point. Which version of the Cassiopeia code on GitHub should I use to read this data? Thank you!

>>> import cassiopeia
dir_path = /nfshomes/ekmolloy/.local/lib/python3.10/site-packages/cassiopeia/tools/fitness_estimator
>>> import pickle
>>> with open('true_network_characters_20_run_9.pkl', 'rb') as f:
...     x = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'Cassiopeia'
mattjones315 commented 1 year ago

Hello @ekmolloy,

Thanks for posting this issue, and apologies for the nuisance here. The present version of Cassiopeia is unfortunately not backwards compatible with the data structure used in our 2020 study. However, you can download a workable version of Cassiopeia from our released code for the 2022 KP-Tracer study here. If this does not work, I'll dig up a workable version from our commit history (before we started releasing versions).

Once you have downloaded this older version of Cassiopeia, you should be able to read in the pickled tree files. Then, depending on whether you need features available in the newer Cassiopeia version, you can save these pickled objects as Newick-format trees, which will be compatible with the most recent Cassiopeia release.
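
A minimal sketch of the Newick export step, for illustration only. It does not use the old Cassiopeia API; it assumes the tree has already been pulled out of the unpickled object into a plain mapping from each node to its list of children. The function name `to_newick` and the example node labels are hypothetical. If the unpickled object exposes a networkx `DiGraph` `g`, such a mapping could be built with `{n: list(g.successors(n)) for n in g}`.

```python
def to_newick(children, root):
    """Return a topology-only Newick string for the tree rooted at `root`.

    `children` maps a node label to a list of its child labels; nodes that
    are missing from the mapping (or map to an empty list) are leaves.
    Internal node labels and branch lengths are intentionally omitted.
    """
    def render(node):
        kids = children.get(node, [])
        if not kids:
            # Leaf: emit its label as-is (labels containing Newick
            # punctuation would need quoting, which this sketch skips).
            return str(node)
        return "(" + ",".join(render(kid) for kid in kids) + ")"

    return render(root) + ";"


# Hypothetical toy tree: root -> a, n1 ; n1 -> b, c
example = {"root": ["a", "n1"], "n1": ["b", "c"]}
print(to_newick(example, "root"))  # prints (a,(b,c));
```

The recursion is fine for small trees; a very deep tree may need an iterative traversal to stay under Python's recursion limit.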

As I'm anticipating that once you have an older version installed it will be straightforward to proceed, I'm going to close this issue. Hope this helps and please don't hesitate to reach out or reopen this issue if you run into additional issues.

Thanks, Matt

ekmolloy commented 1 year ago

Hi Matt,

Thank you so much for your response - I think I am pretty close to being able to use the data. Do you know which version of numpy you were using when you pickled the data, or is there a way for me to check this in the pickle files?
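
On checking the pickle files themselves: a pickle stream generally records the module and class names it needs for loading, not the versions of the packages that wrote it, so the numpy version is unlikely to be recoverable this way. The names can still be listed without importing anything, using the standard-library `pickletools` module. The sketch below is a heuristic (it can miss names that the pickler fetched from its memo), and `referenced_globals` is a hypothetical helper name.

```python
import pickle
import pickletools

# Opcodes that push a text string; STACK_GLOBAL consumes the two most
# recent ones as (module, qualified name).
_STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def referenced_globals(data):
    """Return the set of (module, name) pairs a pickle byte string imports.

    The pickle is only disassembled, never executed, so this works even
    when the referenced modules are not installed.
    """
    found = set()
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name in _STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name were pushed as separate strings.
            found.add((recent_strings[-2], recent_strings[-1]))
    return found


# Usage on a file from the data set (file name taken from the session above):
# with open("true_network_characters_20_run_9.pkl", "rb") as f:
#     for module, name in sorted(referenced_globals(f.read())):
#         print(module, name)
```

Any entries whose module starts with `Cassiopeia` (capitalized) would confirm that the file expects the old package layout.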

Here is a list of commands I have tried in case it is helpful:

1. I use the following commands to download the older version of Cassiopeia.

module load  Python3/3.8.15 
pip3 install Cython --user
pip3 install wheel --user

git clone https://github.com/YosefLab/Cassiopeia
cd Cassiopeia/
git checkout 93ef75c

python3 setup.py build
python3 setup.py build_ext --inplace
python3 setup.py bdist_wheel
python3 -m pip install . --user

2. I try to load the data:

Python 3.8.15 (default, Feb 28 2023, 10:06:17) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('true_network_characters_20_run_9.pkl', 'rb') as f:
...     x = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/nfshomes/ekmolloy/.local/lib/python3.8/site-packages/Cassiopeia/TreeSolver/__init__.py", line 4, in <module>
    from Cassiopeia.TreeSolver.Cassiopeia_Tree import Cassiopeia_Tree
  File "/nfshomes/ekmolloy/.local/lib/python3.8/site-packages/Cassiopeia/TreeSolver/Cassiopeia_Tree.py", line 4, in <module>
    from Cassiopeia.TreeSolver.data_pipeline import convert_network_to_newick_format
  File "/nfshomes/ekmolloy/.local/lib/python3.8/site-packages/Cassiopeia/TreeSolver/data_pipeline.py", line 5, in <module>
    import pandas as pd
  File "/nfshomes/ekmolloy/.local/lib/python3.8/site-packages/pandas/__init__.py", line 32, in <module>
    from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
  File "/nfshomes/ekmolloy/.local/lib/python3.8/site-packages/pandas/_libs/__init__.py", line 3, in <module>
    from .tslibs import (
  File "/nfshomes/ekmolloy/.local/lib/python3.8/site-packages/pandas/_libs/tslibs/__init__.py", line 3, in <module>
    from .conversion import localize_pydatetime, normalize_date
  File "__init__.pxd", line 918, in init pandas._libs.tslibs.conversion
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

3. Based on this Stack Overflow post (https://stackoverflow.com/questions/60323366/valueerror-numpy-ufunc-size-changed-may-indicate-binary-incompatibility-expec), it looks like the issue could be numpy. I tried installing a different version of numpy:

pip3 uninstall numpy
pip3 install numpy==1.16.6

But this gave the following error message:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cassiopeia 1.0.0 requires numpy<1.15,>1.0, but you have numpy 1.16.6 which is incompatible.

This is why I am interested in the numpy version (might try a few different ones later).
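
For context, the "numpy.ufunc size changed" error typically means that a compiled extension (here pandas) was built against a newer numpy than the one imported at run time. Because importing pandas fails in this state, one way to see which versions are actually installed is to read the package metadata without importing the packages. A minimal sketch, assuming Python 3.8 or later; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version string.

    Reads package metadata only, so nothing is imported and a broken
    binary extension cannot raise here. Missing packages map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names as they appear in the messages above.
    for name, version in installed_versions(["numpy", "pandas", "cassiopeia"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the reported numpy is older than the one the installed pandas wheel was built for, pinning pandas to a release contemporary with that numpy is one thing to try; the exact pair that works with this commit is not stated in the thread.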

Thanks! Erin

ekmolloy commented 1 year ago

It's also possible I am somewhat confused about the install instructions on https://github.com/mattjones315/KPTracer-release/tree/main/cassiopeia-kp (because they link back to https://github.com/YosefLab/Cassiopeia.git), so I will also look at them more carefully - thanks!