NBISweden / IgDiscover-legacy

Analyze antibody repertoires and discover new V genes from high-throughput sequencing reads
https://www.igdiscover.se
MIT License

IgDiscover crashes with pandas 2.0.0 #121

Closed ressy closed 1 year ago

ressy commented 1 year ago

pandas 2.0.0 came out a few days ago, and it apparently includes compatibility-breaking changes relative to the previous major version: IgDiscover crashes with pandas 2.0.0 but works with 1.5.3. I don't think the current tests will catch this, because pandas is pinned at 1.5.3 in conda-linux-64.lock.

A minimal reproducible example with the test dataset starting from scratch (just with mamba instead of conda, since conda uses an obscene amount of RAM for dependency resolution):

if [ ! -e mambaforge ]; then
        wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
        bash Mambaforge-Linux-x86_64.sh -b
        ~/mambaforge/bin/conda init
        source ~/.bashrc
        conda --version
        conda config --add channels defaults
        conda config --add channels bioconda
        conda config --add channels conda-forge
        mamba create -y -n igdiscover igdiscover
fi
conda activate igdiscover
conda list | grep -E 'pandas|igdiscover'
if [ ! -f igdiscover-testdata-0.5.tar.gz ]; then
        wget https://bitbucket.org/igdiscover/testdata/downloads/igdiscover-testdata-0.5.tar.gz
        tar xvf igdiscover-testdata-0.5.tar.gz
fi
igdiscover init --db igdiscover-testdata/database/ --reads igdiscover-testdata/reads.1.fastq.gz discovertest
cd discovertest && igdiscover run

I get these IgDiscover and pandas versions:

igdiscover                0.15.1             pyhdfd78af_0    bioconda
pandas                    2.0.0           py310h9b08913_0    conda-forge

And it fails on the rule plot_errorhistograms with:

INFO: Wrote 'iteration-01/errorhistograms.pdf'
Traceback (most recent call last):
  File "/home/test/mambaforge/envs/igdiscover/bin/igdiscover", line 10, in <module>
    sys.exit(main())
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/igdiscover/__main__.py", line 92, in main
    to_run()
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/igdiscover/__main__.py", line 90, in <lambda>
    to_run = lambda: module.main(args)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/igdiscover/cli/errorplot.py", line 98, in main
    g = sns.catplot(x='v_call', y='V_SHM', kind='boxen', order=genes, data=table,
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/categorical.py", line 3847, in catplot
    g.map_dataframe(plot_func, x=x, y=y, hue=hue, **plot_kws)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/axisgrid.py", line 777, in map_dataframe
    self._facet_plot(func, ax, args, kwargs)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/axisgrid.py", line 806, in _facet_plot
    func(*plot_args, **plot_kwargs)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/categorical.py", line 2642, in boxenplot
    plotter.plot(ax, kwargs)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/categorical.py", line 2065, in plot
    self.draw_letter_value_plot(ax, boxplot_kws)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/categorical.py", line 2024, in draw_letter_value_plot
    self._lvplot(box_data,
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/categorical.py", line 1918, in _lvplot
    box_ends, k = self._lv_box_ends(box_data)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/seaborn/categorical.py", line 1847, in _lv_box_ends
    with pd.option_context('mode.use_inf_as_null', True):
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/pandas/_config/config.py", line 441, in __enter__
    self.undo = [(pat, _get_option(pat, silent=True)) for pat, val in self.ops]
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/pandas/_config/config.py", line 441, in <listcomp>
    self.undo = [(pat, _get_option(pat, silent=True)) for pat, val in self.ops]
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/pandas/_config/config.py", line 135, in _get_option
    key = _get_single_key(pat, silent)
  File "/home/test/mambaforge/envs/igdiscover/lib/python3.10/site-packages/pandas/_config/config.py", line 121, in _get_single_key
    raise OptionError(f"No such keys(s): {repr(pat)}")
pandas._config.config.OptionError: No such keys(s): 'mode.use_inf_as_null'
[Thu Apr  6 11:19:13 2023]
Error in rule plot_errorhistograms:
    jobid: 2
    output: iteration-01/errorhistograms.pdf, iteration-01/v-shm-distributions.pdf
    shell:
        igdiscover errorplot --multi=iteration-01/errorhistograms.pdf --boxplot=iteration-01/v-shm-distributions.pdf iteration-01/filtered.tsv.gz
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job plot_errorhistograms since they might be corrupted:
iteration-01/errorhistograms.pdf
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/test/discovertest/.snakemake/log/2023-04-06T111908.380212.snakemake.log
Total CPU time: 0h 0.18m
ERROR: 

If I instead force it to pandas 1 like this:

        mamba create -y -n igdiscover igdiscover pandas=1

I get these packages:

igdiscover                0.15.1             pyhdfd78af_0    bioconda
pandas                    1.5.3           py310h9b08913_1    conda-forge

...and it all works as expected. Maybe for the time being pandas should be pinned at version 1 in environment.yml? (Unless it's an easy update for new pandas; I just have no idea. Looks to me like it might be seaborn's fault based on that mode.use_inf_as_null thing in the traceback.)
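
As a rough illustration of the check described above, the following Python sketch reads `conda list` output and reports the pandas major version, so an environment that resolved to pandas 2 can be spotted before running the pipeline. The function name `pandas_major_version` is hypothetical (it is not part of IgDiscover), and the sample rows are copied from this report.

```python
import re


def pandas_major_version(conda_list_output):
    """Return the major version of pandas in `conda list` output, or None.

    Hypothetical helper for illustration; not part of IgDiscover.
    """
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Rows are whitespace-separated: name, version, build, channel
        if len(fields) >= 2 and fields[0] == "pandas":
            match = re.match(r"(\d+)", fields[1])
            if match:
                return int(match.group(1))
    return None


# Sample rows copied from this report
listing = """\
igdiscover                0.15.1             pyhdfd78af_0    bioconda
pandas                    2.0.0           py310h9b08913_0    conda-forge
"""

major = pandas_major_version(listing)
if major is not None and major >= 2:
    print(f"pandas {major}.x found; igdiscover 0.15.1 was reported to crash with it")
```

In practice the `listing` string would come from the `conda list | grep -E 'pandas|igdiscover'` command used in the script above.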

marcelm commented 1 year ago

Thanks a lot for reporting! I have opened https://github.com/bioconda/bioconda-recipes/pull/40289 as a stopgap until I have time to investigate this further.

marcelm commented 1 year ago

BTW, you can now set Conda to use the Mamba solver with these commands:

conda install -n base conda-libmamba-solver
conda config --set solver libmamba

(I hope they make it the default soon.)

ressy commented 1 year ago

Oh wow, I had no idea conda had yielded to mamba to that extent. Thanks!

Also pretty sure it's seaborn's problem here. I'm not familiar with seaborn but for what it's worth this crashes with pandas 2.0.0 but works with 1.5.3:

import seaborn
import pandas
table = pandas.DataFrame(data = {
    "x": ["A", "A", "A", "B", "B", "B"],
    "y": [5, 7, 6, 2, 3, 2.5]})
seaborn.boxenplot(data=table, x="x", y="y")

marcelm commented 1 year ago

> Oh wow, I had no idea conda had yielded to mamba to that extent. Thanks!

I think the intention by the Mamba folks was from the beginning to get the fast solver back into Conda. Good to see it happening.

> Also pretty sure it's seaborn's problem here. I'm not familiar with seaborn but for what it's worth this crashes with pandas 2.0.0 but works with 1.5.3: [...]

Ah, thanks for the reproducer. I’ll see whether I can get IgDiscover to use the most recent seaborn version. I saw some commits regarding pandas 2 compatibility.
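
For context, the traceback shows seaborn entering `pd.option_context('mode.use_inf_as_null', True)`, and pandas 2.0.0 rejects that option name with `OptionError`. One way to get a similar effect without depending on any pandas option is to drop non-finite values explicitly before computing statistics. The sketch below shows that idea in plain Python; it is only an illustration under that assumption, not the change seaborn actually made, and `drop_non_finite` is a hypothetical name.

```python
import math


def drop_non_finite(values):
    """Return only the finite numbers, discarding None, NaN, and +/-inf.

    Hypothetical helper for illustration; not taken from seaborn or pandas.
    """
    return [v for v in values if v is not None and math.isfinite(v)]


data = [5, 7, float("inf"), 6, float("nan"), None, 2.5]
print(drop_non_finite(data))  # [5, 7, 6, 2.5]
```

Filtering explicitly keeps the behavior independent of which pandas options exist in a given release.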

marcelm commented 1 year ago

The Bioconda package is now updated and I also changed the Conda lock file to use Pandas 2.0.0, thanks again for reporting.