COMBINE-lab / simpleaf

A rust framework to make using alevin-fry even simpler
BSD 3-Clause "New" or "Revised" License

simpleaf index issues around pyroe #80

Closed rpolicastro closed 1 year ago

rpolicastro commented 1 year ago

Hi!

I was encountering an error when running simpleaf index with a custom genome, so I tried making a reprex. The reprex produced a different error, so I figured we could work through that one first.

relevant versions:

simpleaf 0.12.0
piscem 0.6.0
pyroe 0.9.0

I'll use the ENSEMBL baker's yeast genome for the reprex.

wget "https://ftp.ensembl.org/pub/release-109/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa.gz"
wget "https://ftp.ensembl.org/pub/release-109/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz"

Preparing to run the indexing step.

export ALEVIN_FRY_HOME=/workdir/simpleaf_workdir
simpleaf set-paths

Running the indexing.

simpleaf index \
  -o final_index \
  -t 8 \
  --overwrite \
  --ref-type spliced+intronic \
  -f Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa.gz \
  -g Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz \
  --dedup \
  --rlen 90 \
  --use-piscem

Resulting error.

Traceback (most recent call last):
  File "/software/miniconda3/envs/simpleaf/bin/pyroe", line 254, in <module>
    make_splici_txome(
  File "/software/miniconda3/envs/simpleaf/lib/python3.10/site-packages/pyroe/make_txome.py", line 638, in make_splici_txome
    introns = gr.features.introns(by="transcript")
  File "/software/miniconda3/envs/simpleaf/lib/python3.10/site-packages/pyranges/genomicfeatures.py", line 254, in introns
    result = pyrange_apply(_introns2, by_gr, exons, **kwargs)
  File "/software/miniconda3/envs/simpleaf/lib/python3.10/site-packages/pyranges/multithreaded.py", line 293, in pyrange_apply
    result = call_f(function, nparams, df, odf, kwargs)
  File "/software/miniconda3/envs/simpleaf/lib/python3.10/site-packages/pyranges/multithreaded.py", line 23, in call_f
    return f.remote(df, odf, **kwargs)
  File "/software/miniconda3/envs/simpleaf/lib/python3.10/site-packages/pyranges/genomicfeatures.py", line 607, in _introns2
    introns.Feature.cat.add_categories(["intron"], inplace=True)
  File "/software/miniconda3/envs/simpleaf/lib/python3.10/site-packages/pandas/core/accessor.py", line 112, in f
    return self._delegate_method(name, *args, **kwargs)
  File "/software/miniconda3/envs/simpleaf/lib/python3.10/site-packages/pandas/core/arrays/categorical.py", line 2475, in _delegate_method
    res = method(*args, **kwargs)
TypeError: Categorical.add_categories() got an unexpected keyword argument 'inplace'
====
Error: pyroe failed to return succesfully ExitStatus(unix_wait_status(256))

Cheers, Bob

rob-p commented 1 year ago

Thanks @rpolicastro!

@DongzeHE, seems like it’s an issue with pyranges handling the gtf. Any thoughts?

rob-p commented 1 year ago

@rpolicastro,

Could you also list your pyroe version?

Thanks! Rob

rpolicastro commented 1 year ago

0.9.0, so it should be the latest release version as of today.

Cheers!

rpolicastro commented 1 year ago

As a semi-related note, sometimes ENSEMBL GTFs have malformed entries that cause problems with e.g. the Salmon workflow. For those cases I'll run them through AGAT first to fix errors.

I tried doing the same for this GTF file but the error was the same as the one above. This was run with agat v1.0.0.

agat_convert_sp_gff2gtf.pl \
  -i Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz \
  --gtf_version 2.5 \
  -o cleaned_Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz

# This version of AGAT didn't actually gzip the output, so I compressed it manually.
mv cleaned_Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz cleaned_Saccharomyces_cerevisiae.R64-1-1.109.gtf
gzip cleaned_Saccharomyces_cerevisiae.R64-1-1.109.gtf

rob-p commented 1 year ago

Hi @rpolicastro,

Ok, I'm trying to reproduce with the following, but so far it seems to work:

$ conda install pyroe
$ pyroe -v 
0.9.0
$ wget "https://ftp.ensembl.org/pub/release-109/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa.gz"
$ wget "https://ftp.ensembl.org/pub/release-109/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz"
$ gunzip Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa.gz
$ mkdir OUTDIR
$ pyroe make-splici Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz 90 OUTDIR

Note that the genome needs to be decompressed for pyroe. Also, I noticed some weird behavior with the alias make-spliced+intronic that we should fix upstream (but simpleaf will be using the make-splici alias anyway). Can you confirm that you get the same problem with the above commands?

Thanks! Rob

rpolicastro commented 1 year ago

This is strange: I get the same error after using a fresh install of pyroe 0.9.0 and decompressing the assembly.

The mamba environment.

mamba create -n pyroe -c conda-forge -c bioconda pyroe==0.9.0
mamba activate pyroe

Running pyroe directly.

wget "https://ftp.ensembl.org/pub/release-109/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz"
wget "https://ftp.ensembl.org/pub/release-109/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa.gz"

gunzip Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa.gz

mkdir -p OUTDIR
pyroe make-splici Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa Saccharomyces_cerevisiae.R64-1-1.109.gtf.gz 90 OUTDIR

The error (same as last time).

Traceback (most recent call last):
  File "/software/miniconda3/envs/pyroe/bin/pyroe", line 254, in <module>
    make_splici_txome(
  File "/software/miniconda3/envs/pyroe/lib/python3.10/site-packages/pyroe/make_txome.py", line 638, in make_splici_txome
    introns = gr.features.introns(by="transcript")
  File "/software/miniconda3/envs/pyroe/lib/python3.10/site-packages/pyranges/genomicfeatures.py", line 254, in introns
    result = pyrange_apply(_introns2, by_gr, exons, **kwargs)
  File "/software/miniconda3/envs/pyroe/lib/python3.10/site-packages/pyranges/multithreaded.py", line 293, in pyrange_apply
    result = call_f(function, nparams, df, odf, kwargs)
  File "/software/miniconda3/envs/pyroe/lib/python3.10/site-packages/pyranges/multithreaded.py", line 23, in call_f
    return f.remote(df, odf, **kwargs)
  File "/software/miniconda3/envs/pyroe/lib/python3.10/site-packages/pyranges/genomicfeatures.py", line 607, in _introns2
    introns.Feature.cat.add_categories(["intron"], inplace=True)
  File "/software/miniconda3/envs/pyroe/lib/python3.10/site-packages/pandas/core/accessor.py", line 112, in f
    return self._delegate_method(name, *args, **kwargs)
  File "/software/miniconda3/envs/pyroe/lib/python3.10/site-packages/pandas/core/arrays/categorical.py", line 2475, in _delegate_method
    res = method(*args, **kwargs)
TypeError: Categorical.add_categories() got an unexpected keyword argument 'inplace'

rob-p commented 1 year ago

Sighhhh.... that's gonna make things tough. At this point I'm guessing that maybe the issue has to do with the version of pyranges being pulled in — that is the library being used for GTF parsing and has been the source of issues in the past. This is what I get:

❯ conda list | rg "pyranges"
pyranges                  0.0.120            pyh7cba7a3_0    bioconda
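
To compare environments like these without scrolling through a full conda list, a small Python sketch can report the installed versions of the packages that keep coming up in this thread. It relies only on the standard-library importlib.metadata; the package list is an illustrative choice, not something pyroe or simpleaf defines.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative list of the packages discussed in this thread.
PACKAGES = ["pyroe", "pyranges", "pandas", "numpy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<12} {ver or 'not installed'}")
```

Running this in each environment gives a compact side-by-side to diff, covering the same ground as the conda list and pipdeptree output but limited to the packages of interest.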
rob-p commented 1 year ago

Hi @rpolicastro,

Ok, I was able to reproduce this. Right now the key differences seem to be that in the env that reproduces it, I am using OSX (rather than Linux) and the base install is Python 3.10 rather than 3.9.x. I'm thinking the latter is to blame. @DongzeHE, we should figure out what the problem is upstream here, as we definitely need Python 3.10 (and probably 3.11) support. Sigh ...

However, I'll note that before the traceback, I get this message:

WARNING:root: Found records with missing gene_id/gene_name field. These records are reported in OUTDIR/missing_gene_id_or_name_records.gtf. Imputed 10504 missing gene_name using gene_id.
WARNING:root: A clean GTF file with all issues fixed is generated at OUTDIR/clean_gtf.gtf. If needed, please rerun using this clean GTF file.

If I follow the suggestion and then run:

pyroe make-splici Saccharomyces_cerevisiae.R64-1-1.dna_rm.toplevel.fa OUTDIR/clean_gtf.gtf 90 OUTDIR

execution completes successfully. Is the same true for you?

--Rob
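
For context on that warning, here is a minimal sketch of what imputing a missing gene_name from gene_id can look like for a single GTF line. This is an illustration under stated assumptions, not pyroe's actual code: the helper names are hypothetical, and it assumes attributes in the usual Ensembl form key "value"; separated by "; ".

```python
import re

# One GTF attribute: a word-like key, whitespace, then a double-quoted value.
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field: str) -> dict:
    """Parse the 9th GTF column into {key: value} (a repeated key keeps its last value)."""
    return dict(ATTR_RE.findall(field))


def impute_gene_name(line: str):
    """Return (line, imputed); if gene_name is missing but gene_id exists, copy gene_id."""
    if line.startswith("#"):
        return line, False  # header/comment lines pass through untouched
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line, False  # not a well-formed GTF record; leave it alone
    attrs = parse_attributes(cols[8])
    if "gene_name" in attrs or "gene_id" not in attrs:
        return line, False  # nothing to impute (or nothing to impute from)
    cols[8] = cols[8].rstrip().rstrip(";") + f'; gene_name "{attrs["gene_id"]}";'
    return "\t".join(cols) + newline, True
```

Records that lack gene_id entirely are returned unchanged here; pyroe instead reports such records in a separate file, as the warning above describes.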

rpolicastro commented 1 year ago

Alright, so decompressing my custom genome assembly resolved the original issue I had, the one that prompted this issue and reprex. In hindsight the error message (about invalid UTF-8 characters) made sense, since it was trying to read the archive as plain text. It might be worth mentioning in simpleaf index --help that the assembly should be decompressed, and perhaps explicitly checking for a compressed file format so that a more graceful error can be reported.
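
A minimal Python sketch of such a check, assuming gzip is the only compression worth detecting: it looks for the two-byte gzip magic number (0x1f 0x8b) and raises a readable error instead of letting a parser fail on invalid UTF-8. The function names are hypothetical and are not part of the simpleaf or pyroe API.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path) -> bool:
    """Return True if the file begins with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def require_uncompressed(path) -> Path:
    """Fail early, with a readable message, if `path` is gzip-compressed."""
    if is_gzipped(path):
        raise ValueError(
            f"{path} appears to be gzip-compressed; decompress it first "
            f"(e.g. gunzip {path}) and pass the uncompressed FASTA"
        )
    return Path(path)
```

Checking the content rather than the .gz extension also catches compressed files that were renamed, as in the AGAT case earlier in the thread.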

Now, back to the strangeness of this error. I ran the same workflow for the yeast genome that worked for my custom assembly and got the same error.

As I was typing this I saw your message that you were able to reproduce this. Let me try both rerunning with that cleaned GTF file and checking whether downgrading Python from 3.10 to 3.9 works.

rpolicastro commented 1 year ago

A few follow-up notes:

rob-p commented 1 year ago

Ok, so we are making some progress. Some notes:

(1) Aside from pyroe, we should explicitly check in simpleaf whether the genome is compressed and, if so, simply decompress it before passing it to pyroe (we can clean up the decompressed copy after successful extraction).

(2) It's good the clean_gtf.gtf works, that means the issue is definitely related to pyranges' parsing of the original GTF file — though erroring out with a backtrace is not optimal behavior ;P.

(3) Very interesting. I wonder what's different about my environment where it just works? We'll have to investigate that further. It seems it's python 3.9.7, but I really doubt that difference is the key.

Here's one thought / suggestion — do you think a pip install would be any different?
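
Point (1) above could look roughly like the following Python sketch. simpleaf itself is written in Rust, so this only illustrates the logic. The plain_genome helper is hypothetical: it decompresses a gzipped FASTA into a temporary directory, yields the uncompressed path, and removes the temporary copy when the block exits. Unlike the wording in point (1), it cleans up after a failure as well as after success.

```python
import gzip
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def plain_genome(path):
    """Yield an uncompressed path for `path`: a temporary copy if gzipped, else `path`."""
    path = Path(path)
    with open(path, "rb") as fh:
        gzipped = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    if not gzipped:
        yield path
        return
    tmpdir = Path(tempfile.mkdtemp(prefix="genome_"))
    out = tmpdir / path.name.removesuffix(".gz")
    try:
        # Stream-decompress so large genomes are not loaded into memory at once.
        with gzip.open(path, "rb") as src, open(out, "wb") as dst:
            shutil.copyfileobj(src, dst)
        yield out
    finally:
        # Runs on normal exit and on exceptions raised inside the with-block.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

In use, the pyroe make-splici call from earlier in the thread would run inside "with plain_genome(fasta_path) as fasta:", passing str(fasta) as the genome argument.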

rpolicastro commented 1 year ago

Adding to the strangeness

In case you wanted to compare versions here's everything in my python 3.9.7 environment created via mamba create -n simpleaf -c conda-forge -c bioconda pyroe==0.9.0 simpleaf==0.12.0 piscem==0.6.0 python==3.9.7.

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
alevin-fry                0.8.1                h9f5acd7_0    bioconda
anndata                   0.9.1              pyhd8ed1ab_0    conda-forge
bedtools                  2.30.0               h468198e_3    bioconda
biopython                 1.81             py39h72bdee0_0    conda-forge
boost-cpp                 1.74.0               h6cacc03_7    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.7            py39h4b4f3f3_0    conda-forge
cryptography              40.0.2           py39h079d5ae_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
fonttools                 4.39.3           py39h72bdee0_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
h5py                      3.8.0           nompi_py39h89bf01e_101    conda-forge
hdf5                      1.14.0          nompi_hb72d44e_103    conda-forge
icu                       69.1                 h9c3ff4c_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.4.1              pyha770c72_0    conda-forge
importlib-resources       5.12.0             pyhd8ed1ab_0    conda-forge
importlib_metadata        6.4.1                hd8ed1ab_0    conda-forge
importlib_resources       5.12.0             pyhd8ed1ab_0    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4            py39hf939315_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.15                 haa2dc70_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   8.0.1                h588be90_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libhwloc                  2.8.0                h32351e8_1    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjemalloc               5.3.0                hcb278e6_0    conda-forge
libjpeg-turbo             2.1.5.1              h0b41bf4_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libllvm11                 11.1.0               he0ac6c6_5    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libtiff                   4.5.0                ha587672_6    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxml2                   2.9.14               haae042b_4    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvmlite                  0.39.1           py39h7d9a04d_1    conda-forge
matplotlib-base           3.7.1            py39he190548_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
natsort                   8.3.1              pyhd8ed1ab_0    conda-forge
ncls                      0.0.66           py39hbf8eff0_0    bioconda
ncurses                   6.3                  h27087fc_1    conda-forge
networkx                  3.1                pyhd8ed1ab_0    conda-forge
numba                     0.56.4           py39h71a7301_1    conda-forge
numpy                     1.23.5           py39h3d75532_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.0                h0b41bf4_0    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pandas                    2.0.0            py39h2ad29b5_0    conda-forge
patsy                     0.5.3              pyhd8ed1ab_0    conda-forge
pillow                    9.5.0            py39h7207d5c_0    conda-forge
pip                       23.1               pyhd8ed1ab_0    conda-forge
piscem                    0.6.0                h52b76fa_0    bioconda
platformdirs              3.2.0              pyhd8ed1ab_0    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pynndescent               0.5.8              pyh1a96a4e_0    conda-forge
pyopenssl                 23.1.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyranges                  0.0.120            pyh7cba7a3_0    bioconda
pyrle                     0.0.35           py39hbf8eff0_1    bioconda
pyroe                     0.9.0              pyhdfd78af_0    bioconda
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.7           hf930737_3_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.28.2             pyhd8ed1ab_1    conda-forge
salmon                    1.10.1               h7e5ed60_0    bioconda
scanpy                    1.9.3              pyhd8ed1ab_0    conda-forge
scikit-learn              1.2.2            py39hd189fd4_1    conda-forge
scipy                     1.10.1           py39h7360e5f_0    conda-forge
seaborn                   0.12.2               hd8ed1ab_0    conda-forge
seaborn-base              0.12.2             pyhd8ed1ab_0    conda-forge
session-info              1.0.0              pyhd8ed1ab_0    conda-forge
setuptools                67.6.1             pyhd8ed1ab_0    conda-forge
simpleaf                  0.12.0               h9f5acd7_0    bioconda
six                       1.16.0             pyh6c4a22f_0    conda-forge
sorted_nearest            0.0.37           py39hbf8eff0_0    bioconda
sqlite                    3.40.0               h4ff8645_0    conda-forge
statsmodels               0.13.5           py39h2ae25f5_2    conda-forge
stdlib-list               0.8.0              pyhd8ed1ab_0    conda-forge
tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
tbb                       2021.7.0             h924138e_1    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tqdm                      4.65.0             pyhd8ed1ab_1    conda-forge
typing-extensions         4.5.0                hd8ed1ab_0    conda-forge
typing_extensions         4.5.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
umap-learn                0.5.3            py39hf3d152e_0    conda-forge
unicodedata2              15.0.0           py39hb9d737c_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

rob-p commented 1 year ago

I am now very curious about the local environment that works without the clean GTF! I am also curious what it is about the Ensembl GTF that breaks pyranges!

rpolicastro commented 1 year ago

You probably hit on some holy combination of versions to counter the unholy mess that is ENSEMBL GTFs 😅

rob-p commented 1 year ago

So here is what I have in my pip3 (python 3.9.7) install (investigated via pipdeptree):

pyroe==0.9.0
  - biopython [required: >=1.77, installed: 1.79]
    - numpy [required: Any, installed: 1.21.5]
  - packaging [required: >=21.0, installed: 21.3]
    - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
  - pandas [required: >=1.3.0, installed: 1.4.1]
    - numpy [required: >=1.18.5, installed: 1.21.5]
    - python-dateutil [required: >=2.8.1, installed: 2.8.2]
      - six [required: >=1.5, installed: 1.16.0]
    - pytz [required: >=2020.1, installed: 2022.1]
  - pyranges [required: >=0.0.120, installed: 0.0.120]
    - cython [required: Any, installed: 0.29.28]
    - natsort [required: Any, installed: 8.1.0]
    - ncls [required: >=0.0.63, installed: 0.0.64]
      - numpy [required: Any, installed: 1.21.5]
    - pandas [required: Any, installed: 1.4.1]
      - numpy [required: >=1.18.5, installed: 1.21.5]
      - python-dateutil [required: >=2.8.1, installed: 2.8.2]
        - six [required: >=1.5, installed: 1.16.0]
      - pytz [required: >=2020.1, installed: 2022.1]
    - pyrle [required: Any, installed: 0.0.34]
      - cython [required: Any, installed: 0.29.28]
      - natsort [required: Any, installed: 8.1.0]
      - numpy [required: Any, installed: 1.21.5]
      - pandas [required: Any, installed: 1.4.1]
        - numpy [required: >=1.18.5, installed: 1.21.5]
        - python-dateutil [required: >=2.8.1, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - pytz [required: >=2020.1, installed: 2022.1]
      - tabulate [required: Any, installed: 0.8.9]
    - sorted-nearest [required: >=0.0.33, installed: 0.0.33]
      - cython [required: Any, installed: 0.29.28]
      - numpy [required: Any, installed: 1.21.5]
    - tabulate [required: Any, installed: 0.8.9]
  - scanpy [required: >=1.8.2, installed: 1.8.2]
    - anndata [required: >=0.7.4, installed: 0.8.0]
      - h5py [required: >=3, installed: 3.6.0]
        - numpy [required: >=1.14.5, installed: 1.21.5]
      - natsort [required: Any, installed: 8.1.0]
      - numpy [required: >=1.16.5, installed: 1.21.5]
      - packaging [required: >=20, installed: 21.3]
        - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
      - pandas [required: >=1.1.1, installed: 1.4.1]
        - numpy [required: >=1.18.5, installed: 1.21.5]
        - python-dateutil [required: >=2.8.1, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - pytz [required: >=2020.1, installed: 2022.1]
      - scipy [required: >1.4, installed: 1.8.0]
        - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
    - h5py [required: >=2.10.0, installed: 3.6.0]
      - numpy [required: >=1.14.5, installed: 1.21.5]
    - joblib [required: Any, installed: 1.1.0]
    - matplotlib [required: >=3.1.2, installed: 3.5.1]
      - cycler [required: >=0.10, installed: 0.11.0]
      - fonttools [required: >=4.22.0, installed: 4.31.2]
      - kiwisolver [required: >=1.0.1, installed: 1.4.2]
      - numpy [required: >=1.17, installed: 1.21.5]
      - packaging [required: >=20.0, installed: 21.3]
        - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
      - pillow [required: >=6.2.0, installed: 9.0.1]
      - pyparsing [required: >=2.2.1, installed: 3.0.7]
      - python-dateutil [required: >=2.7, installed: 2.8.2]
        - six [required: >=1.5, installed: 1.16.0]
    - natsort [required: Any, installed: 8.1.0]
    - networkx [required: >=2.3, installed: 2.7.1]
    - numba [required: >=0.41.0, installed: 0.55.1]
      - llvmlite [required: >=0.38.0rc1,<0.39, installed: 0.38.0]
      - numpy [required: >=1.18,<1.22, installed: 1.21.5]
      - setuptools [required: Any, installed: 63.2.0]
    - numpy [required: >=1.17.0, installed: 1.21.5]
    - packaging [required: Any, installed: 21.3]
      - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
    - pandas [required: >=0.21, installed: 1.4.1]
      - numpy [required: >=1.18.5, installed: 1.21.5]
      - python-dateutil [required: >=2.8.1, installed: 2.8.2]
        - six [required: >=1.5, installed: 1.16.0]
      - pytz [required: >=2020.1, installed: 2022.1]
    - patsy [required: Any, installed: 0.5.2]
      - numpy [required: >=1.4, installed: 1.21.5]
      - six [required: Any, installed: 1.16.0]
    - scikit-learn [required: >=0.22, installed: 1.0.2]
      - joblib [required: >=0.11, installed: 1.1.0]
      - numpy [required: >=1.14.6, installed: 1.21.5]
      - scipy [required: >=1.1.0, installed: 1.8.0]
        - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
      - threadpoolctl [required: >=2.0.0, installed: 3.1.0]
    - scipy [required: >=1.4, installed: 1.8.0]
      - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
    - seaborn [required: Any, installed: 0.11.2]
      - matplotlib [required: >=2.2, installed: 3.5.1]
        - cycler [required: >=0.10, installed: 0.11.0]
        - fonttools [required: >=4.22.0, installed: 4.31.2]
        - kiwisolver [required: >=1.0.1, installed: 1.4.2]
        - numpy [required: >=1.17, installed: 1.21.5]
        - packaging [required: >=20.0, installed: 21.3]
          - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
        - pillow [required: >=6.2.0, installed: 9.0.1]
        - pyparsing [required: >=2.2.1, installed: 3.0.7]
        - python-dateutil [required: >=2.7, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
      - numpy [required: >=1.15, installed: 1.21.5]
      - pandas [required: >=0.23, installed: 1.4.1]
        - numpy [required: >=1.18.5, installed: 1.21.5]
        - python-dateutil [required: >=2.8.1, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - pytz [required: >=2020.1, installed: 2022.1]
      - scipy [required: >=1.0, installed: 1.8.0]
        - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
    - sinfo [required: Any, installed: 0.3.4]
      - stdlib-list [required: Any, installed: 0.8.0]
    - statsmodels [required: >=0.10.0rc2, installed: 0.13.2]
      - numpy [required: >=1.17, installed: 1.21.5]
      - packaging [required: >=21.3, installed: 21.3]
        - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
      - pandas [required: >=0.25, installed: 1.4.1]
        - numpy [required: >=1.18.5, installed: 1.21.5]
        - python-dateutil [required: >=2.8.1, installed: 2.8.2]
          - six [required: >=1.5, installed: 1.16.0]
        - pytz [required: >=2020.1, installed: 2022.1]
      - patsy [required: >=0.5.2, installed: 0.5.2]
        - numpy [required: >=1.4, installed: 1.21.5]
        - six [required: Any, installed: 1.16.0]
      - scipy [required: >=1.3, installed: 1.8.0]
        - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
    - tables [required: Any, installed: 3.7.0]
      - numexpr [required: >=2.6.2, installed: 2.8.1]
        - numpy [required: >=1.13.3, installed: 1.21.5]
        - packaging [required: Any, installed: 21.3]
          - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
      - numpy [required: >=1.19.0, installed: 1.21.5]
      - packaging [required: Any, installed: 21.3]
        - pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.7]
    - tqdm [required: Any, installed: 4.63.1]
    - umap-learn [required: >=0.3.10, installed: 0.5.2]
      - numba [required: >=0.49, installed: 0.55.1]
        - llvmlite [required: >=0.38.0rc1,<0.39, installed: 0.38.0]
        - numpy [required: >=1.18,<1.22, installed: 1.21.5]
        - setuptools [required: Any, installed: 63.2.0]
      - numpy [required: >=1.17, installed: 1.21.5]
      - pynndescent [required: >=0.5, installed: 0.5.6]
        - joblib [required: >=0.11, installed: 1.1.0]
        - llvmlite [required: >=0.30, installed: 0.38.0]
        - numba [required: >=0.51.2, installed: 0.55.1]
          - llvmlite [required: >=0.38.0rc1,<0.39, installed: 0.38.0]
          - numpy [required: >=1.18,<1.22, installed: 1.21.5]
          - setuptools [required: Any, installed: 63.2.0]
        - scikit-learn [required: >=0.18, installed: 1.0.2]
          - joblib [required: >=0.11, installed: 1.1.0]
          - numpy [required: >=1.14.6, installed: 1.21.5]
          - scipy [required: >=1.1.0, installed: 1.8.0]
            - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
          - threadpoolctl [required: >=2.0.0, installed: 3.1.0]
        - scipy [required: >=1.0, installed: 1.8.0]
          - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
      - scikit-learn [required: >=0.22, installed: 1.0.2]
        - joblib [required: >=0.11, installed: 1.1.0]
        - numpy [required: >=1.14.6, installed: 1.21.5]
        - scipy [required: >=1.1.0, installed: 1.8.0]
          - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
        - threadpoolctl [required: >=2.0.0, installed: 3.1.0]
      - scipy [required: >=1.0, installed: 1.8.0]
        - numpy [required: >=1.17.3,<1.25.0, installed: 1.21.5]
      - tqdm [required: Any, installed: 4.63.1]

rob-p commented 1 year ago

Looking upstream, perhaps the issue is related to this? I have to say, looking at the responses and issues on the pyranges repo, I'm not super hopeful for a quick fix here...

Also @DongzeHE — we should see if we are affected by this at all.

rob-p commented 1 year ago

@rpolicastro,

OK, I think I figured it out! It's because pyranges doesn't guard against pandas 2.0 (which is obviously a major version bump, very new, and introduces several breaking changes).

So, the real solution is for them to fix the incompatibility upstream. However, the temporary solution is to force a pandas < 2.0. This worked for me:

mamba create -n pyroe -c conda-forge -c bioconda pyroe==0.9.0 pandas==1.5.3

And then we should specify this requirement upstream in the bioconda recipe and in the pyproject.toml. Please let me know if this works for you.

--Rob
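
The incompatibility is visible in the traceback: pyranges calls introns.Feature.cat.add_categories(["intron"], inplace=True), and pandas 2.0 removed the inplace keyword from that method after deprecating it in an earlier 1.x release. A minimal sketch of the pandas-2-compatible form, on a toy frame standing in for the pyranges exon table (an assumption for illustration), is to assign the returned Series back rather than mutating in place:

```python
import pandas as pd

# Toy stand-in for the table pyranges manipulates; only the categorical
# Feature column matters for this illustration.
introns = pd.DataFrame({"Feature": pd.Categorical(["exon", "exon", "exon"])})

# Older pandas accepted (pandas >= 2.0 raises TypeError):
#     introns.Feature.cat.add_categories(["intron"], inplace=True)
# This form works on both pandas 1.x and 2.x: add_categories returns a new
# Series, which is assigned back to the column.
introns["Feature"] = introns["Feature"].cat.add_categories(["intron"])
```

This is the kind of change that would need to land in pyranges itself; pinning pandas below 2.0 only avoids the code path until that happens.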

rpolicastro commented 1 year ago

I can confirm that downgrading to pandas 1.5.3 fixed the error 😀

rob-p commented 1 year ago

Excellent. I've filed the bug report upstream, and am pushing a 0.9.1 of pyroe with the <2.0 restriction on pandas (which also fixes the sub-command aliasing issue). I'll close this for the time being, but hopefully we can get the underlying issue with pyranges fixed upstream (until we get a chance to re-implement the splici/spliceu extraction directly in rust 😉 @DongzeHE).
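
Besides the dependency pin, a tool could also fail fast with a clear message when it finds an incompatible pandas at runtime, rather than surfacing a deep backtrace. The sketch below is hypothetical and is not what pyroe 0.9.1 ships (that fix is the <2.0 pin described above). It compares only the major version component, mirroring that restriction.

```python
def pandas_major(version_string: str) -> int:
    """Return the leading (major) component of a version like '1.5.3' or '2.0.0'."""
    return int(version_string.split(".", 1)[0])


def check_pandas(version_string: str) -> None:
    """Raise a readable error for pandas >= 2.0, which this sketch treats as unsupported."""
    if pandas_major(version_string) >= 2:
        raise RuntimeError(
            f"pandas {version_string} found, but a version < 2.0 is required "
            "(e.g. install pandas==1.5.3)"
        )
```

A caller would run check_pandas(pandas.__version__) once at startup, before any pyranges code executes.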