bluesky / databroker

Unified API pulling data from multiple sources
https://blueskyproject.io/databroker
BSD 3-Clause "New" or "Revised" License

Deprecated distutils import from upstream dependency in v1.2.5 #789

Open ravngr opened 8 months ago

ravngr commented 8 months ago

Expected Behavior

The bluesky tutorial should be completable on an up-to-date install of Python 3.12.

Current Behavior

Creating a temporary Broker with databroker raises a ModuleNotFoundError because the upstream dependency intake imports distutils, which is no longer part of the standard library in Python 3.12.

Full Traceback

```
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[3], line 2
      1 from databroker import Broker
----> 2 db = Broker.named('temp')
      4 # Insert all metadata/data captured into db.
      5 RE.subscribe(db.insert)

File c:\Users\User\repo\.venv\Lib\site-packages\databroker\v1.py:256, in Broker.named(cls, name, auto_register)
    225 """
    226 Create a new Broker instance using a configuration file with this name.
    227
   (...)
    253 db : Broker
    254 """
    255 if name == 'temp':
--> 256     return temp()
    257 else:
    258     try:

File c:\Users\User\repo\.venv\Lib\site-packages\databroker\v1.py:39, in temp()
     38 def temp():
---> 39     from .v2 import temp
     40     catalog = temp()
     41     return Broker(catalog)

File c:\Users\User\repo\.venv\Lib\site-packages\databroker\v2.py:5
      2 import importlib
      3 import tempfile
----> 5 from .core import parse_handler_registry, discover_handlers, parse_transforms
      6 from intake.catalog import Catalog
      7 from event_model import DuplicateHandler

File c:\Users\User\repo\.venv\Lib\site-packages\databroker\core.py:17
     15 import intake.catalog.base
     16 import intake.catalog.local
---> 17 import intake.container.base
     18 from intake.compat import unpack_kwargs
     19 import msgpack

File c:\Users\User\repo\.venv\Lib\site-packages\intake\container\__init__.py:8
      1 #-----------------------------------------------------------------------------
      2 # Copyright (c) 2012 - 2018, Anaconda, Inc. and Intake contributors
      3 # All rights reserved.
      4 #
      5 # The full license is in the LICENSE file, distributed with this software.
      6 #-----------------------------------------------------------------------------
----> 8 from .dataframe import RemoteDataFrame
      9 from .ndarray import RemoteArray
     10 from .semistructured import RemoteSequenceSource

File c:\Users\User\repo\.venv\Lib\site-packages\intake\container\dataframe.py:7
      1 #-----------------------------------------------------------------------------
      2 # Copyright (c) 2012 - 2018, Anaconda, Inc. and Intake contributors
      3 # All rights reserved.
      4 #
      5 # The full license is in the LICENSE file, distributed with this software.
      6 #-----------------------------------------------------------------------------
----> 7 from distutils.version import LooseVersion
     10 from intake.source.base import Schema, DataSource
     11 from .base import RemoteSource, get_partition

ModuleNotFoundError: No module named 'distutils'
```

Possible Solution

Steps to Reproduce (for bugs)

  1. Install Python 3.12.
  2. Follow the bluesky tutorial to the Prepare Data Storage step.
    from databroker import Broker
    db = Broker.named('temp')
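
Before running the tutorial step above, it can help to confirm whether the interpreter can locate `distutils` at all. The following is only a minimal sketch: the helper name `module_available` is hypothetical and not part of databroker or intake, and it uses the standard-library `importlib.util.find_spec` to look up a module without importing it.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no finder on sys.meta_path knows the module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("distutils importable:", module_available("distutils"))
```

If this reports that `distutils` is not importable, the `Broker.named('temp')` call is expected to fail in the same way as the traceback above, since intake 0.6.4 imports `distutils.version`.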

Context

databroker v1.2.5 requires intake >= 0.5.5, <= 0.6.4, though this pin appears to have been removed in the v2 betas (sorry, I'm not familiar with the current build system, so I wasn't able to check). The dependency on distutils is removed in intake >= 0.6.7 (before, after). I haven't tested a build of the v1 branch with an updated dependency.
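
The version bounds above can be checked against an installed environment. This is a rough sketch under stated assumptions: the helper names `parse_version` and `intake_needs_distutils` are hypothetical, the parser only handles plain dotted numeric versions such as `0.6.4`, and the 0.6.7 threshold is taken from the note above rather than verified independently.

```python
from importlib import metadata

# First intake release without the distutils import, per the note above.
FIXED_IN = (0, 6, 7)


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version like '0.6.4' into a tuple of ints.

    Assumes purely numeric components; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def intake_needs_distutils(version_text: str) -> bool:
    """Return True if this intake version predates the distutils removal."""
    return parse_version(version_text) < FIXED_IN


if __name__ == "__main__":
    try:
        installed = metadata.version("intake")
    except metadata.PackageNotFoundError:
        print("intake is not installed")
    else:
        print(installed, "needs distutils:", intake_needs_distutils(installed))
```

For the environment listed below, `intake==0.6.4` falls inside the databroker v1.2.5 pin and below the 0.6.7 threshold, which matches the failure seen here.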

Your Environment

Python 3.12 virtual environment on Windows 11 with bluesky and other packages installed via pip.

pip freeze

```
anyio==4.2.0
appdirs==1.4.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asciitree==0.3.3
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.2
bleach==6.1.0
bluesky==1.12.0
bluesky-live==0.0.8
boltons==23.1.1
cachetools==5.3.2
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
comm==0.2.1
contourpy==1.2.0
cycler==0.12.1
dask==2023.12.1
databroker==1.2.5
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
dnspython==2.4.2
doct==1.1.0
entrypoints==0.4
event-model==1.19.9
executing==2.0.1
fasteners==0.19
fastjsonschema==2.19.1
fonttools==4.47.0
fqdn==1.5.1
fsspec==2023.12.2
HeapDict==1.0.1
historydict==1.2.6
humanize==4.9.0
idna==3.6
imageio==2.33.1
importlib-metadata==7.0.1
importlib-resources==6.1.1
intake==0.6.4
ipykernel==6.28.0
ipython==8.20.0
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
jupyter-events==0.9.0
jupyter-lsp==2.2.1
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.12.3
jupyter_server_terminals==0.5.1
jupyterlab==4.0.10
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
kiwisolver==1.4.5
locket==1.0.0
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
mistune==3.0.2
mongoquery==1.4.2
msgpack==1.0.7
msgpack-numpy==0.4.8
nbclient==0.9.0
nbconvert==7.14.0
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
notebook_shim==0.2.3
numcodecs==0.12.1
numpy==1.26.3
ophyd==1.9.0
overrides==7.4.0
packaging==23.2
pandas==2.1.4
pandocfilters==1.5.0
parso==0.8.3
partd==1.4.1
pillow==10.2.0
PIMS==0.6.1
Pint==0.23
pipdeptree==2.13.1
platformdirs==4.1.0
prettytable==3.9.0
prometheus-client==0.19.0
prompt-toolkit==3.0.43
psutil==5.9.7
pure-eval==0.2.2
pycparser==2.21
Pygments==2.17.2
pymongo==4.6.1
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3.post1
pywin32==306
pywinpty==2.0.12
PyYAML==6.0.1
pyzmq==25.1.2
referencing==0.32.1
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.16.2
Send2Trash==1.8.2
six==1.16.0
slicerator==1.1.0
sniffio==1.3.0
soupsieve==2.5
stack-data==0.6.3
suitcase-mongo==0.4.0
suitcase-msgpack==0.3.0
suitcase-utils==0.5.4
super-state-machine==2.0.2
terminado==0.18.0
tifffile==2023.12.9
tinycss2==1.2.1
toolz==0.12.0
tornado==6.4
tqdm==4.66.1
traitlets==5.14.1
types-python-dateutil==2.8.19.20240106
typing_extensions==4.9.0
tzdata==2023.4
tzlocal==5.2
uri-template==1.3.0
urllib3==2.1.0
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
xarray==2023.12.0
zarr==2.16.1
zict==2.2.0
zipp==3.17.0
```

tacaswell commented 8 months ago

We have tightly pinned intake because we were running into frequent small changes to its API. In databroker v2 we completely refactored out the dependency on intake.

Going back to Python 3.11 is likely the fastest option.
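
For anyone who wants a notebook to fail early with a readable message instead of the long traceback, a guard along these lines may help. It is only a sketch: the function name `require_python_below_312` is hypothetical, and the 3.12 cutoff reflects the removal of `distutils` from the standard library in that release.

```python
import sys


def require_python_below_312(version_info=sys.version_info) -> None:
    """Raise RuntimeError when running on Python 3.12 or newer.

    `version_info` is any sequence whose first two items are (major, minor).
    """
    major_minor = tuple(version_info[:2])
    if major_minor >= (3, 12):
        raise RuntimeError(
            "databroker 1.2.5 pins intake <= 0.6.4, which imports distutils; "
            f"Python {major_minor[0]}.{major_minor[1]} no longer ships it. "
            "Consider using Python 3.11."
        )
```

Calling `require_python_below_312()` at the top of the tutorial notebook would stop on 3.12 before `Broker.named('temp')` is reached.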

We do not appear to have a 1.2.x bug-fix branch at the moment. On my fork I pushed a branch that adjusts the dependency, but it does not look like the tests are passing: https://github.com/tacaswell/databroker/actions/runs/7475146754/job/20342717390