datalad / datalad-neuroimaging

DataLad extension for neuroimaging research
http://datalad.org

Docs failing "no module named `numpy`" #98

Closed jsheunis closed 2 years ago

jsheunis commented 2 years ago

See https://readthedocs.org/projects/datalad-neuroimaging/builds/

Excerpt:

Installed /home/docs/checkouts/readthedocs.org/user_builds/datalad-neuroimaging/envs/latest/lib/python3.7/site-packages/datalad_neuroimaging-0.3.1-py3.7.egg
Processing dependencies for datalad-neuroimaging==0.3.1
Searching for pandas
Reading https://pypi.org/simple/pandas/
Downloading https://files.pythonhosted.org/packages/4d/aa/e7078569d20f45e8cf6512a24bf2945698f13a7975650773c01366ea96dc/pandas-1.4.0.tar.gz#sha256=cdd76254c7f0a1583bd4e4781fb450d0ebf392e10d3f12e92c95575942e37df5
Best match: pandas 1.4.0
Processing pandas-1.4.0.tar.gz
Writing /tmp/easy_install-s5qiofc6/pandas-1.4.0/setup.cfg
Running pandas-1.4.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-s5qiofc6/pandas-1.4.0/egg-dist-tmp-mwnrd8fq
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/datalad-neuroimaging/envs/latest/lib/python3.7/site-packages/setuptools/sandbox.py", line 156, in save_modules
    yield saved
  File "/home/docs/checkouts/readthedocs.org/user_builds/datalad-neuroimaging/envs/latest/lib/python3.7/site-packages/setuptools/sandbox.py", line 198, in setup_context
    yield
  File "/home/docs/checkouts/readthedocs.org/user_builds/datalad-neuroimaging/envs/latest/lib/python3.7/site-packages/setuptools/sandbox.py", line 259, in run_setup
    _execfile(setup_script, ns)
  File "/home/docs/checkouts/readthedocs.org/user_builds/datalad-neuroimaging/envs/latest/lib/python3.7/site-packages/setuptools/sandbox.py", line 46, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-s5qiofc6/pandas-1.4.0/setup.py", line 18, in <module>
    """Find files under subdir having specified extensions
ModuleNotFoundError: No module named 'numpy'

Not sure why this is happening? Should numpy be an explicit dependency?

mslw commented 2 years ago

Quick observation: this might be a version mismatch between Python, pandas, and numpy.

I understand that numpy is coming in as a dependency of pandas, which itself is specified (no version pinned) in our setup.py (with comment "bids2scidata export").

From what I can see, the docs build runs on Python 3.7. The pandas PyPI page says that pandas 1.4 (the latest release) requires Python >= 3.8, yet the logs show the build trying to install 1.4.0 and failing. For reference, in a Python 3.7 virtualenv, a plain pip install pandas gives me pandas-1.3.5 and numpy-1.21.5.

I think one of these should help:

jsheunis commented 2 years ago

Thanks @mslw!

I think updating the doc build version makes sense, since the updated datalad-extension-template is also on 3.8: https://github.com/datalad/datalad-extension-template/blob/master/.github/workflows/docbuild.yml. So this will also help the effort to bring this repo up to speed with the latest template version #99.

mslw commented 2 years ago

I agree. However, I'm a bit out of the loop with what the docs build process should look like. Specifically, I don't yet see how the github workflow docbuild.yml (from template) relates to the readthedocs build. Can you help me out?

Apparently, the template builds the docs in a GitHub action, but doesn't put them anywhere (or at least it doesn't have a readthedocs template - I might be wrong). Datalad-neuroimaging has no github actions, and builds the docs (only) on readthedocs. For comparison, datalad-metalad seems to do both, but I'm not sure if they are related?

From my recent exploration, readthedocs uses Python 3.7 by default, and this can be changed by adding an (optional) .readthedocs.yaml configuration file with build.tools.python set to 3.8 (or later). That seems to be a quick way out, but I'm not sure how it relates to the template, if at all.
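
A minimal sketch of what such a file could look like, assuming the version 2 configuration format. The `os` value and the Sphinx `conf.py` path are illustrative assumptions, not taken from this repo:

```yaml
# .readthedocs.yaml (placed in the repository root)
version: 2

build:
  os: ubuntu-22.04      # assumed image; build.os is needed alongside build.tools
  tools:
    python: "3.8"       # quoted so YAML keeps it as a string

sphinx:
  configuration: docs/source/conf.py   # assumed path; adjust to the real conf.py
```

Quoting the Python version is a small precaution: an unquoted value such as 3.10 would be read by YAML as the number 3.1.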

jsheunis commented 2 years ago

The integration of datalad-neuroimaging with RTD is done via webhooks, set up in the repo settings. https://docs.readthedocs.io/en/stable/integrations.html#github A push to master sends a webhook to RTD, which determines the diffs and then builds the docs on their infrastructure, using the Makefile in the docs folder.

The updated extension template uses an action to build the docs.

> Specifically, I don't yet see how the github workflow docbuild.yml (from template) relates to the readthedocs build. Can you help me out?

I am also not sure about this, since I don't have the privileges to inspect the settings of metalad / catalog in order to see if there are also webhooks involved or some other process. I asked @mih to set this up for me once I wanted the datalad catalog docs online.

I had the same confusion when I first had to deal with extension docs. I think it would be useful to add a short explainer about how this process works (for datalad-internal extensions and external extensions) in the extension template docs directory, perhaps in a README.

mslw commented 2 years ago

Could it be that the github workflow is just to test whether the docs build (and show how they could be built if not using readthedocs)? That is, not related to readthedocs at all?

jsheunis commented 2 years ago

It could be, yes, although I'm not sure. @datalad/developers could someone clarify this please?

mih commented 2 years ago

Yes, your conclusions are all correct from my POV. We have the GitHub workflow for the doc build to see that the docs build without all the RTD-specific defaults in the environment. And we have the RTD preview build to see what the docs look like. Aligning all this with the template setup is the best course of action.

I would not want to pin a numpy version for everything, just to have the docs build.

mslw commented 2 years ago

Thanks for clarifying!

I submitted a PR which only adds .readthedocs.yaml to address the build issue right away.

Another line of action would be to check whether the package with all its dependencies (pandas et al.) really needs to be installed for the docs to build (docs can possibly have their own requirements, which can also be specified in the YAML config) - but I think sticking to Python 3.8 for docs should reasonably make the problem go away.
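
For that alternative, the docs could get their own requirements file via `python.install` in .readthedocs.yaml. This is only a sketch: `docs/requirements.txt` is a hypothetical path, and I have not checked whether the Sphinx build here still needs the package itself to be importable.

```yaml
# fragment of .readthedocs.yaml
python:
  install:
    # hypothetical file listing only what the docs build needs (e.g. sphinx)
    - requirements: docs/requirements.txt
```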