ANNIEsoft / ToolAnalysis

Dependencies to be added for next version #99

Open brichards64 opened 5 years ago

brichards64 commented 5 years ago

Root 6
Genie V???
Python modules ???
RatPac library

edrakopo commented 5 years ago

For the energy and track length reconstruction the python scripts have been tested locally with: python3: 3.4.9 and pip3 --version: pip 8.1.2 from /usr/lib/python3.4/site-packages (python 3.4)

Packages:              Install using:
numpy: 1.16.2          pip3 install numpy
pandas: 0.23.3         pip3 install pandas
tensorflow: 1.13.1     pip3 install --user --upgrade tensorflow
matplotlib: 2.2.2      pip3 install matplotlib
sklearn: 0.19.1        pip3 install sklearn

I don't have strong feelings about the specific python3 version, but tensorflow (and potentially some other libraries) is not compatible with all newer python3 versions, so this needs to be checked when installing a specific version.
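One way to make that check concrete is to compare what is installed against the versions listed above. This is only a minimal sketch: it assumes Python 3.8 or newer for `importlib.metadata` (so it would not run on the 3.4.9 interpreter itself), the names `TESTED`, `installed_version` and `compare_versions` are made up for illustration, and `scikit-learn` is assumed to be the distribution behind `sklearn`.

```python
from importlib import metadata

# Versions reported as tested locally in this thread.
TESTED = {
    "numpy": "1.16.2",
    "pandas": "0.23.3",
    "tensorflow": "1.13.1",
    "matplotlib": "2.2.2",
    "scikit-learn": "0.19.1",  # assumption: distribution name for `sklearn`
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(tested, lookup=installed_version):
    """Map each package name to (tested, installed, versions_match)."""
    report = {}
    for name, wanted in tested.items():
        found = lookup(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions(TESTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: tested {wanted}, installed {found} ({status})")
```

The `lookup` argument exists only so the comparison can be exercised without touching the real environment.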

pershint commented 5 years ago

I would like to add a tool to ToolAnalysis that loads RATPAC data. To do so, the RATEventLib repository (my repo is at https://github.com/pershint/RATEventLib) should be added to ToolDAQ.

marc1uk commented 5 years ago

Root 6.06/08. In particular, ROOT needs to be built with a few additional flags. This is the set that I used for my own build of ROOT:

-Dcxx14=OFF -Dcxx11=ON -Dgdml=ON -Dxml=ON -Dmt=ON -Dkrb5=ON -Dmathmore=ON -Dx11=ON -Dimt=ON -Dtmva=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo

I think gdml, xml, mathmore etc. are the ones that are needed in particular.

Genie v3; I would suggest whatever the latest version available is at the time you try to pull this in.

brichards64 commented 5 years ago

> I would like to add a tool to ToolAnalysis that loads RATPAC data. To do so, the RATEventLib repository (my repo is at https://github.com/pershint/RATEventLib) should be added to ToolDAQ.

Teal, can you move the repo into the ANNIEsoft organisation?

marc1uk commented 5 years ago

In order to use the docker image on the gpvm with bind points, it would be good to provide destination directories for /annie and /pnfs in the root directory. Then one can bind-mount those locations into the docker image and access files on the /annie and /pnfs directories with their unmodified paths.

marc1uk commented 3 years ago

It seems like the latest GetToolDAQ.sh script attempts to add the required python modules, but currently it fails. If this was used to build the current toolanalysis:latest tag, I guess it failed, because numpy etc. aren't in that docker image.

Running GetToolDAQ.sh --Python3 to install the python dependencies currently fails at the first module, numpy, which no longer supports the Python 3.6 specified in the image. It seems that Python 3.7 is required for numpy 1.20 and above. Specifying

pip install numpy==1.19.5

allows it to proceed, but the inlined command: pip3.6 install numpy pandas ... appears to download numpy, but not (yet) install it, and the following pandas step then bails out with no such module numpy. I swapped this to

pip3.6 install numpy==1.19.5 && pip3 install pandas...

which progresses further, but still fails to install with RuntimeError: Cannot cythonize without Cython installed. Some googling suggested this may be because dependencies aren't properly pulled in by the tar.gz version of the source files, but are by the wheel. It's not clear why it's not getting the wheel (edit: maybe because it's a very old pip version or because wheel wasn't installed?), but it can be forced with:

pip3.6 install --only-binary :all: pandas

at which point pandas seems to go through okay. Next, matplotlib, like numpy, also doesn't support Python 3.6 beyond version 3.3, so add a version specification such as matplotlib==3.3.4.
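The numpy and matplotlib caps could be kept in one place and applied only when the interpreter is Python 3.6. The following is a rough sketch, not what GetToolDAQ.sh does: `PY36_PINS` and `requirement_specs` are hypothetical names, and the pins are the numpy 1.19.5 and matplotlib 3.3.4 values used in the final command list of this comment.

```python
import sys

# Last releases found to install on Python 3.6 in this thread.
PY36_PINS = {"numpy": "1.19.5", "matplotlib": "3.3.4"}


def requirement_specs(packages, python_version=None):
    """Return pip requirement strings, pinned where Python 3.6 needs it."""
    if python_version is None:
        python_version = sys.version_info[:2]
    on_py36 = tuple(python_version) == (3, 6)
    specs = []
    for name in packages:
        pin = PY36_PINS.get(name)
        if on_py36 and pin is not None:
            specs.append(f"{name}=={pin}")
        else:
            # No known cap for this package or interpreter: leave it unpinned.
            specs.append(name)
    return specs


if __name__ == "__main__":
    print(" ".join(requirement_specs(["numpy", "pandas", "matplotlib"])))
```

Only Python 3.6 is handled here; other interpreter versions would need their own table of caps.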

The current sklearn version, 0.24, supposedly supports Python 3.6, as does its dependency of scipy>=0.19.1, and yet:

Singularity> pip3.6 install sklearn
Collecting sklearn
  Using cached https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Collecting scikit-learn (from sklearn)
  Using cached https://files.pythonhosted.org/packages/f5/ef/bcd79e8d59250d6e8478eb1290dc6e05be42b3be8a86e3954146adbc171a/scikit_learn-0.24.2-cp36-cp36m-manylinux1_x86_64.whl
Collecting threadpoolctl>=2.0.0 (from scikit-learn->sklearn)
  Downloading https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl
Collecting joblib>=0.11 (from scikit-learn->sklearn)
  Downloading https://files.pythonhosted.org/packages/55/85/70c6602b078bd9e6f3da4f467047e906525c355a4dacd4f71b97a35d9897/joblib-1.0.1-py3-none-any.whl (303kB)
    100% |████████████████████████████████| 307kB 169kB/s 
Collecting scipy>=0.19.1 (from scikit-learn->sklearn)
  Using cached https://files.pythonhosted.org/packages/fe/fd/8704c7b7b34cdac850485e638346025ca57c5a859934b9aa1be5399b33b7/scipy-1.6.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-6lg3a2sp/scipy/setup.py", line 31, in <module>
        raise RuntimeError("Python version >= 3.7 required.")
    RuntimeError: Python version >= 3.7 required.

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6lg3a2sp/scipy/

However, instead using

pip3.6 install --only-binary :all: scikit-learn

worked. :man_shrugging:

Tensorflow complained about

    WARNING: The wheel package is not available.
    ERROR: 'pip wheel' requires the 'wheel' package. To fix this, run: pip install wheel

and, going by the website, it also requires an updated pip version:

pip3.6 install wheel
pip3.6 install --upgrade pip

which seemed to accept tensorflow. Note that after upgrading pip it seems that pip3.6 moved from /usr/bin/pip3.6 to /usr/local/bin/pip3.6, but $PATH wasn't updated until logging back in.
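To see which pip3.6 would be picked up after such a move, one can list every match on $PATH in lookup order. This is a small sketch using only the standard library; `find_on_path` is an illustrative helper, not part of any existing script.

```python
import os


def find_on_path(name, path=None):
    """List every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # The first entry printed is the one a fresh lookup would resolve to.
    for hit in find_on_path("pip3.6"):
        print(hit)
```

If both /usr/bin/pip3.6 and /usr/local/bin/pip3.6 show up, their order in the output follows the order of the directories in $PATH.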

So I think the final set of commands was:

export PATH=/usr/local/bin:$PATH
pip3.6 install --upgrade pip
pip3.6 install wheel
pip3.6 install numpy==1.19.5
pip3.6 install --only-binary :all: pandas
pip3.6 install matplotlib==3.3.4
pip3.6 install --only-binary :all: scikit-learn
pip3.6 install root_numpy
pip3.6 install tensorflow

Which may be overcomplicated, and has yet to be tested on actual tools. But really we should hard-code all the remaining versions, because the python dependency system is clearly a mess.
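As a starting point for hard-coding the versions, the install steps above could be kept as data and replayed in order. This is an untested sketch under assumptions: `STEPS`, `build_commands` and `run_all` are invented names, the `export PATH` step is not reproduced, and packages without a version stated above are left unpinned rather than guessed.

```python
import subprocess

# (requirement, extra pip flags), in the order of the commands above.
STEPS = [
    ("pip", ["--upgrade"]),
    ("wheel", []),
    ("numpy==1.19.5", []),
    ("pandas", ["--only-binary", ":all:"]),
    ("matplotlib==3.3.4", []),
    ("scikit-learn", ["--only-binary", ":all:"]),
    ("root_numpy", []),
    ("tensorflow", []),
]


def build_commands(steps, pip="pip3.6"):
    """Turn each step into an argument list for one pip invocation."""
    return [[pip, "install", *flags, requirement] for requirement, flags in steps]


def run_all(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing install.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(build_commands(STEPS))  # dry run: only prints the commands
```

Running one pip invocation per package keeps the behaviour of the list above, where numpy is fully installed before pandas is attempted.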