PNNL-CompBio / coderdata

Automation scripts and benchmark dataset package for cancer drug prediction deep learning models.

MPNST samples generation is failing due to a dependency issue #204

Status: Closed (jjacobson95 closed this issue 1 month ago)

jjacobson95 commented 2 months ago

Reproduced twice, same error each time.

Error:

b'Error in library(synapser) : there is no package called \xe2\x80\x98synapser\xe2\x80\x99\nExecution halted\n'

Context: The synapser R package is not being installed into the Docker image, which causes the build pipeline to fail. The cause is unknown, but it looks like you may have dealt with a related dependency issue recently.

Tracing the code, we can see that the mpnst Dockerfile has an updated R base image and instructions to install the packages listed in the requirements.r file. That file includes the synapser package, although its 'repos' argument differs from the synapser documentation, which was most recently updated on 8/21/24.

The Docker build logs confirm that synapser is not installing correctly:

#106 215.4 ERROR: dependency ‘rjson’ is not available for package ‘synapser’
#106 215.4 * removing ‘/usr/local/lib/R/site-library/synapser’
#106 215.4 
#106 215.4 The downloaded source packages are in
#106 215.4      ‘/tmp/Rtmp0MtMyd/downloaded_packages’
#106 215.4 Warning message:
#106 215.4 In install.packages("synapser", repos = c("http://ran.synapse.org",  :
#106 215.4   installation of package ‘synapser’ had non-zero exit status
#106 215.4 Installing package into ‘/usr/local/lib/R/site-library’
#106 215.4 (as ‘lib’ is unspecified)
#106 215.8 also installing the dependencies ‘R.oo’, ‘R.methodsS3’

To fix this, I will try updating the 'repos' argument in install.packages and adding rjson to the requirements.r file.

Note: If we create stable docker images, we should make sure to add a test for this.
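One possible shape for such a test, as a minimal sketch: scan a captured build log for the failure markers that appear in the log above and return a non-zero status if any are present. The function name `check_r_install_log` and the idea of capturing the build output to a file are assumptions for illustration; the two patterns are taken from the log lines quoted in this issue and are not exhaustive.

```shell
#!/bin/bash
# Hypothetical helper (not part of coderdata): fail when an R package
# installation log contains one of the failure markers seen in this issue.
check_r_install_log() {
  local log_file="$1"
  local patterns="ERROR: dependency|had non-zero exit status"

  if grep -E -q "$patterns" "$log_file"; then
    echo "R package installation failed; matching lines:" >&2
    grep -E "$patterns" "$log_file" >&2
    return 1
  fi

  echo "No R installation failures found in $log_file"
}
```

It could be run after something like `docker build --progress=plain -f build/docker/Dockerfile.mpnst . 2>&1 | tee build.log`, followed by `check_r_install_log build.log`. A stricter check would start the built image and run `Rscript -e 'library(synapser)'` inside it, since that is the call that fails in the error above.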

To Reproduce:

aws_1.sh: Setup script 1


#!/bin/bash

# Install Git
sudo yum install git -y

# Clone the repository
git clone https://github.com/PNNL-CompBio/coderdata.git

# Install Docker and configure it
sudo amazon-linux-extras install docker -y
sudo service docker start
sudo systemctl enable docker
sudo usermod -a -G docker ec2-user

sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

echo "Part 1 completed successfully!"

aws_2.sh: Setup script 2

#!/bin/bash

# Display Docker info
docker info

# Install development tools
sudo yum groupinstall "Development Tools" -y

# Erase previous OpenSSL development files
sudo yum erase openssl-devel -y

# Install required libraries
sudo yum install openssl11 openssl11-devel libffi-devel bzip2-devel wget -y

# Download and install Python 3.10
wget https://www.python.org/ftp/python/3.10.4/Python-3.10.4.tgz
tar -xf Python-3.10.4.tgz
cd Python-3.10.4
./configure --enable-optimizations
make -j $(nproc)
sudo make altinstall

# Install pip for Python 3.10
sudo yum install python3-pip -y

# Change directory to coderdata and install requirements
cd ~/coderdata
pip3.10 install -r requirements.txt

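# Set SYNAPSE_AUTH_TOKEN if provided as the first argument.
# Note: the export only persists in the calling shell if this script is sourced.
if [ -n "$1" ]; then
  export SYNAPSE_AUTH_TOKEN="$1"
  echo "SYNAPSE_AUTH_TOKEN set."  # avoid printing the token itself
else
  echo "No token provided; proceeding without an authentication token."
fi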

echo "Part 2 completed successfully!"

jjacobson95 commented 2 months ago

It looks like "rjson" requires R version 4.4.0, which is higher than the r-base:4.3.2 used in the mpnst Dockerfile. When I upgrade the R version, rjson installs correctly. However, installing synapser then gives the error message below. Note that the log reports numpy as not found and tries to install a version through reticulate, even though numpy should already be installed in the environment.

synapser installation error:

* installing *source* package ‘synapser’ ...
** using staged installation
[1] "*** Using Python Configuration:"
python:         /root/.virtualenvs/r-reticulate/bin/python
libpython:      /usr/lib/python3.12/config-3.12-x86_64-linux-gnu/libpython3.12.so
pythonhome:     /root/.virtualenvs/r-reticulate:/root/.virtualenvs/r-reticulate
version:        3.12.5 (main, Aug 22 2024, 13:11:09) [GCC 14.2.0]
numpy:           [NOT FOUND]
Using virtual environment '/root/.virtualenvs/r-reticulate' ...
+ /root/.virtualenvs/r-reticulate/bin/python -m pip install --upgrade --no-user 'pandas>=1.5,<=2.0.3' jinja2 markupsafe 'numpy<=1.24.4'
Collecting pandas<=2.0.3,>=1.5
  Using cached pandas-2.0.3.tar.gz (5.3 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting jinja2
  Using cached jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting markupsafe
  Using cached MarkupSafe-2.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting numpy<=1.24.4
  Using cached numpy-1.24.4.tar.gz (10.9 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [33 lines of output]
      Traceback (most recent call last):
        File "/root/.virtualenvs/r-reticulate/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/root/.virtualenvs/r-reticulate/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/root/.virtualenvs/r-reticulate/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 112, in get_requires_for_build_wheel
          backend = _build_backend()
                    ^^^^^^^^^^^^^^^^
        File "/root/.virtualenvs/r-reticulate/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 77, in _build_backend
          obj = import_module(mod_path)
                ^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
        File "<frozen importlib._bootstrap_external>", line 995, in exec_module
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "/tmp/pip-build-env-orwb5ow2/overlay/lib/python3.12/site-packages/setuptools/__init__.py", line 16, in <module>
          import setuptools.version
        File "/tmp/pip-build-env-orwb5ow2/overlay/lib/python3.12/site-packages/setuptools/version.py", line 1, in <module>
          import pkg_resources
        File "/tmp/pip-build-env-orwb5ow2/overlay/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2172, in <module>
          register_finder(pkgutil.ImpImporter, find_on_path)
                          ^^^^^^^^^^^^^^^^^^^
      AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Error: Error installing package(s): "'pandas>=1.5,<=2.0.3'", "jinja2", "markupsafe", "'numpy<=1.24.4'"
Execution halted
ERROR: configuration failed for package ‘synapser’
* removing ‘/usr/local/lib/R/site-library/synapser’
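
For context on the traceback above: the reticulate virtualenv is running Python 3.12.5, pip is building numpy 1.24.4 from its source archive, and the build fails on `pkgutil.ImpImporter`, which was removed in Python 3.12. A minimal sketch of a guard that could run before the synapser install follows. The function name `python_older_than_3_12` is hypothetical, and the cutoff is an assumption drawn only from this log, not from the synapser documentation.

```shell
#!/bin/bash
# Hypothetical guard (not part of coderdata or synapser): succeed only when
# the given version string is Python 3.x with a minor version below 12,
# i.e. older than the 3.12.5 interpreter seen in the failing log.
# Expects a full version string such as "3.10.4".
python_older_than_3_12() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot
  local rest="${version#*.}"     # text after the first dot
  local minor="${rest%%.*}"      # text before the next dot
  [ "$major" -eq 3 ] && [ "$minor" -lt 12 ]
}
```

A possible use is `python_older_than_3_12 "$(python3 -c 'import platform; print(platform.python_version())')" || echo "Python is 3.12 or newer; the pinned numpy may fail to build"`, pointed at whichever interpreter reticulate actually selects.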

sgosline commented 2 months ago

Yes, synapser failed to catch up to the numpy bug. This was finally addressed about 4 days ago; see the comment on the synapser GitHub site.

jjacobson95 commented 2 months ago

I don't think this was fully addressed, as I ran into the current issue after the latest synapser release. I used R version 4.4.0 and Python 3.10.

sgosline commented 2 months ago

I think it is, since the update to the synapser docs came this week, and I haven't touched this code for over a month. Something shifted and requires updating to account for the new release.

thomasyu888 commented 2 months ago

Hi all, just following a link here, but we also noticed the rjson issue, as outlined here: https://github.com/Sage-Bionetworks/synapser/blob/develop/DESCRIPTION.

It's a bit unfortunate, but since we ship for all 4 R versions, we pin the rjson dependency to rjson@0.2.21.

jjacobson95 commented 2 months ago

Thanks @thomasyu888. I've pinned rjson to rjson@0.2.21 and used R 4.3.2 and Python 3.10, and synapser now appears to be installing correctly. If you are interested in a stable Docker container, I have one working using the following files in commit 61965a51fd6d40f1962ed87efa84cef804b08f63: build/docker/Dockerfile.mpnst, build/mpnst/requirements.r, and build/mpnst/requirements.txt. The commit has some extra files that should be removed.

However, the MPNST dataset is still not generating, so I'll continue to track this here. This is likely due to other dependencies or to things that have shifted in the code since the last version.

thomasyu888 commented 2 months ago

Thanks @jjacobson95 for tagging me! You may also want to use reticulate@1.28, because later versions of reticulate have unintended consequences, as outlined here: https://r-docs.synapse.org/articles/troubleshooting.html#using-synapser-with-reticulate
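
The two pins mentioned in this thread (rjson 0.2.21 and reticulate 1.28) could be applied at install time along the following lines. This is a sketch under assumptions: it relies on the `remotes` R package being available in the image and on the CRAN mirror still serving those versions, neither of which is confirmed here, and `r_pin_expr` is a hypothetical helper name. The script only prints the commands rather than running them, so they can be reviewed before being copied into a Dockerfile or requirements.r.

```shell
#!/bin/bash
# Hypothetical helper: build the R expression that installs an exact
# package version via remotes::install_version().
# Assumes the 'remotes' package is already installed in the image.
r_pin_expr() {
  local pkg="$1"
  local version="$2"
  printf 'remotes::install_version("%s", version = "%s", repos = "https://cloud.r-project.org")' \
    "$pkg" "$version"
}

# Versions taken from the comments in this thread.
for spec in "rjson 0.2.21" "reticulate 1.28"; do
  read -r pkg version <<< "$spec"
  # Dry run: print the command instead of executing it.
  echo "Rscript -e '$(r_pin_expr "$pkg" "$version")'"
done
```

Installing these pins before synapser itself would presumably keep install.packages from pulling newer versions as dependencies, though that ordering has not been tested in this thread.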