IBMStreams / streamsx.topology

Develop streaming applications for IBM Streams in Python, Java & Scala.
http://ibmstreams.github.io/streamsx.topology
Apache License 2.0

Python packages installed via git are no longer available at runtime #2314

Closed natashadsilva closed 4 years ago

natashadsilva commented 4 years ago

These notebooks:

They used to work as-is in earlier releases, but with the latest release of the streamsx package the dependencies aren't found at runtime and execution fails with an error to this effect: healthdemo not found. This is a problem for the Healthcare notebook in particular, because there isn't a way to install healthdemo via add_pip_package.

markheger commented 4 years ago

Yes, in earlier releases Python packages were handled and added to the sab differently, but that approach caused many problems with dependencies and conflicts between packages. The current way of including Python packages on the build service is correct, and add_pip_package is the recommended way to add them.

* [Pybrain model scoring](https://dataplatform.cloud.ibm.com/exchange/public/entry/view/9fc33ce7301f10e21a9f92039ca60bb7)

In this notebook topo.add_pip_package("pybrain3") is used and this notebook should not have any problems.

Use 3rd party packages installed via pip install git+....

The Healthcare demo notebook installs the healthdemo Python package using pip install and the link below:

git+https://github.com/IBMStreams/streamsx.health.git#egg=healthdemo&subdirectory=samples/HealthcareJupyterDemo/package

The line above is supported by pip install and could also be part of a "requirements.txt" file. Therefore topo.add_pip_package('git+https://github.com/IBMStreams/streamsx.health.git#egg=healthdemo&subdirectory=samples/HealthcareJupyterDemo/package') could work, and the line could be added to the "requirements.txt" file that the build service uses to install the required packages.
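To make the parts of that pip link explicit, here is a minimal sketch, using only the Python standard library, that splits the link into its VCS type, repository URL, and the egg/subdirectory options carried in the fragment. The helper name split_vcs_requirement is hypothetical and is not part of streamsx or pip.

```python
from urllib.parse import urlsplit, parse_qs


def split_vcs_requirement(link):
    """Split a pip VCS link into (vcs, repository URL, fragment options).

    Hypothetical helper for illustration only; not a streamsx or pip API.
    """
    parts = urlsplit(link)
    # The scheme looks like "git+https": VCS name first, then the transport.
    vcs, _, transport = parts.scheme.partition("+")
    repo_url = f"{transport}://{parts.netloc}{parts.path}"
    # egg and subdirectory are &-separated key=value pairs in the fragment.
    options = {key: values[0] for key, values in parse_qs(parts.fragment).items()}
    return vcs, repo_url, options


link = (
    "git+https://github.com/IBMStreams/streamsx.health.git"
    "#egg=healthdemo&subdirectory=samples/HealthcareJupyterDemo/package"
)
vcs, repo_url, options = split_vcs_requirement(link)
print(vcs)
print(repo_url)
print(options)
```

The egg option names the package (healthdemo) and the subdirectory option points pip at the folder inside the repository that contains the package's setup script.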

However, the current code in topology.py calls pkg_resources.Requirement.parse(requirement), which raises an exception like the one below:

pkg_resources.extern.packaging.requirements.InvalidRequirement: Invalid requirement, parse error at "'+https:/'"
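The parse error arises because the requirement string is a link rather than a package name with an optional version specifier. One possible way around it, sketched below under the assumption that link-style requirements can be written to the requirements file unchanged, is to detect links first and only run name-based parsing on everything else. The helper is_link_requirement and the VCS_PREFIXES tuple are hypothetical names; this is not necessarily how streamsx handles it.

```python
from urllib.parse import urlsplit

# VCS schemes that pip accepts as prefixes of a requirement link.
VCS_PREFIXES = ("git+", "hg+", "svn+", "bzr+")


def is_link_requirement(requirement):
    """Return True if the requirement is a VCS link or a URL to a wheel file.

    Hypothetical helper for illustration only; not a streamsx API.
    """
    req = requirement.strip()
    if req.startswith(VCS_PREFIXES):
        return True
    parts = urlsplit(req)
    return parts.scheme in ("http", "https") and parts.path.endswith(".whl")


for candidate in (
    "pybrain3",
    "git+https://github.com/IBMStreams/streamsx.health.git#egg=healthdemo",
    "https://example.com/healthdemo-1.0-py3-none-any.whl",  # placeholder URL
):
    kind = "link" if is_link_requirement(candidate) else "named requirement"
    print(candidate, "->", kind)
```

A caller could then skip the name-based parser for anything classified as a link and pass the string through as-is.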

Even if this were resolved, the build would still fail on the build service with the following error:

Streaming Analytics service (xxx): The submitted archive code_archive3008920855314927617.zip failed to build with status failed.
Exception in thread "main" java.lang.IllegalStateException: Error submitting archive for compilation:
"Collecting healthdemo from git+https://github.com/IBMStreams/streamsx.health.git#egg=healthdemo&subdirectory=samples/HealthcareJupyterDemo/package (from -r tk6518883778187481431/opt/python/streams/requirements.txt (line 1))"
"  Cloning https://github.com/IBMStreams/streamsx.health.git to /tmp/pip-install-79b4yqof/healthdemo"
"  Error [Errno 2] No such file or directory: 'git': 'git' while executing command git clone -q https://github.com/IBMStreams/streamsx.health.git /tmp/pip-install-79b4yqof/healthdemo"
"Cannot find command 'git' - do you have 'git' installed and in your PATH?"
"make: *** [all] Error 1"

Option 1

Option 2

Upload the healthdemo python package to pypi.org.

markheger commented 4 years ago

add_pip_package should support links (either to whl files or using git). This should be released with streamsx v1.14.12

markheger commented 4 years ago

Released in 1.14.12.