Open changhiskhan opened 1 year ago
Why not upgrade pyspark from 3.1.2 to 3.1.3 in a separate pull request?
We are using Spark 3.2.x. https://github.com/eto-ai/rikai/issues/684
I'd like to deprecate Spark 3.1.x.
we could update to 3.2.x - is Tubi all on 3.2.x now?
@ffcai what version of spark are you guys using?
> we could update to 3.2.x - is Tubi all on 3.2.x now?
Yes. Since we use Databricks, we have to upgrade the Databricks runtime version because Databricks is deprecating the old one.
The first error was "no space left on device". I increased the disk quota and that fixed it. Here is the second error:
```
=> ERROR [whl_builder 6/6] RUN pip3 wheel -r /opt/rikai/python/docker-requirements.txt 1283.8s
------
 > [whl_builder 6/6] RUN pip3 wheel -r /opt/rikai/python/docker-requirements.txt:
#13 81.40 Collecting torch>=1.8.1
#13 82.97 Downloading torch-1.12.0-cp39-cp39-manylinux1_x86_64.whl (776.3 MB)
#13 1277.4 ━━━━━━━━━━━━━━ 299.6/776.3 MB 664.6 kB/s eta 0:11:58
#13 1279.3 ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
#13 1279.3 torch>=1.8.1 from https://files.pythonhosted.org/packages/8f/27/addb0019d7aa3704576ca9c055f7566a3db31f95110e55b31173b87aec4a/torch-1.12.0-cp39-cp39-manylinux1_x86_64.whl#sha256=844f1db41173b53fe40c44b3e04fcca23a6ce00ac328b7099f2800e611766845 (from -r /opt/rikai/python/docker-requirements.txt (line 2)):
#13 1279.3 Expected sha256 844f1db41173b53fe40c44b3e04fcca23a6ce00ac328b7099f2800e611766845
#13 1279.3 Got 45984e61e215ca5985f60c7a64444cab4dcc7dfb9588be4017f7f82cb37b455d
#13 1279.3
#13 1280.8 WARNING: You are using pip version 22.0.3; however, version 22.2 is available.
#13 1280.8 You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
------
executor failed running [/bin/sh -c pip3 wheel -r /opt/rikai/python/docker-requirements.txt]: exit code: 1
ERROR: Service 'quickstart' failed to build : Build failed
```
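For context on the error above: pip streams the downloaded wheel through SHA-256 and compares the digest against the hash pinned in the requirements file, so a truncated or corrupted download (like the interrupted 299.6/776.3 MB transfer in the log) produces a mismatch. A minimal sketch of the same check, assuming you have the wheel file locally (`sha256_of` is just an illustrative helper, not a pip API):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks, the way pip verifies downloaded wheels."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the hash pinned after "#sha256=" in the
# requirements file; any difference means the bytes on disk are not the
# bytes the hash was computed from.
```
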
With this patch suggested by @changhiskhan:
```diff
diff --git a/Dockerfile b/Dockerfile
index e244510..be63155 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -18,6 +18,7 @@ COPY ./python /opt/rikai/python
 COPY ./README.md /opt/rikai/README.md
 WORKDIR /opt/rikai/python
 RUN python3 setup.py bdist_wheel
+RUN pip3 cache purge
 RUN pip3 wheel -r /opt/rikai/python/docker-requirements.txt
 FROM apache/spark-py:v${SPARK_VERSION} AS jupyter
```
It works fine for me now.
The Docker image was broken and too complicated to maintain. I simplified the build using 2 builder images (1 for the jar, 1 for the python wheels). I've also added some cleanup to reduce the final image size (~4.5GB now).
In the image itself I only include coco and mojito.
I've also removed Scala 2.13 from the GH Actions matrix since we're stuck on 2.12 with pyspark for now.
One thing that might be an annoyance: since the jar builder puts the jar directly on the Spark classpath, I've removed the part of the notebooks that downloads the rikai jar as a separate dependency. This means that if you're running the notebooks on their own, you'll need to add it back. Happy to chat if this is a problem.
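If you do need to add it back, one option is Spark's own package resolution rather than a manual download. The coordinates below are illustrative placeholders, not confirmed artifact coordinates; check the project's published releases for the actual group/artifact/version:

```
# spark-defaults.conf fragment -- coordinates and version are placeholders
spark.jars.packages  ai.eto:rikai_2.12:<version>
```

The same value can be passed on the command line via `--packages`; Spark resolves and downloads the listed package (and its transitive dependencies) at session startup.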