NVIDIA / spark-rapids-jni

RAPIDS Accelerator JNI For Apache Spark
Apache License 2.0

Dockerfile should derive from cudf Java Dockerfile #203

Open jlowe opened 2 years ago

jlowe commented 2 years ago

Since this project builds libcudf and libcudfjni, ideally the Dockerfile used for this project should derive from the Dockerfile used for the nightly cudf Java jar builds. Doing so would require publishing the cudf Java Docker image so it can be referenced in this repository's Dockerfile.

GaryShen2008 commented 2 years ago

Where should we publish the cudf Java Docker image? Docker Hub? quay.io?

jlowe commented 2 years ago

It may make the most sense to publish it under dockerhub/gpuci since the Java Dockerfile is part of the cudf project and other RAPIDS images are published there. If that's not an option, I don't have a strong opinion on where to publish it as long as it is publicly accessible for this project. cc: @sameerz for visibility.

sameerz commented 2 years ago

dockerhub/gpuci seems like the right place to publish the cudf Java jar builds.

cc: @raydouglass for visibility

pxLi commented 2 years ago

Follow-up: figure out a good way to support auto-triggering the image build in the cudf repo without it costing too much.

pxLi commented 2 years ago

Manually triggered a branch-22.06 build at https://github.com/rapidsai/cudf/runs/6379507642?check_suite_focus=true

The new image is available at https://hub.docker.com/r/rapidsai/cudf-jni-build/tags

jlowe commented 2 years ago

> follow-up to figure out a good way to support auto-trigger in cudf repo w/o cost too much

I'm confused why this repository would track an issue with the cudf repository. The Dockerfile in question is in cudf, and ideally changes to that Dockerfile should trigger a rebuild and push of the new image to the Docker repository, possibly with a workflow action in the cudf repository. It feels odd to "fix" that problem in this repository, especially since it's not in the rapidsai domain.

Also fixing this particular issue, where we base the spark-rapids-jni Dockerfile on the published cudf Dockerfile, is somewhat orthogonal. We can get one working without the other, and thus they should be handled as separate issues.

pxLi commented 2 years ago

Since the requirement for a cudf JNI build image under rapidsai originally came from this issue, I put the context here as a reminder. As a potential workaround, a GitHub Action in this repo could scan the cudf commits and, if it finds related Dockerfile changes in the upstream submodule, notify us to trigger the build manually.

This is more about our limited access to the rapidsai repo: it would not be easy for us to make changes or debug things using their CI/CD resources in the future. And yes, I totally agree this should be a separate issue in the cudf repo.
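
For illustration, the workaround of scanning upstream cudf commits for Dockerfile changes might look roughly like the sketch below. It is not an existing script in either repository: the function names, the file-name matching rule, and any paths passed in are assumptions made for the example. It lists the files changed between two revisions of the submodule with `git diff --name-only` and keeps the ones whose file name looks like a Dockerfile.

```python
import subprocess
from typing import Iterable, List


def changed_paths(repo_dir: str, old_rev: str, new_rev: str) -> List[str]:
    """Return the paths changed between two revisions of the repo at repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_rev, new_rev],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def dockerfile_changes(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose file name looks like a Dockerfile.

    The naming rule is an assumption: "Dockerfile", "Dockerfile.<suffix>",
    or "<prefix>.Dockerfile".
    """
    hits = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if (
            name == "Dockerfile"
            or name.startswith("Dockerfile.")
            or name.endswith(".Dockerfile")
        ):
            hits.append(path)
    return hits


def needs_manual_rebuild(paths: Iterable[str]) -> bool:
    """True if any changed path is a Dockerfile, i.e. someone should be notified."""
    return bool(dockerfile_changes(paths))
```

A scheduled job in this repo could call `changed_paths` on the cudf submodule directory with the previously pinned and the newly pinned commit, then send a notification when `needs_manual_rebuild` returns `True`. How the notification is delivered is left open here.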