scottpchow23 closed this issue 4 years ago.
Hi Scott,
Ubuntu 18.04 / Python 3.6 should be a good setup. I'm no Docker expert, but I would not expect any problems.
If you need to make changes to get it working with Docker, I'd be happy to review and approve a PR!
- sean
So I figured out there's actually one more dependency: a Java JDK.
Java 11 seems to work fine; can you confirm whether that is acceptable?
Looks like my machine is running Java 1.8 (Java 8). I'm not sure whether running with Java 11 will cause issues; you should check compatibility with Anserini and pyjnius. According to this paper, there is some difference between Java 8 and 11, but it appears to be minimal.
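Since this thread mixes Java 8 and Java 11, it can help to confirm which major version a machine or container actually runs before debugging Anserini or pyjnius. A minimal sketch, assuming the usual `version "…"` first line that `java -version` prints; `java_major_version` is a hypothetical helper, not part of OpenNIR:

```shell
# Hypothetical helper: print the Java major version from `java -version` output.
# Handles both the legacy scheme ("1.8.0_252" -> 8) and the modern one ("11.0.8" -> 11).
java_major_version() {
  # Pull out the quoted version string from the first "version" line
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy 1.x scheme: 1.8 -> 8
    *)   printf '%s\n' "$version" | cut -d. -f1 ;;  # modern scheme: 11.0.8 -> 11
  esac
}
```

Usage: `java_major_version "$(java -version 2>&1)"`. The `2>&1` is needed because `java -version` writes to stderr.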
- sean
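So here's a preview of the Dockerfile I have so far:
```dockerfile
FROM python:3.6-buster

WORKDIR /workspace

# Install Java 11 first: pyjnius (used to call Anserini) needs a JDK,
# and may need it at pip-install time if it is built from source
RUN apt-get update -y \
    && apt-get install -y openjdk-11-jdk \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies before copying the rest of the source,
# so this layer stays cached when only OpenNIR code changes
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the remaining OpenNIR files into /workspace
COPY . .

CMD ["/bin/bash"]
```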
While this unfortunately doesn't match your Java 8 / Ubuntu 18.04 setup (it uses Java 11 and Debian 10), I can confirm that I'm able to begin training in the container with the following command:
```shell
scripts/pipeline.sh config/conv_knrm config/antique
```
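The build-and-run steps can be scripted roughly as follows. This is a sketch: the image tag `opennir`, the 16 GB `--memory` cap, and the `DRY_RUN` switch are illustrative assumptions, not anything the repo defines. Note that `--memory` only caps the container; if Docker itself runs inside a VM (as with Docker Desktop), the VM also needs enough RAM for the vectors.

```shell
# Hypothetical defaults; override them via the environment.
IMAGE="${IMAGE:-opennir}"
MEMORY="${MEMORY:-16g}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the image from the repository root, then run the pipeline for the
# given model and dataset configs in a throwaway (--rm) container.
build_and_train() {
  run docker build -t "$IMAGE" . &&
    run docker run --rm --memory="$MEMORY" "$IMAGE" scripts/pipeline.sh "$1" "$2"
}
```

Calling `build_and_train config/conv_knrm config/antique` builds the image and starts training; with `DRY_RUN=1` set, it only prints the two `docker` commands. No `-it` is passed because the pipeline does not need an interactive terminal.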
If you're still interested in having this added to the repo, I'd love to open a PR with the Dockerfile, along with instructions and caveats for running OpenNIR in Docker.
Fun note: I didn't realize how memory-hungry loading vectors could be! Loading vectors into memory can easily take 12-15 GB of RAM, and I had to increase the resources allocated to Docker so that portion of the training wouldn't error out.
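When a container dies while loading vectors, it helps to know what memory limit it actually sees. Inside a container, `/proc/meminfo` generally reports the host's (or VM's) memory rather than the container's limit, so this sketch reads the cgroup limit file instead. The paths in the usage line are the standard cgroup v2 (`/sys/fs/cgroup/memory.max`) and v1 (`/sys/fs/cgroup/memory/memory.limit_in_bytes`) locations; which one exists depends on the host. `mem_limit_gib` is a hypothetical helper:

```shell
# Print the memory limit from the first readable cgroup file given, in whole GiB.
# Prints "unlimited" for cgroup v2's "max", or "unknown" if no file is readable.
# (cgroup v1 reports a very large number instead of "max" when no limit is set.)
mem_limit_gib() {
  for f in "$@"; do
    if [ -r "$f" ]; then
      limit=$(cat "$f")
      if [ "$limit" = max ]; then
        echo unlimited
      else
        echo $((limit / 1073741824))  # bytes -> GiB, rounded down
      fi
      return 0
    fi
  done
  echo unknown
}
```

Usage inside the container: `mem_limit_gib /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory/memory.limit_in_bytes`. A result well below 12-15 GB would explain failures during vector loading.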
Awesome, thanks! Yeah, go ahead and make a PR with the Dockerfile and instructions on how to use it. Others will likely find this helpful.
RE: loading vectors: there's probably a less resource-intensive way I could do this :)
I plan on dockerizing OpenNIR to try to reproduce the CEDR_KNRM results on XSEDE's Comet compute cluster as part of a class project. Are there any potential pitfalls with dockerizing this application? I'm rather new to machine learning and information retrieval in general, but I don't see any obvious problems.
I'm planning on dockerizing with the following parameters:
Also, are you open to a PR if I get this working?