Hi @KatharinaHoff, thanks so much! I forgot this dependency, which I believe is just for loading the pretrained weights from Huggingface. Good catch :) I've made the change and uploaded a new Docker image (reflected in the readme). You can now pull with (I removed the 'public' name):
docker pull hyenadna/hyena-dna:latest
Enjoy!
Dear @exnx, thank you so much for trying to fix it. Sadly, the pip install of git-lfs did not fix the problem. Sorry, my bad, I had not tested my first suggestion. I have now figured out how to fix it. It is possibly not the most elegant way, but if you append the following to your Dockerfile, then both the Docker and the Singularity builds contain git lfs and models can be loaded from Huggingface:
# I have not tested the last two steps (cd .. and rm), but I think it makes sense to delete the archive; my build still has it
RUN wget https://github.com/git-lfs/git-lfs/releases/download/v3.4.0/git-lfs-linux-amd64-v3.4.0.tar.gz && \
tar -xvf git-lfs-linux-amd64-v3.4.0.tar.gz && \
cd git-lfs-3.4.0 && \
./install.sh && \
cd .. && \
rm git-lfs-linux-amd64-v3.4.0.tar.gz
I have tested this solution with both Docker and Singularity, and git lfs works.
Maybe you also want to add the Singularity instructions (adapted from my initial post here) to the Readme.md? Just an idea to save other people some time. I tested it with singularity-ce version 3.11.3, and all works well.
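One way to confirm that a build really contains git lfs is to look for the binary on the PATH inside the container. The helper below is only a sketch: `has_git_lfs` is a hypothetical name, not something shipped with the repo or the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hyena-dna): report whether the git-lfs
# binary is reachable on PATH in the current environment.
has_git_lfs() {
    if command -v git-lfs >/dev/null 2>&1; then
        echo "git-lfs found"
    else
        echo "git-lfs missing"
        return 1
    fi
}
```

Sourcing this inside the container and calling `has_git_lfs` should be enough. Running `git lfs version` through `singularity exec` or `docker run` on the image is an equivalent manual check.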
Thanks for the update! I haven't been able to test it out myself, but I'll report back when I do.
I ended up making a second Docker image with the Nucleotide Transformer datasets and weights to reproduce the results from our paper. This new image includes the correct git-lfs dependency for pulling in weights from Huggingface. You can find the image here:
# pull image
docker pull hyenadna/hyena-dna-nt6:latest
# run container
docker run --gpus all -it -p80:3000 hyenadna/hyena-dna-nt6 /bin/bash
To build the image, I used tips from this thread, which basically just means adding:
RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
RUN sudo apt-get install git-lfs
Eventually I'll add this to the main Dockerfile in the repo, but for now there are 2 Docker images.
Hi @KatharinaHoff and @exnx, thanks a lot for posting directions on how to generate and use a Singularity image of HyenaDNA! I tried what Katharina suggested:
singularity build hyena-dna.sif docker://hyenadna/hyena-dna-public:latest
git clone https://github.com/HazyResearch/hyena-dna.git
cd hyena-dna
SINGULARITYENV_CUDA_VISIBLE_DEVICES=1 singularity exec --nv ~/images/hyena-dna.sif python -m train wandb=null experiment=hg38/genomic_benchmark_scratch
But I am getting: /opt/conda/bin/python: No module named train
Anything missing on my side?
Thanks!
The steps above by Katharina didn't work for me. Instead I used a different set of commands, which you can find with the image below on Docker Hub (the steps are in the readme). I'm not especially familiar with Singularity, but there's an extra step when starting the image that mounts your local directory into the container; otherwise you're just getting the environment, but not any of the code.
hyenadna/hyena-dna-nt6:latest
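For Singularity, the equivalent of mounting the local checkout is usually the `--bind` option, combined with `--pwd` to start in the mounted directory. The sketch below only assembles and prints such a command rather than running it. The function name `build_exec_cmd`, the `/workspace` mount point, and the example paths are illustrative assumptions, not something the repo documents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a singularity command that bind-mounts a local
# hyena-dna checkout into the container and runs the training module from it.
# /workspace is an arbitrary mount point chosen for illustration.
build_exec_cmd() {
    local image="$1"      # path to the converted .sif image
    local repo_dir="$2"   # local clone of the hyena-dna repository
    echo "singularity exec --nv --bind ${repo_dir}:/workspace --pwd /workspace ${image} python -m train wandb=null experiment=hg38/genomic_benchmark_scratch"
}

# Dry run: print the command instead of executing it.
build_exec_cmd "hyena-dna_nt6.sif" "$PWD/hyena-dna"
```

Printing the command first makes it easy to inspect the bind path before running it on a GPU node.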
Great, thank you! First, I tried the nt7 version, which is also available, but the converted sif image was giving errors. I will try converting nt6 and let you know.
The nt6 image should be using commands from here. I forgot if nt7 did too or not, might've been testing other things.
But specifically, the commands you want in the Dockerfile are:
RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
RUN apt-get install -y git-lfs
I updated the Dockerfile in this repo; it now includes these commands if you build your own image.
Thanks! I re-ran with hyena-dna-nt6 and am getting:
singularity build hyena-dna_nt6.sif docker://hyenadna/hyena-dna-nt6:latest
cd hyena-dna
singularity exec --nv hyena-dna_nt6.sif python -m train wandb=null experiment=hg38/genomic_benchmark_scratch
13:4: not a valid test operator: (
13:4: not a valid test operator: 510.47.03
/usr/bin/python: No module named train
This strange 'not a valid test operator' is actually the same one I was getting with the sif image of hyena-dna-nt7.
Wish I could use Docker - but I am stuck with Singularity on the HPC A100 nodes I have available.
Maybe try ChatGPT? I.e., check how to turn the docker command provided into an equivalent singularity command. Sorry, we don't support Singularity on our end, we just don't use it.
E.g.:
apptainer pull docker://hyenadna/hyena-dna-nt7:latest
apptainer exec --nv docker://hyenadna/hyena-dna-nt7:latest /bin/bash
No worries, thank you for all the help. I think I found the culprit: my cloned hyenaDNA folder was on a mounted drive (/mnt/ etc.) that for some reason wasn't accessible from the container image. I have now moved everything to my home folder and it seems to work. If I have further questions I will reach out on Discord. Thanks again!
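The home folder works because Singularity typically mounts `$HOME` into the container by default, while other locations such as `/mnt/...` may need an explicit `--bind`. This depends on how the site has configured Singularity, so treat the sketch below as an assumption-based heuristic. `bind_args_for` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a --bind argument for directories outside $HOME.
# Assumes the default configuration in which $HOME is mounted automatically.
bind_args_for() {
    case "$1" in
        "$HOME"|"$HOME"/*) echo "" ;;     # already visible via the default $HOME mount
        *) echo "--bind $1" ;;            # needs an explicit bind mount
    esac
}
```

For example, `singularity exec --nv $(bind_args_for /mnt/data/hyena-dna) image.sif ...` would add the bind only when the path lies outside the home folder. The path and image name here are placeholders.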
Hi!
Thanks for this very cool repository, the preprint is very cool, too!
I am a total beginner to using relevant machine learning libraries, but I think the git-lfs is missing from the docker container of hyena-dna. Admittedly, I did a bit of weird stuff with it because I can't execute docker on the HPC. I converted it to Singularity. Nevertheless, I'd expect to be able to call git-lfs from there, too, and it seems to be missing.
Here's what I did:
Error:
I think it could be fixed by adding git-lfs to the requirements.txt, and then rebuilding the container.