dwadden / dygiepp

Span-based system for named entity, relation, and event extraction.
MIT License

Dockerfile expects script that was removed #63

Closed josephhaaga closed 3 years ago

josephhaaga commented 3 years ago

It looks like scripts/pretrained/get_scibert.py was removed in this commit, but the Dockerfile still tries to COPY it, so the build fails:

$ docker build .

420100K .......... .......... .......... .......... .......... 99% 2.89M 0s
420150K .......... .......... .......... .......... .......... 99% 4.39M 0s
420200K .......... ...                                        100%  108M=4m20s

2021-05-13 18:44:49 (1.58 MB/s) - ‘./pretrained/mechanic-granular.tar.gz’ saved [430298612/430298612]

Removing intermediate container 2024d2013fad
 ---> b44f296f2ce3
Step 19/27 : COPY scripts/pretrained/get_scibert.py /tmp/get_scibert.py
COPY failed: stat /var/lib/docker/tmp/docker-builder787936846/scripts/pretrained/get_scibert.py: no such file or directory

I understand that the Dockerfile isn't officially supported; I just want to log this here in case someone else runs into the same issue.
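
For anyone who wants to catch this kind of breakage before a multi-minute build, here is a minimal sketch of a pre-build check. It is not part of DyGIE++; the function name `missing_copy_sources` is made up for illustration. It assumes plain `COPY <src>... <dest>` lines (no JSON-array form, no wildcards) and skips `COPY --from=...`, whose sources come from another build stage rather than the build context.

```python
from pathlib import Path
from typing import List


def missing_copy_sources(dockerfile_text: str, context_dir: str) -> List[str]:
    """Return COPY source paths that do not exist under the build context.

    Hypothetical helper: handles only the shell form "COPY <src>... <dest>".
    """
    context = Path(context_dir)
    missing = []
    for raw_line in dockerfile_text.splitlines():
        parts = raw_line.strip().split()
        # Ignore comments, blank lines, and every instruction except COPY.
        if len(parts) < 3 or parts[0].upper() != "COPY":
            continue
        flags = [p for p in parts[1:] if p.startswith("--")]
        # Sources of "COPY --from=<stage>" are not in the build context.
        if any(flag.startswith("--from=") for flag in flags):
            continue
        args = [p for p in parts[1:] if not p.startswith("--")]
        if len(args) < 2:
            continue
        # The last argument is the destination; everything before it is a source.
        for src in args[:-1]:
            if not (context / src).exists():
                missing.append(src)
    return missing
```

Calling `missing_copy_sources(Path("Dockerfile").read_text(), ".")` from the repository root should list scripts/pretrained/get_scibert.py if the script is absent from the checkout, without having to wait for the earlier download steps.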

josephhaaga commented 3 years ago

FWIW, I removed some of the Dockerfile blocks related to other models/datasets. Here's the Dockerfile I used:

# Set-up docker image for DYGIE++.
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel

# Datasets will be downloaded to the /dygiepp root directory in image.
# Please mount source code project dir at /dygiepp for using default paths.
RUN mkdir /dygiepp

# Required-base: set-up shared DYGIE++ modeling environment.
# GCC and make needed to compile python deps. SQLite3 for Optuna hyperparameter optimization.
RUN apt-get update && \
    apt-get -y install gcc make sqlite3
RUN conda create --name dygiepp python=3.7 -y
SHELL ["conda", "run", "-n", "dygiepp", "/bin/bash", "-c"]
# jsonnet has a conflict when installed with pip for now, install from conda.
RUN conda install -c conda-forge jsonnet -y
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

# SciERC, GENIA, ChemProt: Download data.
# Downloader scripts require wget, unzip, and shared parsing code.
RUN apt-get install unzip wget -y
COPY scripts/data/shared /dygiepp/scripts/data/shared
# SciERC
COPY scripts/data/get_scierc.sh /tmp/get_scierc.sh
COPY dygie /dygiepp/dygie
ENV PYTHONPATH="${PYTHONPATH}:/dygiepp"
SHELL ["conda", "run", "-n", "dygiepp", "/bin/bash", "-c"]
RUN cd /dygiepp && bash /tmp/get_scierc.sh

# Pretrained-models-all-DYGIEPP: Download pre-trained models for all DYGIEPP tasks.
RUN apt-get install wget -y
COPY scripts/pretrained/get_dygiepp_pretrained.sh /tmp/get_dygiepp_pretrained.sh
RUN cd /dygiepp && bash /tmp/get_dygiepp_pretrained.sh

# SciBERT: Download fine-tuned pretrained SciBERT model.
# Disabled: scripts/pretrained/get_scibert.py is no longer in the repo, so this COPY fails.
# COPY scripts/pretrained/get_scibert.py /tmp/get_scibert.py
# RUN cd /dygiepp && python /tmp/get_scibert.py

# Required-base: cleanup.
# Empty /tmp rather than deleting the directory itself, since later steps may need it.
RUN rm -rf /tmp/* /dygiepp/{scripts,dygie}

# Required-base: on run, ensure conda env is activated and /dygiepp is workdir.
WORKDIR /dygiepp/
SHELL ["/bin/bash", "-c"]
RUN conda init bash
RUN echo "conda activate dygiepp" >> ~/.bashrc
ENV PATH=/opt/conda/envs/dygiepp/bin:$PATH
ENV CONDA_DEFAULT_ENV=dygiepp

dwadden commented 3 years ago

Nice, thanks for catching this. Do you mind submitting a PR with the updated Dockerfile?