ADAH-EviDENce / NewsReader

Docker build of full NewsReader pipeline in Dutch.
Apache License 2.0

Dockerfile: is cloning this repo only to get the nerc model + jar necessary? #19

Closed MartineDeVos closed 6 years ago

MartineDeVos commented 6 years ago

For the ixa-pipe-nerc module to run smoothly (without a compiled-version mismatch), it is necessary to use a specific combination of jar and trained model. For that reason, cloning ixa-pipe-nerc is not a solution. But is it necessary to clone this (!) entire repo to obtain the jar and model?

wmkouw commented 6 years ago

I was thinking of making the Dockerfile download the whole ixa-pipes-1.1.1 package (http://ixa2.si.ehu.es/ixa-pipes/download.html), extract the necessary exec-jars + models, and remove the rest. That would allow for a single download and avoid version mismatches. Do you agree?

MartineDeVos commented 6 years ago

Sure!

wmkouw commented 6 years ago

I replaced the nerc download commands with the following:

# Download the ixa-pipes package to get matching ixa-pipe-nerc jar and model versions.
# Both files are extracted in a single pass over the archive.
RUN wget http://ixa2.si.ehu.es/ixa-pipes/models/ixa-pipes-1.1.1.tar.gz \
    && mkdir ixa-pipe-nerc \
    && tar -xzf ixa-pipes-1.1.1.tar.gz \
        ixa-pipes-1.1.1/ixa-pipe-nerc-1.6.1-exec.jar \
        ixa-pipes-1.1.1/nerc-models-1.6.1/nl/nl-6-class-clusters-sonar.bin \
    && mv ixa-pipes-1.1.1/ixa-pipe-nerc-1.6.1-exec.jar ixa-pipe-nerc/ \
    && mv ixa-pipes-1.1.1/nerc-models-1.6.1/nl/nl-6-class-clusters-sonar.bin ixa-pipe-nerc/ \
    && rm ixa-pipes-1.1.1.tar.gz \
    && rm -rf ixa-pipes-1.1.1

It's pretty slow though, since the download is ~500MB.
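
A possible variation, sketched here as an untested idea rather than something used in this repo: stream the download straight into tar, so the ~500MB archive is never written to disk and does not need to be deleted afterwards. This does not shrink the download itself, so how much time it saves is an open question. The helper name `extract_members` is hypothetical; the URL and member paths in the usage comment are the ones from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): read a gzipped tar stream from
# stdin, extract only the named members, and move them (flattened) into $1.
extract_members() {
    dest=$1
    shift
    mkdir -p "$dest"
    tmp=$(mktemp -d)
    # "-f -" makes tar read the archive from stdin; only the listed members
    # are unpacked, everything else in the stream is skipped.
    if ! tar -xzf - -C "$tmp" "$@"; then
        rm -rf "$tmp"
        return 1
    fi
    for member in "$@"; do
        mv "$tmp/$member" "$dest/"
    done
    rm -rf "$tmp"
}

# Intended usage (assumes wget is available in the image):
# wget -qO- http://ixa2.si.ehu.es/ixa-pipes/models/ixa-pipes-1.1.1.tar.gz \
#     | extract_members ixa-pipe-nerc \
#         ixa-pipes-1.1.1/ixa-pipe-nerc-1.6.1-exec.jar \
#         ixa-pipes-1.1.1/nerc-models-1.6.1/nl/nl-6-class-clusters-sonar.bin
```

In a Dockerfile the function body would have to be inlined into the RUN step or shipped as a small script, since each RUN starts a fresh shell.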