LanguageMachines / PICCL

A set of workflows for corpus building through OCR, post-correction and normalisation

Different versions of linguistic workflow tools #31

Closed by peterdekker 6 years ago

peterdekker commented 6 years ago

On my LaMachine installation, multiple copies of the workflow tools are available:

/vol1/lamachine/opt/nextflow/assets/LanguageMachines/PICCL/frog.nf
/vol1/lamachine/src/PICCL/frog.nf

(and /vol1/lamachine/bin/frog.nf which points to /vol1/lamachine/src/PICCL/frog.nf)

On the command line I use the opt path, whereas the CLAM web interface uses the src path. On my current system the two copies are identical, but I remember that they differed on a previous installation.

Would it be possible to merge the tools into one location, to prevent confusion caused by differing versions in the future?
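For anyone wanting to check whether the copies on their own system have diverged, here is a minimal sketch in Python. It assumes the two paths from the report above; the helper names `resolve` and `same_workflow` are made up for illustration and are not part of LaMachine or PICCL. It follows symlinks to the real file and falls back to a byte-by-byte comparison.

```python
import filecmp
import os
import shutil


def resolve(path):
    """Return the real file behind a path, following any symlinks."""
    return os.path.realpath(path)


def same_workflow(path_a, path_b):
    """True if both paths point at the same file or have identical contents."""
    real_a, real_b = resolve(path_a), resolve(path_b)
    if real_a == real_b:
        # Same underlying file, e.g. one path is a symlink to the other.
        return True
    # Different files: compare contents rather than just size and mtime.
    return filecmp.cmp(real_a, real_b, shallow=False)


if __name__ == "__main__":
    # Paths taken from the report above; adjust them to your installation.
    opt_copy = "/vol1/lamachine/opt/nextflow/assets/LanguageMachines/PICCL/frog.nf"
    src_copy = "/vol1/lamachine/src/PICCL/frog.nf"

    # Show which file a bare `frog.nf` on $PATH actually resolves to.
    on_path = shutil.which("frog.nf")
    print("frog.nf on $PATH resolves to:", resolve(on_path) if on_path else None)

    if os.path.exists(opt_copy) and os.path.exists(src_copy):
        print("copies identical:", same_workflow(opt_copy, src_copy))
    else:
        print("one or both copies not found; nothing to compare")
```

If the two copies turn out to differ, that would suggest the Nextflow-managed checkout and the LaMachine-managed checkout are at different revisions.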

proycon commented 6 years ago

opt/PICCL should just be a symlink to src/PICCL.

The Nextflow case is a bit more complex: it is possible to let Nextflow (rather than LaMachine) handle pulling the PICCL git repository, which is what happens when you run, for instance, nextflow run LanguageMachines/PICCL/frog.nf. You can even have Nextflow take care of pulling all of LaMachine from Docker.

But within LaMachine, just invoke the workflow directly (e.g. frog.nf, which should be in your $PATH) rather than through nextflow run. The CLAM webservice also uses direct invocation. I can see how this may indeed cause some confusion; the PICCL README was made clearer on this point a while ago, but I may not have advertised it enough.

Hopefully this clears up the confusion.

peterdekker commented 6 years ago

Thanks, that is good to know! We always invoked the tools via Nextflow. I will test direct invocation of the tools and add that to our documentation.