Closed peterdekker closed 6 years ago
Thanks for looking into this, is there already news on a solution? Or are there possible solutions I could try out myself?
Sorry it took a while, looking into this... I managed to replicate the issue just now.
Okay, found it... Nextflow excludes from the output any files that are named exactly like the input files, so that's where things went wrong. The above commit should fix it; I will do a release right away.
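To picture the effect described above, here is a small Python model. It is only a sketch of the behaviour reported in this thread, not Nextflow's actual implementation; the function name collect_outputs and the file names doc.folia.xml and doc.out.folia.xml are made up for illustration.

```python
from fnmatch import fnmatch


def collect_outputs(staged_inputs, files_after_run, pattern):
    """Toy model: files matching the output pattern, minus staged input names."""
    input_names = set(staged_inputs)
    return sorted(
        name
        for name in files_after_run
        if fnmatch(name, pattern) and name not in input_names
    )


# The tool writes its result under the same name as the input,
# so the wildcard has nothing left to match.
same_name = collect_outputs(["doc.folia.xml"], ["doc.folia.xml"], "*.xml")

# With a distinct (hypothetical) output name the wildcard picks up the result.
distinct = collect_outputs(
    ["doc.folia.xml"],
    ["doc.folia.xml", "doc.out.folia.xml"],
    "*.xml",
)

print(same_name)
print(distinct)
```

In this model the first call returns an empty list, which corresponds to the situation where the script cannot find any XML output even though the tool wrote one.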
Great, thanks much!
With the new fix, the Nextflow script exits without errors, and says it has created files:
Frog output document written to frog_output/356417.ticcl.folia.xml
etc.
However, in reality the frog_output directory is empty. I do see an output directory with folia files in work/, so I guess that the copying of the directory to the right location goes wrong.
EDIT: The error is in the publishDir lines. It works when I change these to:
publishDir params.outputdir, pattern: "output/*.xml", mode: 'copy', overwrite: true
Except that a redundant subdirectory output/ is created inside frog_output.
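The extra output/ level can be illustrated with a short Python sketch. It assumes, based on what is observed in this comment, that a published file keeps its path relative to the task directory; the helper names published_path and flattened_path are hypothetical and are not part of Nextflow.

```python
from pathlib import PurePosixPath


def published_path(publish_dir, relative_output):
    """Toy model: the published copy keeps its path relative to the task directory."""
    return str(PurePosixPath(publish_dir) / relative_output)


def flattened_path(publish_dir, relative_output):
    """Variant that keeps only the file name, dropping any subdirectory."""
    return str(PurePosixPath(publish_dir) / PurePosixPath(relative_output).name)


# A file matched by the pattern "output/*.xml" from the publishDir line above.
matched = "output/356417.ticcl.folia.xml"

print(published_path("frog_output", matched))
print(flattened_path("frog_output", matched))
```

The first function reproduces the layout with the redundant subdirectory; the second shows the layout that was actually wanted.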
Also, I was wondering: when invoking frog in the text2folia function, shouldn't there be a directory argument after --testdir? https://github.com/LanguageMachines/PICCL/blob/master/frog.nf#L93
@proycon Could this issue be re-opened, based on the new information in my last comment?
Ah right, I forgot to adapt publishDir after that last fix... Now I wonder if Nextflow has an option to prevent that redundant output/ dir, or if I need to solve that in yet another task...
Also, I was wondering, when invoking frog in the text2folia function, shouldn't there be a directory argument after --testdir? https://github.com/LanguageMachines/PICCL/blob/master/frog.nf#L93
Yes? There is; the directory is input/
Oops, my bad, the end of the line fell off in the GitHub view :/
Regarding the directory issue, would the following be possible? Inside the script, create a directory with the name params.outputdir and use that for frog output. Then, when invoking publishDir, match this directory name and move it to the current directory (instead of to params.outputdir).
I implemented a different solution: the output files now have "frogged" in their filename (*.frogged.folia.xml) so they don't clash with the input.
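As a rough illustration of this naming scheme, the Python sketch below derives a "frogged" output name from an input name. How frog.nf actually builds the name is not shown in this thread, so the function frogged_name and its extension parameter are assumptions made for the example.

```python
def frogged_name(input_name, extension="folia.xml"):
    """Return an output name that cannot collide with the input name.

    Hypothetical helper: strips the given extension (or, failing that,
    the last dot-separated suffix) and appends ".frogged.folia.xml".
    """
    suffix = "." + extension
    if input_name.endswith(suffix):
        base = input_name[: -len(suffix)]
    else:
        # Fall back to dropping only the last suffix, if there is one.
        base = input_name.rsplit(".", 1)[0]
    return base + ".frogged.folia.xml"


print(frogged_name("356417.ticcl.folia.xml"))
print(frogged_name("doc.xml", extension="xml"))
```

Because the result always differs from the input name, an output wildcard such as *.frogged.folia.xml only matches files that the tool itself produced.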
I am running Frog as part of the LaMachine distribution. When I run the following command: $ nextflow run LanguageMachines/PICCL/frog.nf --inputdir ticcl_output/ --inputformat folia --extension folia.xml --skip=acmpn --outputdir frog_output (same result without --inputformat and --outputdir, or with --extension xml)
I get the following error:
It seems that the Nextflow script cannot find the XML output from frog. This seems to go wrong in lines 72 and 117 of frog.nf (https://github.com/LanguageMachines/PICCL/blob/master/frog.nf#L72), where the output is defined using a wildcard. When I run an earlier version of frog.nf, where the output is more explicitly defined, it runs without errors: https://github.com/LanguageMachines/PICCL/commit/b4e05a044d6ae4037c7e435fe26dbb5f6c700f72#diff-b1623eb35be7cba58a6c27b0a3e54453R57