I'm running the following step and have tried it twice. Both runs ended with the process being "killed" (on the 2nd attempt the files had already been downloaded, so the download was skipped). Any idea what the reason might be? The machine has 32 GB of RAM; is that not enough memory?
type: opus_read
parameters:
  corpus_name: CCMatrix
  source_language: de
  target_language: en
  preprocessing: raw
  src_output: sents.de.gz
  tgt_output: sents.en.gz
This is possibly the same issue as https://github.com/Helsinki-NLP/OpusTools/issues/32. I tried to run the step, and indeed it's taking a lot of memory. (I killed the process at 15G before it started swapping.)
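One way to check whether memory really is the culprit is to run the command from a small wrapper that reports the exit status and the peak resident memory of the child process. The following is only a rough sketch using the Python standard library on Linux or macOS; the helper name `peak_child_rss_mib` and the stand-in command are made up for illustration and are not part of OpusFilter or OpusTools.

```python
import resource
import subprocess
import sys


def peak_child_rss_mib(cmd):
    """Run cmd to completion; return (return code, peak child RSS in MiB)."""
    proc = subprocess.run(cmd)
    # Largest resident set size among child processes that have been waited for
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return proc.returncode, peak / divisor


if __name__ == "__main__":
    # Stand-in command for illustration; replace it with the real invocation
    demo_cmd = [sys.executable, "-c", "data = bytes(range(256)) * 100000"]
    code, peak = peak_child_rss_mib(demo_cmd)
    print(f"exit code {code}, peak RSS about {peak:.0f} MiB")
```

A return code of -9 means the child was terminated by SIGKILL, which is what the Linux out-of-memory killer sends. If that shows up together with a peak close to the available RAM, memory exhaustion is the likely explanation.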
I cannot replicate this; downloading ParaCrawl v9 works fine for me with both OpusFilter and OpusTools.
Hi,
Thanks!
common:
  output_directory: CCMatrix_de-en
steps: