PolMine / bignlp

Tools to process large corpora line-by-line and in parallel mode

Output files in chunk_table_split #2

Closed: ChristophLeonhardt closed this issue 3 years ago

ChristophLeonhardt commented 5 years ago

To specify where chunk_table_split() writes its output, you currently have to supply the name and path of each of the n target files, which seems unnecessarily complicated. It would be more practical to pass a single output directory and have the n target files written there, with names derived from the input file.

Something like this (not tested):

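output <- "/hd/cl_tmp"  # directory that will receive the n chunk files

  # inside chunk_table_split(), where `input` and `n` are already defined:
  # derive one file name per chunk from the input file name,
  # e.g. "corpus.csv" becomes "corpus_1.csv", ..., "corpus_n.csv"
  if (!is.null(output)) {
    stem <- tools::file_path_sans_ext(basename(input))
    ext <- tools::file_ext(input)  # "" if the input has no extension
    output <- file.path(
      output,
      paste0(stem, "_", seq_len(n), if (nzchar(ext)) paste0(".", ext) else "")
    )
  }
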
ablaette commented 3 years ago

As we are moving to parallelization in Java, the chunk_table_split() function is no longer required.