This is not a CoreNLP issue. It results from a preprocessing step that corenlp_annotate() applies implicitly by default and that the $process_files() approach does not (= calling purge()). In the most recent bignlp version on the dev branch, I introduced an argument purge so that we can control whether purge() is called. If we set the argument to FALSE, the annotation results df1 and df2 will have the same length.
df2 <- corenlp_annotate(
  sample_dt,
  properties = properties(props_file),
  progress = FALSE,
  purge = FALSE  # skip the implicit purge() call
)
A unit test checks that results are identical now.
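To make the effect concrete, here is a minimal base R sketch of how an implicit cleanup step can change the number of tokens. The functions toy_cleanup() and toy_tokenize() are hypothetical stand-ins invented for this illustration. They are not bignlp's purge() or the CoreNLP tokenizer, and what purge() actually removes may differ.

```r
# Hypothetical stand-in for an implicit cleanup step. This is NOT bignlp's
# purge(); it only strips control characters for the sake of the example.
toy_cleanup <- function(x) gsub("[[:cntrl:]]", "", x)

# Naive whitespace tokenizer, standing in for the real annotator.
toy_tokenize <- function(x) {
  tokens <- strsplit(x, "\\s+")[[1]]
  tokens[nzchar(tokens)]  # drop empty strings left over from splitting
}

# Input with a stray control character between two words.
raw <- "alpha \001 beta gamma"

n_raw <- length(toy_tokenize(raw))                 # control character kept as a token
n_clean <- length(toy_tokenize(toy_cleanup(raw)))  # control character removed first

n_raw
n_clean
```

In this toy case the cleaned input yields one token fewer, which is the same kind of length mismatch as between df1 and df2.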
This is an example from Christoph Leonhardt showing that the two different methods described in the (new) package vignette may yield different results.
First the approach using temporary files.
Now using the in-memory approach ...
But now df2 is shorter. See where the token streams differ.
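One way to locate such a difference is to walk both token vectors and report the first position where they disagree. The sketch below uses small toy data frames in place of the real df1 and df2. The column name token and the helper first_divergence() are assumptions made for illustration; the actual annotation output may use different column names.

```r
# Toy stand-ins for the two annotation results. The column name `token`
# is an assumption, not necessarily what bignlp returns.
df1 <- data.frame(
  token = c("This", "is", "a", "\001", "test", "."),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  token = c("This", "is", "a", "test", "."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first position at which two token vectors differ,
# or NA if they are identical.
first_divergence <- function(a, b) {
  n <- min(length(a), length(b))
  mismatch <- which(a[seq_len(n)] != b[seq_len(n)])
  if (length(mismatch) > 0L) return(mismatch[1L])
  if (length(a) != length(b)) return(n + 1L)  # one vector is a prefix of the other
  NA_integer_
}

pos <- first_divergence(df1$token, df2$token)

# Show a few tokens around the divergence, side by side.
if (!is.na(pos)) {
  window <- seq(max(1L, pos - 2L), pos + 2L)
  context <- data.frame(
    position = window,
    df1 = df1$token[window],
    df2 = df2$token[window]
  )
  print(context)
}
```

Positions past the end of the shorter vector show up as NA in the context table, which makes a dropped token easy to spot.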