Open salim-b opened 6 years ago
Good catch, thanks! I'd be open to this, good idea, but not sure if that works:
> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = TRUE)@hyphen[1, 2]
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"
> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = FALSE)@hyphen[1, 2]
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"
Any ideas?
Well, I'm not familiar at all with the koRpus package, but maybe I was wrong to assume that the parameter rm.hyph was responsible for the unduly removed hyphens.
Anyway, as far as I understand it, the hyphen() function expects a character vector of words, not sentences.
Consider this modification of your first example:
suppressPackageStartupMessages({
library(dplyr)
library(magrittr)
library(stringr)
})
"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>%
koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = TRUE, quiet = TRUE) %>%
slot("hyphen") %$% word
#> [1] "foo-bar" "and" "foo"
#> [4] "self-ex-plana-to-ry" "ok"
Now interestingly, if rm.hyph is set to FALSE, the hyphenation isn't correct anymore:
suppressPackageStartupMessages({
library(dplyr)
library(magrittr)
library(stringr)
})
"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>%
koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = FALSE, quiet = TRUE) %>%
slot("hyphen") %$% word
#> [1] "foo-bar" "and" "foo"
#> [4] "self-e-xplan-at-ory" "ok"
So there might be a good reason why the default value is TRUE... 😜
Do I get it right that you're currently feeding whole sentences to the hyphen() function in helpers.R? If so, splitting the sentences into words beforehand (and leaving rm.hyph at its default value) might solve the issue.
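The word-by-word idea could be sketched roughly as follows. This is only an illustration, not pander's actual code: hyphenate_sentence() and koRpus_words() are hypothetical names, and the koRpus call simply mirrors the one used in the examples in this thread.

```r
# Hypothetical helper (not part of pander or koRpus): hyphenate a sentence
# word by word instead of passing the whole string to the hyphenator.
hyphenate_sentence <- function(sentence, hyphenate_words = koRpus_words) {
  # Split on single spaces, as in the examples in this thread
  words <- unlist(strsplit(sentence, " ", fixed = TRUE))
  # Hyphenate the word vector, then glue the sentence back together
  paste(hyphenate_words(words), collapse = " ")
}

# Assumed default word-level hyphenator, using the same koRpus call as in
# the thread and leaving rm.hyph at its default value of TRUE.
koRpus_words <- function(words) {
  res <- koRpus::hyphen(words, hyph.pattern = "en.us", rm.hyph = TRUE, quiet = TRUE)
  res@hyphen$word
}
```

Passing the hyphenator as an argument keeps the splitting and re-joining logic independent of koRpus, so it can be tried out with any function that maps a character vector of words to a character vector of the same length.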
Consider the following reprex:
In the output table the hyphen of the word self-explanatory gets removed (because the parameter rm.hyph of koRpus::hyphen() is left at its default value of TRUE).
I'm not familiar with the code and therefore didn't submit a pull request (yet). But I guess it would be enough to add the argument rm.hyph = FALSE to the following line of helpers.R: https://github.com/Rapporter/pander/blob/32e0f75ef359225a27aba3641fbe4fa84a5dc6d5/R/helpers.R#L404
What do you think? Alternatively, if you see any benefit/use case in having the hyphenator remove existing hyphens beforehand (I don't), an additional parameter could be introduced which passes the option on to koRpus::hyphen.
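A pass-through parameter of that kind might look roughly like the sketch below. The function name hyphen_words() and its signature are made up for illustration and do not reflect what helpers.R actually contains. Only the arguments to koRpus::hyphen() are taken from the calls shown in this thread.

```r
# Hypothetical wrapper; name and signature are illustrative only and are not
# taken from pander's helpers.R.
hyphen_words <- function(words, rm.hyph = TRUE, hyphenator = koRpus::hyphen) {
  # Forward the caller's choice instead of silently relying on the default
  res <- hyphenator(words, hyph.pattern = "en.us", rm.hyph = rm.hyph, quiet = TRUE)
  # The result object keeps the hyphenated words in its "hyphen" slot
  res@hyphen$word
}
```

Keeping TRUE as the default would leave existing behaviour unchanged, which seems prudent given that the rm.hyph = FALSE example in this thread produced a worse hyphenation of self-explanatory.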