Rapporter / pander

An R Pandoc Writer: Convert arbitrary R objects into markdown
http://rapporter.github.io/pander/
Open Software License 3.0

don't remove existing hyphens when `use.hyphening` #311

Open salim-b opened 6 years ago

salim-b commented 6 years ago

Consider the following reprex:

library(dplyr)
library(pander)

data_frame(a = "This a hopefully _self-explanatory_ example of unduly removed hyphens.") %>% 
    pandoc.table(split.cells = 20, use.hyphening = TRUE)
#> 
#> -------------------
#>          a         
#> -------------------
#>  This a hopefully  
#>  _selfexplanatory_ 
#>  example of unduly 
#>  removed hyphens.  
#> -------------------

In the output table, the hyphen in the word self-explanatory is removed (because the rm.hyph parameter of koRpus::hyphen() is left at its default value of TRUE).

I'm not familiar with the code and therefore didn't submit a pull request (yet). But I guess it would be enough to add the argument rm.hyph = FALSE to the following line of helpers.R: https://github.com/Rapporter/pander/blob/32e0f75ef359225a27aba3641fbe4fa84a5dc6d5/R/helpers.R#L404

What do you think? Alternatively, if you see any benefit or use case in having the hyphenator remove existing hyphens beforehand (I don't), an additional parameter could be introduced that passes the option on to koRpus::hyphen().

daroczig commented 6 years ago

Good catch, thanks! I'm open to this idea, but I'm not sure it works:

> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = TRUE)@hyphen[1, 2] 
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"
> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = FALSE)@hyphen[1, 2] 
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"

Any ideas?

salim-b commented 6 years ago

> Any ideas?

Well, I'm not familiar at all with the koRpus package, but maybe I was wrong to assume that the rm.hyph parameter was responsible for the unduly removed hyphens.

Anyway, as far as I understand it, the hyphen() function expects a character vector of words, not sentences.

Consider this modification of your first example:

suppressPackageStartupMessages({
  library(dplyr)
  library(magrittr)
  library(stringr)
})

"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>% 
  koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = TRUE, quiet = TRUE) %>% 
  slot("hyphen") %$% word
#> [1] "foo-bar"             "and"                 "foo"                
#> [4] "self-ex-plana-to-ry" "ok"

Now interestingly, if rm.hyph is set to FALSE, the hyphenation isn't correct anymore:

suppressPackageStartupMessages({
  library(dplyr)
  library(magrittr)
  library(stringr)
})

"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>% 
  koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = FALSE, quiet = TRUE) %>% 
  slot("hyphen") %$% word
#> [1] "foo-bar"             "and"                 "foo"                
#> [4] "self-e-xplan-at-ory" "ok"

So there might be a reason the default value is TRUE... 😜

Do I understand correctly that you're currently feeding whole sentences to the hyphen() function in helpers.R? If so, splitting the sentences into words beforehand (and leaving rm.hyph at its default value) might solve the issue.
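
To make the suggestion concrete, here is a minimal sketch of the "split first, then hyphenate" idea. It is not pander's actual code: hyphenate_sentence() and toy_hyphenator() are hypothetical names, and the toy hyphenator only stands in for koRpus::hyphen() so that the example runs without koRpus installed.

```r
# Hypothetical helper (not part of pander): apply a word-level hyphenator
# to each space-separated word of a sentence, then re-join the words.
hyphenate_sentence <- function(x, hyphenate_words) {
  # strsplit() returns a list with one character vector per input string
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  paste(hyphenate_words(words), collapse = " ")
}

# Stand-in hyphenator for illustration only: it hyphenates "foobar" and
# leaves every other word untouched.
toy_hyphenator <- function(words) {
  sub("^foobar$", "foo-bar", words)
}

hyphenate_sentence("foobar and foo self-explanatory ok", toy_hyphenator)

# Assumption, based on the calls shown earlier in this thread (untested here):
# with koRpus, the hyphenator argument could look like
#   function(w) koRpus::hyphen(w, hyph.pattern = "en.us", quiet = TRUE)@hyphen$word
```

Passing the hyphenator in as an argument keeps the word splitting separate from the choice of hyphenation backend, so the rm.hyph behaviour can be tested on its own.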