Tomeriko96 / polyglotr

R package to translate text
https://tomeriko96.github.io/polyglotr/

`google_translate` doesn't accept vectors of text as input #5

Closed gc5011 closed 1 year ago

gc5011 commented 1 year ago

Describe the bug

The `google_translate()` function is not vectorised. Passing a character vector into the function results in an error:

Error in parse_url(url) : length(url) == 1 is not TRUE

Steps to reproduce the behavior:

library(polyglotr)
text_to_translate <- c("the", "quick", "brown")
google_translate(text_to_translate, "fr", "en")

Expected behavior

If supplying a vector or list to the function, it should parse each element and return a vector where each element is the translation of an element in the supplied vector.
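The expected behavior can be sketched in base R as a thin wrapper that applies a scalar-only function to each element. In this sketch, `translate_one()` and `translate_vec()` are hypothetical names, and the lookup table is a stand-in for the real translation call so the example runs offline; none of it is polyglotr's actual code.

```r
# Hypothetical stand-in for a scalar-only translator; NOT polyglotr code.
# A tiny lookup table replaces the network call so the sketch runs offline.
translate_one <- function(text, target_language = "fr", source_language = "en") {
  stopifnot(length(text) == 1)  # mimics the length-1 restriction from the report
  dictionary <- c(the = "le", quick = "rapide", brown = "brun")
  dictionary[[text]]
}

# Hypothetical vectorised wrapper: one translation per input element,
# returned as a character vector of the same length.
translate_vec <- function(text, target_language = "fr", source_language = "en") {
  vapply(
    text,
    translate_one,
    FUN.VALUE = character(1),
    target_language = target_language,
    source_language = source_language,
    USE.NAMES = FALSE
  )
}

translated <- translate_vec(c("the", "quick", "brown"))
print(translated)
```

`vapply()` is used here instead of `sapply()` because it checks that every element yields exactly one string, which keeps the result safe to assign as a data frame column.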

Currently, it returns an error in some common use cases. For example, using dplyr::mutate() on a data frame to create a new column containing the translation of an existing column fails with the following code:

df %>%
  mutate(translated_text = google_translate(original_text, "fr", "en"))

The workaround requires additional code, e.g.:

df %>%
  mutate(translated_text = map(original_text, google_translate, "fr", "en")) %>%
  unnest_wider(translated_text)

Thanks for your work developing this package. It is handy and will likely be even easier to use if the functions can accept vectors of text to be translated.

Tomeriko96 commented 1 year ago

Thank you for submitting the issue! I will look into this soon.

In the meantime, is the method as described in this vignette viable for your use case?

See also the code below applied to your example:

library(polyglotr)
library(dplyr)  # provides %>%

text_to_translate <- c("the", "quick", "brown")
df <- data.frame(original_text = text_to_translate)

df %>%
  dplyr::mutate(translated_text = purrr::map_chr(original_text, ~ google_translate(.x, target_language = "fr", source_language = "en")))

Tomeriko96 commented 1 year ago

The function has now been vectorized, see commit abbaf0bfc9ba179c4438984e98b80f2d9e562398. Pushing updates to CRAN is planned for the end of July.

In the meantime, you can try it with the development version from GitHub:

devtools::install_github("Tomeriko96/polyglotr")
library(polyglotr)

text_to_translate <- c("the", "quick", "brown")
google_translate(text_to_translate, "fr", "en")

>
[[1]]
[1] "le"

[[2]]
[1] "rapide"

[[3]]
[1] "brun"
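The printed result above is a list with one translation per element. If a plain character vector is preferred, base R's `unlist()` can flatten it. A minimal sketch, with the list written out by hand to match the output shown rather than fetched from the translation service:

```r
# Hand-written copy of the list shown above; no network call is made here.
translated <- list("le", "rapide", "brun")

# unlist() flattens a list of length-one character vectors
# into a single character vector.
translated_chr <- unlist(translated)

print(translated_chr)
```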

And for the dataframe:

df <- as.data.frame(text_to_translate)

df %>% 
  mutate(translated_text = google_translate(text_to_translate, "fr", "en"))
>
  text_to_translate translated_text
1               the              le
2             quick          rapide
3             brown            brun

Tomeriko96 commented 1 year ago

Version 1.2.0 has been published to CRAN, which includes the fix for vectorization.