ChristopherLucas / translateR

R Package for Cross-Language Topic Modeling

Google translate returns NULL for more than 1 row of dataframe #10

Closed: alanault closed this issue 8 years ago

alanault commented 8 years ago

Hi Christopher,

I've been using the package (which is super useful, BTW!) and had it working fine with Microsoft. I've now added the API details for Google and have run into a problem.

A call works fine when I build it by hand, and also for a dataframe with a single row, but with more than one row it just returns NULL.

Looking at the code in translate.r, I wondered whether the code that glues the results together is missing: the Microsoft section has it, but the Google section doesn't seem to.

Any thoughts?

many thanks

Alan

ChristopherLucas commented 8 years ago

Glad you're finding it useful!

So I can make sure I'm addressing the right problem, can you post a simple example (without your api key, of course)? It should be easy to fix given that.

Thanks again!

alanault commented 8 years ago

Thanks for the super-fast response!

Here is an example, using your Enron data set to make it as reproducible as possible:

library(translateR)
api <- "YOUR_API_KEY"  # placeholder: substitute your own Google API key
data(enron)

# This works (single row)
translate(enron[1, ],
          content.field = "email",
          google.api.key = api,
          source.lang = "en",
          target.lang = "de")

# This doesn't work (full data frame)
translate(enron,
          content.field = "email",
          google.api.key = api,
          source.lang = "en",
          target.lang = "de")

The working example returns the original enron DF plus the translatedContent field (holding the results). The example that doesn't work just returns the original dataframe, with no translatedContent field.

Thanks for your help!

Best

Alan

ChristopherLucas commented 8 years ago

Oddly enough, this all works fine for me. My session info is below. The call to the Microsoft API uses httr whereas that to Google uses RJSONIO and RCurl. Which versions of these packages do you have installed?

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] translateR_1.0

loaded via a namespace (and not attached):
 [1] httr_1.1.0     R6_2.1.2       tools_3.2.2    parallel_3.2.2 RCurl_1.95-4.8
 [6] slam_0.1-32    RJSONIO_1.3-0  tau_0.0-18     textcat_1.0-4  bitops_1.0-6  

alanault commented 8 years ago

That really is odd!

Seems we've got the same versions of everything installed, except that I've got a slightly newer version of base R. Interestingly, I noticed I also get a lot of errors from the Microsoft API: "Argument Exception. Method: Translate(). Parameter: to. Message: 'to' must be a valid language".

Any suggestions as to what I can try next?

My session info:

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] translateR_1.0

loaded via a namespace (and not attached):
[1] httr_1.1.0     R6_2.1.2       parallel_3.2.3 tools_3.2.3    RCurl_1.95-4.8 slam_0.1-32
[7] RJSONIO_1.3-0  tau_0.0-18     textcat_1.0-4  bitops_1.0-6

alanault commented 8 years ago

Found the root of the random errors from the Microsoft API - they seem to occur when a string contains a #, typically a hashtag (I'm translating tweets).

Removing the # results in no errors.
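For anyone hitting the same thing, here is a minimal sketch of two ways to pre-process the text before translating. The thread does not confirm why # triggers the error; one plausible guess (an assumption, not verified here) is that an unencoded # in a request URL is read as the start of a fragment, which would cut off the parameters after it. The tweet text and variable names below are hypothetical.

```r
# Hypothetical tweet text, for illustration only
tweet <- "Great match today #football"

# Option 1: drop the hash character, as described in the comment above
stripped <- gsub("#", "", tweet, fixed = TRUE)
print(stripped)

# Option 2: percent-encode reserved characters so that # becomes %23.
# This only helps if the text is sent unencoded in a URL, which is an assumption.
encoded <- utils::URLencode(tweet, reserved = TRUE)
print(encoded)
```

Option 1 changes the text that gets translated, so Option 2 may be preferable where the hashtag itself matters, provided the request layer does not already encode the text a second time.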

alanault commented 8 years ago

Hi Chris,

Managed to work out what the problem was. I went through the code, and the issue is the line where you convert the JSON output to an R object using fromJSON.

When I ran that line manually, R complained about the vectorised input (only the first element was used), which is why a single entry worked.

In my script, I'm also loading other packages (yaml, dplyr, readxl and stringr) - it occurred to me that one of these might use jsonlite, which has its own fromJSON function that might be called instead.

Moved the library call to translateR to the end of my list and voila - Google works.

So the issue is that the unqualified fromJSON call can resolve to a different package's function depending on what else is loaded. Maybe call RJSONIO::fromJSON explicitly in the code, in case people have jsonlite loaded. I think httr and other packages are moving to jsonlite, so this could become more of an issue in future.
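A minimal sketch of how to check for this kind of masking and how to pin the call to one package. The JSON string is made up for illustration and is not real API output; the RJSONIO call is guarded so the snippet still runs where that package is not installed.

```r
# List attached packages that provide a function named fromJSON.
# A bare fromJSON() call uses the first entry; character(0) means none is attached.
providers <- find("fromJSON")
print(providers)

# Illustrative JSON only: a stand-in for an API response, not real output
json <- '{"data": {"translations": [{"translatedText": "Hallo"}]}}'

# Qualifying the call with RJSONIO:: pins it to that package,
# regardless of the order in which packages were attached.
if (requireNamespace("RJSONIO", quietly = TRUE)) {
  parsed <- RJSONIO::fromJSON(json)
  str(parsed)
}
```

If more than one package shows up in providers, the order of the library() calls decides which function a bare fromJSON() reaches, which matches the behaviour described above.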

Many thanks for the help and the package!

Best

Alan