anupam409 / google-api-translate-java

Automatically exported from code.google.com/p/google-api-translate-java

Not usable for automatic translations #5

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the sample program in a loop to simulate bulk translations.

What is the expected output? What do you see instead?
Expected: An infinite number of translations
Actual Result: 
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(Unknown Source)
        at java.lang.String.substring(Unknown Source)
        at com.google.api.translate.Translate.translate(Translate.java:44)
        at Test.main(Test.java:7)

What version of the product are you using? On what operating system?
google-api-translate-java-0.23.jar on Windows XP and Linux (OS doesn't matter)

Please provide any additional information below.
The problem is that after a large number of requests, Google's web site
freaks out and you get a "403 Forbidden" page which says:

"We're sorry...

... but your query looks similar to automated requests from a computer
virus or spyware application. To protect our users, we can't process your
request right now.

We'll restore your access as quickly as possible, so try again soon. In the
meantime, if you suspect that your computer or network has been infected,
you might want to run a virus checker or spyware remover to make sure that
your systems are free of viruses and other spurious software.

We apologize for the inconvenience, and hope we'll see you again on Google."

I can enter the captcha code and resume, but since it seems to set a cookie
in my browser, my automated translator is broken until Google's web servers
"trust" my IP address again.

Original issue reported on code.google.com by panu...@gmail.com on 20 Nov 2007 at 8:59
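
The stack trace above is consistent with substring() being handed an index of
-1, which is what indexOf() returns when the expected markup is missing, as it
would be on the "403 Forbidden" page. A minimal sketch of a defensive
extraction follows. The class name, method name and marker strings are
hypothetical; the actual parsing code in Translate.java is not shown in this
thread.

```java
import java.util.Optional;

/** Hypothetical helper, not part of google-api-translate-java. */
final class ResponseParser {
    /**
     * Returns the text between startMarker and endMarker, or an empty
     * Optional if either marker is missing (for example on an error page).
     */
    static Optional<String> between(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return Optional.empty(); // marker absent: do not call substring
        }
        int from = start + startMarker.length();
        int end = html.indexOf(endMarker, from);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(from, end));
    }
}
```

A caller could then report "blocked or unexpected page" when the Optional is
empty instead of failing with StringIndexOutOfBoundsException.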

GoogleCodeExporter commented 9 years ago
Thanks, I'll take a look.

Original comment by rich.mid...@gmail.com on 21 Nov 2007 at 3:06

GoogleCodeExporter commented 9 years ago
I get the same problem. A workaround would be to add Thread.sleep(x), so that
at most y requests are sent in a minute. The question is: how many requests per
minute are allowed? How does the service decide it is too much?

Original comment by thomas.t...@gmail.com on 24 Nov 2007 at 12:45
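
The Thread.sleep workaround can be sketched as a small helper that spaces calls
at least a fixed interval apart. This is only an illustration:
FixedDelayThrottle and its methods are hypothetical names, and the allowed
request rate is unknown, so the interval is left to the caller.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of google-api-translate-java. */
final class FixedDelayThrottle {
    private final long minIntervalMillis;
    private long lastCallNanos;
    private boolean hasPreviousCall = false;

    FixedDelayThrottle(long minIntervalMillis) {
        if (minIntervalMillis < 0) {
            throw new IllegalArgumentException("interval must be >= 0");
        }
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Builds a throttle that allows at most requestsPerMinute calls per minute. */
    static FixedDelayThrottle perMinute(int requestsPerMinute) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be > 0");
        }
        return new FixedDelayThrottle(60_000L / requestsPerMinute);
    }

    long minIntervalMillis() {
        return minIntervalMillis;
    }

    /** Blocks until the minimum interval has passed since the previous call. */
    synchronized void acquire() {
        if (hasPreviousCall) {
            long elapsedNanos = System.nanoTime() - lastCallNanos;
            long waitNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis) - elapsedNanos;
            if (waitNanos > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(waitNanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new IllegalStateException("interrupted while throttling", e);
                }
            }
        }
        hasPreviousCall = true;
        lastCallNanos = System.nanoTime();
    }
}
```

Calling acquire() immediately before each translate(String, String, String)
call would keep a loop under the chosen rate.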

GoogleCodeExporter commented 9 years ago
I guess we could let users do that, or perhaps add a translate(List<String>,
String, String) method which would do the same thing. Also, it looks to me like
a request every second triggers it, but a request every 2 seconds doesn't. Two
seconds is an awfully long time to wait, though, if anyone wants a lot of
different bits of text translated.

We could also throw a particular error containing the captcha image and add a
method to take the captcha string, submit it and keep the cookie. That might be
a bit awkward for others to catch and handle, but it does give people an option.

What does anyone else think?

Original comment by rich.mid...@gmail.com on 24 Nov 2007 at 4:50

GoogleCodeExporter commented 9 years ago
> perhaps add a translate(List<String>, String, String)
I like this. Internally it could convert the list into one (large) request by
adding 'untranslatable' separators (for example large numbers), so that the
result can be split on them as well.

> a request every 2 doesn't
For me it's OK if the translate API can only process one request every two
seconds - the API could automatically sleep as much as required. As long as
there is a way to 'bulk translate'.

Original comment by thomas.t...@gmail.com on 25 Nov 2007 at 12:12

GoogleCodeExporter commented 9 years ago
That'll work, but I expect other people will encounter the same problem by
invoking translate(String, String, String) in a loop too many times, and then
they'll have to wait for the web site to "cool down" before it allows their
requests again.

Another thought is that the API could keep track of the number of translation
requests per second, and when the user's program starts doing "too many"
translations, the API would throttle back the requests automatically. That
would let programs performing "small quantities" of translations experience no
slow-down, and would let bulk translation programs run without requiring any
additional coding.

Original comment by panu...@gmail.com on 25 Nov 2007 at 6:38
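
One way to read the automatic throttling idea is a sliding-window counter that
only introduces a delay once more than a set number of requests fall inside
the window. The sketch below is an assumption-laden illustration, not the
throttle that was later committed: SlidingWindowThrottle, reserve and both
limits are hypothetical, and the real threshold used by the web site is not
known.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, not part of google-api-translate-java. */
final class SlidingWindowThrottle {
    private final int maxRequests;
    private final long windowMillis;
    // Scheduled send times of recent requests, oldest first.
    private final Deque<Long> sendTimes = new ArrayDeque<>();

    SlidingWindowThrottle(int maxRequests, long windowMillis) {
        if (maxRequests <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /**
     * Registers a request arriving at nowMillis and returns how many
     * milliseconds the caller should wait before sending it. Small bursts
     * get 0; only callers exceeding the window limit are slowed down.
     * Assumes nowMillis never decreases between calls.
     */
    synchronized long reserve(long nowMillis) {
        // Forget requests that have already left the window.
        while (!sendTimes.isEmpty() && nowMillis - sendTimes.peekFirst() >= windowMillis) {
            sendTimes.removeFirst();
        }
        long delay = 0;
        if (sendTimes.size() >= maxRequests) {
            // Wait until the oldest request in the window expires.
            delay = sendTimes.peekFirst() + windowMillis - nowMillis;
            sendTimes.removeFirst();
        }
        sendTimes.addLast(nowMillis + delay);
        return delay;
    }
}
```

A caller would pass System.currentTimeMillis() to reserve(...) and sleep for
the returned number of milliseconds before issuing the request.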

GoogleCodeExporter commented 9 years ago
I've added throttling to Subversion for this, which should appear in the next
build. Throwing captcha errors looks more difficult to handle - for some reason
the captcha image locations don't appear in the downloaded HTML.

Original comment by rich.mid...@gmail.com on 1 Dec 2007 at 1:01

GoogleCodeExporter commented 9 years ago
The throttle looks good, just one question - in 'retrieveTranslation', it opens
the HttpURLConnection and retrieves the InputStream (which establishes the
network connection to google.com), then it sleeps (in the 'toString' method),
then it reads the response from the InputStream. It seems the network I/O would
be a tad more efficient if it slept before opening the HttpURLConnection. But
that's pretty minor.

Nice work Rich - thanks for taking the time to add in the throttle!

Original comment by panu...@gmail.com on 3 Jan 2008 at 1:19

GoogleCodeExporter commented 9 years ago
Then again, after 242 translations the web site freaked out and blocked me. I
just ran a sample program that translated "Hello" from English to Russian 1000
times (using the throttle), and after 242 translations, the site blocked my IP
address.

Another thought would be to go back to the idea of translate(List<String>,
String, String) and send up a batch of Strings separated by newlines, then
parse the returned contents of "result_box" and split it using "<br>". I just
tried it manually, and was able to translate a batch of 25 words via just one
HTTP request.

Original comment by panu...@gmail.com on 3 Jan 2008 at 1:37
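
The newline batching idea could look roughly like the sketch below. BatchCodec
and its methods are hypothetical names, and the assumption that each input
line comes back separated by a "<br>" tag is taken from the manual experiment
described above, not from any documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of google-api-translate-java. */
final class BatchCodec {
    /** Joins the texts into one request body, one text per line. */
    static String join(List<String> texts) {
        return String.join("\n", texts);
    }

    /**
     * Splits the translated "result_box" contents back into one entry per
     * input line, assuming lines are separated by a <br> tag (any case,
     * optionally self-closing).
     */
    static List<String> split(String resultBoxHtml) {
        String[] parts = resultBoxHtml.split("(?i)<br\\s*/?>", -1);
        List<String> result = new ArrayList<>();
        for (String part : parts) {
            result.add(part.trim());
        }
        return result;
    }
}
```

Inputs that already contain newlines would break the one-to-one mapping, so a
caller would need to check that the number of results matches the number of
inputs before trusting the split.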

GoogleCodeExporter commented 9 years ago
Solved problem:
Hey, I ran into that problem too. The reason is that you are overwhelming the
server by sending requests like this. I think Google implements a back-off
algorithm to handle this kind of thing.

Here is how I solved the problem. I used a loop like you did, except that I
make the program sleep 5 seconds after each translation, 10 seconds after 10
translations, and 5 minutes after 200 translations, which works fine for me.
You should do something similar. It does take a little more time, but who
cares.

Original comment by almojor@gmail.com on 10 Mar 2008 at 5:28
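
The schedule described above can be written as a small function. This is one
reading of the comment, namely that the longer pauses repeat after every 10th
and every 200th translation; TieredPause and pauseMillisAfter are hypothetical
names.

```java
/** Hypothetical helper, not part of google-api-translate-java. */
final class TieredPause {
    /**
     * Returns how long to sleep after the given number of completed
     * translations: 5 minutes after every 200th, 10 seconds after every
     * 10th, and 5 seconds otherwise.
     */
    static long pauseMillisAfter(int completedTranslations) {
        if (completedTranslations <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        if (completedTranslations % 200 == 0) {
            return 5 * 60 * 1000L;
        }
        if (completedTranslations % 10 == 0) {
            return 10_000L;
        }
        return 5_000L;
    }
}
```

In a translation loop, the caller would pass the running count to
pauseMillisAfter(...) and hand the result to Thread.sleep(...) after each
request.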

GoogleCodeExporter commented 9 years ago
I ran into this problem as well :(. I need to translate a bunch of XLS files
from Japanese to English. I tried doing it cell by cell (and it worked
beautifully for the first document) but Google soon blocked me. I'm now
thinking the solution is to compile the cells into one large string to be
translated and then parse the translation back into the correct parts of the
document. The thing I want to know is, what is the largest string you can send
to be translated? Is there a known limit?

Thanks in advance.

Original comment by DragonWi...@gmail.com on 25 Mar 2008 at 10:04

GoogleCodeExporter commented 9 years ago
I don't know what the limit is, but I believe it's quite long (> 2k
characters?).

Original comment by rich.mid...@gmail.com on 26 Mar 2008 at 8:16

GoogleCodeExporter commented 9 years ago
Also, if you do find the limit please let me know! Thanks.

Original comment by rich.mid...@gmail.com on 26 Mar 2008 at 8:16

GoogleCodeExporter commented 9 years ago
Does the solution almojor provides work, or does Google still block the
service after some time?

Original comment by daniel.j...@gmail.com on 11 Apr 2008 at 11:19

GoogleCodeExporter commented 9 years ago
I discovered that sending the entire document as one long string doesn't seem
to work, but segmenting it down to about 500 characters *mostly* works. For
some reason, about 5% of my concatenated strings return an error (always for
the same strings), but if I send each cell individually (and use a conservative
delay of 30 sec) it works just fine. I'm trying to isolate the exact sub-string
that causes the error, but so far it has eluded me.

Original comment by DragonWi...@gmail.com on 11 Apr 2008 at 4:21
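
Grouping cells into segments of roughly 500 characters could be done along
these lines. Chunker and chunk are hypothetical names, the one-character
separator between cells is an assumption, and the sketch does not address the
5% of segments that fail for reasons not identified in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of google-api-translate-java. */
final class Chunker {
    /**
     * Groups cells into batches whose joined length (counting one separator
     * character between cells) stays within maxChars. A single cell longer
     * than maxChars is placed in a batch of its own.
     */
    static List<List<String>> chunk(List<String> cells, int maxChars) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentLength = 0;
        for (String cell : cells) {
            int added = cell.length() + (current.isEmpty() ? 0 : 1);
            if (!current.isEmpty() && currentLength + added > maxChars) {
                // Current batch is full: start a new one with this cell.
                batches.add(current);
                current = new ArrayList<>();
                currentLength = 0;
                added = cell.length();
            }
            current.add(cell);
            currentLength += added;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Calling chunk(cells, 500) would give batches matching the segment size
reported above; each batch could then be sent as one request.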

GoogleCodeExporter commented 9 years ago
Oh yeah. The almojor solution works for bulk translations. Just a 5 second
delay between every request, 10 seconds at the 10th request and 5 minutes at
the 200th request. I don't know if this is optimal, but it works.

Btw my articles are on average about 4000 characters long and that works fine.
I am not using the Java solution but my own implementation in PHP. Yes, in PHP,
because I want to run it on a cheap hosting server with a cron job.

Original comment by daniel.j...@gmail.com on 11 Apr 2008 at 8:26

GoogleCodeExporter commented 9 years ago
I believe this issue with repeated translations should be fixed as of version
0.4.

Original comment by rich.mid...@gmail.com on 11 May 2008 at 12:27