Softcatala / LanguageToolAndroidService

Experiments with spell checking on Android

too many requests #3

Open danielnaber opened 8 years ago

danielnaber commented 8 years ago

I see in our log file that the Android speller sometimes sends too many requests per minute. The current limit per IP is 20 requests per minute. I'm not sure what happens then, e.g. whether these error replies are just ignored. Maybe it's possible to limit the number of requests on the client side to avoid the error.
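One way the client could stay under such a limit is a sliding-window throttle. The following is only a minimal sketch, not code from the Android service: the class name `RequestThrottle` and its method are hypothetical, and the 20-per-minute figure is the limit quoted above. A client-side throttle also cannot account for several users sharing one IP address.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window throttle: at most maxRequests per windowMillis. */
final class RequestThrottle {
    private final int maxRequests;
    private final long windowMillis;
    // Timestamps (in milliseconds) of the requests sent within the current window.
    private final Deque<Long> sentAt = new ArrayDeque<>();

    RequestThrottle(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Records the request and returns true if it fits in the window ending at nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Forget requests that have fallen out of the window.
        while (!sentAt.isEmpty() && nowMillis - sentAt.peekFirst() >= windowMillis) {
            sentAt.pollFirst();
        }
        if (sentAt.size() >= maxRequests) {
            return false; // the caller should skip or delay this spell-check request
        }
        sentAt.addLast(nowMillis);
        return true;
    }
}
```

The service would call `tryAcquire(System.currentTimeMillis())` before each request, for example on a `new RequestThrottle(20, 60_000L)`, and skip or postpone the check when it returns false.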

jordimas commented 8 years ago

I'm assuming that you see more than 20 requests per minute coming from different IP addresses.

In the LanguageTool Proofreader preferences you can check the number of requests made by the service.

Android sends pieces of text to the spell checker service. It is really unlikely that a real user reaches this rate unless they copy and paste a very large piece of text, something

Two questions:

  1. What is the percentage of IP addresses causing this problem? Is it a small or big problem?
  2. One possibility is organizations using NAT (like in universities) where everybody shares the same IP address but they are really different users. Did you check where these IPs come from? Are they IPs assigned to organizations that may be sharing the same IP?

Regards

Jordi,

danielnaber commented 8 years ago

In March, we sent 4,738 error messages to 87 different IPs because of "too many requests" from the Android service. Total requests from the Android service were 275,868. I don't see anything special about the IPs, here are some examples:

  • http://www.iplocationtools.com/88.11.83.224.html
  • http://www.iplocationtools.com/176.83.16.222.html
  • http://www.iplocationtools.com/31.4.185.230.html

Most of them use language=es.

danielnaber commented 8 years ago

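Here are some strange requests:

  • text=FFFF5151 - a few hundred of those (with different hex values, seem to be color codes) from the same IP, user agent: Dalvik/2.1.0 (Linux; U; Android 6.0.1; Nexus 6 Build/MMB29U)
  • text=0D - hundreds of these and similar ones, User-Agent: Dalvik/2.1.0 (Linux; U; Android 6.0.1; A0001 Build/MMB29U)

Is there maybe a color selector or so that also triggers the spell check? I'm just mentioning this, not sure if it's related to this issue.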

jordimas commented 8 years ago

Regarding the odd text, I do not know. It may be a color selector, but I do not know. I will have a look to see if there is any place where you can trigger this.

Regarding IP abuse, they seem to be ADSL connections. This may be people sharing connections.

What I can do is generate an anonymous user ID per user per session and attach it to the request as a query string parameter. At least we will be able to know whether it is the same user or different users. What do you think?

danielnaber commented 8 years ago

We could do this for debugging, yes. To really solve the problem, maybe I can increase the limit of requests for short texts.
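A higher limit for short texts could be expressed as a per-IP budget in which short texts cost less than long ones. This is a rough sketch under stated assumptions, not the LanguageTool server's actual limiter: the class name `WeightedIpLimiter`, the 50-character threshold, the costs, and the budget of 60 units per minute are all hypothetical values, chosen so that 20 long requests per minute still fit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-IP limiter in which short texts consume less of the budget. */
final class WeightedIpLimiter {
    private static final int BUDGET_PER_WINDOW = 60;     // assumed: 20 long or 60 short requests
    private static final int SHORT_TEXT_MAX_CHARS = 50;  // assumed threshold for a "short" text
    private static final int SHORT_COST = 1;
    private static final int LONG_COST = 3;
    private static final long WINDOW_MILLIS = 60_000L;

    /** Fixed one-minute window: when it started and how much budget was spent in it. */
    private static final class Window {
        long start;
        int spent;
        Window(long start) { this.start = start; }
    }

    private final Map<String, Window> windows = new HashMap<>();

    /** Returns true if the request from this IP still fits into its current window. */
    synchronized boolean allow(String ip, int textLength, long nowMillis) {
        Window w = windows.computeIfAbsent(ip, k -> new Window(nowMillis));
        if (nowMillis - w.start >= WINDOW_MILLIS) {
            // The previous window is over: start a fresh one.
            w.start = nowMillis;
            w.spent = 0;
        }
        int cost = textLength <= SHORT_TEXT_MAX_CHARS ? SHORT_COST : LONG_COST;
        if (w.spent + cost > BUDGET_PER_WINDOW) {
            return false; // would be answered with the "too many requests" error
        }
        w.spent += cost;
        return true;
    }
}
```

With these assumed numbers an IP could send 60 short snippets per minute, while the limit for longer texts stays at 20.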

jordimas commented 8 years ago

We are now adding (https://github.com/Softcatala/LanguageToolAndroidService/commit/c23571f4e4a19db3dbb998384880d8974bd63c3b) a sessionID parameter to the request URL. This number is generated randomly per user and should allow us to distinguish between users. You should see some requests already between 16.00 and 17.10 today.
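For readers who do not want to open the commit, attaching a random session ID as a query string parameter can look roughly like this. It is an illustrative sketch, not the code from the commit: the class `SessionRequestUrl`, its methods, and the base URL in the usage note are hypothetical. The parameter names `language`, `text` and `sessionID` are the ones mentioned in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.security.SecureRandom;

/** Hypothetical helper that appends an anonymous session ID to a check request URL. */
final class SessionRequestUrl {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Random ID with no link to the user; generate it once per session and reuse it. */
    static String newSessionId() {
        return Long.toHexString(RANDOM.nextLong());
    }

    /** Builds the request URL with URL-encoded query string parameters. */
    static String build(String baseUrl, String language, String text, String sessionId) {
        try {
            return baseUrl
                    + "?language=" + URLEncoder.encode(language, "UTF-8")
                    + "&text=" + URLEncoder.encode(text, "UTF-8")
                    + "&sessionID=" + URLEncoder.encode(sessionId, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, `SessionRequestUrl.build("https://example.org/check", "es", "hola mundo", id)` produces a URL ending in `?language=es&text=hola+mundo&sessionID=` followed by the ID, so requests with the same ID can be attributed to one session even when several users share an IP address.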

hozza commented 4 years ago

@danielnaber this seems a little disconcerting. I thought LanguageTool servers did not record the text, for privacy reasons. However, you are referencing text submissions above. Is this possible for all LanguageTool servers or just for the Android app?

If this is just for the Android app, you should clearly mention on the Play Store that you have access to logs of everyone's text input. It doesn't seem safe or in line with the GDPR claims on LanguageTool's website.

jordimas commented 4 years ago

We are not recording any text for the Android app. It has the same policy and behaviour as the rest of the clients.

danielnaber commented 4 years ago

@hozza We store neither text nor the sessionID mentioned above.

hozza commented 4 years ago

> Here are some strange requests:
>
>   • text=FFFF5151 - a few hundred of those (with different hex values, seem to be color codes) from the same IP, user agent: Dalvik/2.1.0 (Linux; U; Android 6.0.1; Nexus 6 Build/MMB29U)
>   • text=0D - hundreds of these and similar ones, User-Agent: Dalvik/2.1.0 (Linux; U; Android 6.0.1; A0001 Build/MMB29U)
>
> Is there maybe a color selector or so that also triggers the spell check? I'm just mentioning this, not sure if it's related to this issue.

Thanks for the quick reply!

Some clarification here would probably help others too, as I imagine this privacy concern is why @findus23 requested the custom server URL in another issue, and also declined to submit the app to F-Droid.

I realise this is an old thread, but super important clarification imo. 🤓

How were you able to identify the thousands of calls with the text of hex colour codes as shown above? Perhaps an older version of the app or server did unfortunately log text, but has since been updated to not store any user data or perhaps these logs are from a dev environment etc. If you could clarify this, preferably with commit references if possible, that would be super helpful!

If this can be clarified, I'd be happy to help with the project, I've noticed a fair few of the 1 star reviews on Google Play are due to a simple UX issue that could be fixed with tweaking the text the user first sees upon installing.

danielnaber commented 4 years ago

Sorry, this was 4 years ago, I really don't remember this issue. You can find our privacy policy at https://languagetoolplus.com/legal/privacy, it applies to languagetool.org and languagetoolplus.com. The Android app is not a project of the team/company that develops LanguageTool.

hozza commented 4 years ago

Ok thanks for clarifying, I imagine this is one for @jordimas then - perhaps I should open a separate issue for this?

If this app does not follow the same privacy principles as LanguageTool in general, then perhaps it should be rebranded, or at least this should be clarified in the app description. Otherwise it could affect the GDPR privacy status of the upstream LanguageTool company.