eldur / jwbf

Java Wiki Bot Framework is a library to maintain Wikis like Wikipedia based on MediaWiki.
http://jwbf.sourceforge.net/
Apache License 2.0

Nondistinct user agent #33

Closed Ironholds closed 10 years ago

Ironholds commented 10 years ago

The API etiquette guidelines require, at a bare minimum, either an email address or a URL to the project in the user agent.

eldur commented 10 years ago

Yes, we know, so we recommend that every user do so. See the existing documentation.

Or do you think we have to improve something?

Ironholds commented 10 years ago

I'd strongly suggest adding a default user agent, or even automatically postfixing it - just "bot built with JWBF - https://github.com/eldur/jwbf" or something. Users can and will be stupid, or just lazy.

eldur commented 10 years ago

It's already implemented as you described. Or is there a problem with the current implementation? -- See the current test cases.

Ironholds commented 10 years ago

Unless my eyeballs have stopped working (quite possible - I write R and Python, not Java), it's not. I don't see any contact information or directions to this project there, just JWBF/version.

eldur commented 10 years ago

Okay, I followed the example from the User-Agent policy page

User-Agent: MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4

and the accepted example from Pywikibot

User-Agent: login/g1234 Pywikibot/2.0 (User:Dexbot)

so a User-Agent may look like

User-Agent: MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) JWBF/release Apache-HttpClient/release (java 1.5)
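A minimal sketch of how a client could assemble a string in that layout. The class `UserAgentSketch` and its `compose` method are hypothetical names chosen for illustration and are not part of the JWBF API; the sketch also leaves out the Apache-HttpClient and Java version tokens shown in the example.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of JWBF. */
final class UserAgentSketch {

    private UserAgentSketch() {}

    /**
     * Builds "tool/version (url; email) library/version", mirroring the
     * layout of the policy-page example quoted in this thread.
     */
    static String compose(String tool, String toolVersion,
                          String url, String email,
                          String library, String libraryVersion) {
        // Fail early so a missing contact detail is noticed by the bot author.
        Objects.requireNonNull(tool, "tool");
        Objects.requireNonNull(toolVersion, "toolVersion");
        Objects.requireNonNull(url, "url");
        Objects.requireNonNull(email, "email");
        Objects.requireNonNull(library, "library");
        Objects.requireNonNull(libraryVersion, "libraryVersion");
        return String.format("%s/%s (%s; %s) %s/%s",
                tool, toolVersion, url, email, library, libraryVersion);
    }

    public static void main(String[] args) {
        System.out.println("User-Agent: " + compose(
                "MyCoolTool", "1.1",
                "http://example.com/MyCoolTool/", "MyCoolTool@example.com",
                "JWBF", "release"));
    }
}
```

Keeping the operator's contact details in the first, tool-specific part means the library token only has to name the framework and its version.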

I would suggest changing the User-Agent policy page and writing an email to mediawiki-api stating that, from now on, a URL is required in the BasedOnSuperLib/1.4 part of the User-Agent.

btw: Do you have a test curl command I can use to check whether a user agent is blocked? -- I could imagine using a definitely blocked user agent and pointing users to the documentation page.
$ curl -s -A "lwp" "https://en.wikipedia.org/w/api.php" | head

fhocutt commented 10 years ago

Here is what Brad Jorsch had to say when I asked for clarification on the minimum required User-Agent (mediawiki-api list, July 31 2014):

I'd say at the least you'd want:

  • An identifier that isn't going to be confused with many other bots.
    • No spoofing browser agents!
    • No generic agents such as "curl", "lwp", "Python-urllib", and so on.
    • For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.
  • Some way to identify how to contact the operator, without relying on other headers in the request (e.g. the login cookies). This could be a reference to a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, an email address, etc.
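The minimum described in the list above could be approximated with a heuristic check like the following. This is only a sketch under stated assumptions: `UserAgentCheck` and `looksAcceptable` are hypothetical names, the list of generic agents contains just the three named above, a leading `Mozilla/` token is treated as browser spoofing, and "contact information" is reduced to a URL, an email address, or a `User:` page reference. It does not reflect any check that Wikimedia or JWBF actually performs.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical heuristic for illustration only. */
final class UserAgentCheck {

    // Generic agents named in the quoted message (lower-cased).
    private static final List<String> GENERIC =
            List.of("curl", "lwp", "python-urllib");

    // Assumed contact forms: http(s) URL, email address, or "User:Name".
    private static final Pattern CONTACT = Pattern.compile(
            "https?://\\S+|[^\\s@()]+@[^\\s@()]+\\.[^\\s@()]+|User:\\S+");

    private UserAgentCheck() {}

    static boolean looksAcceptable(String userAgent) {
        if (userAgent == null || userAgent.isBlank()) {
            return false; // an absent UA is never acceptable
        }
        String lower = userAgent.trim().toLowerCase(Locale.ROOT);
        // The first product token is the text before the first '/' or space.
        String product = lower.split("[/\\s]", 2)[0];
        if (GENERIC.contains(product) || product.equals("mozilla")) {
            return false; // generic client or spoofed browser agent
        }
        return CONTACT.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        System.out.println(looksAcceptable("lwp"));
        System.out.println(looksAcceptable(
                "login/g1234 Pywikibot/2.0 (User:Dexbot)"));
    }
}
```

A check like this can only flag obviously unhelpful strings; whether an identifier is distinctive enough for a large framework remains a judgment call for the operator.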

@Ironholds, from the discussion on mediawiki-api it sounded like there was not consensus about what was strictly necessary beyond the above, particularly regarding larger frameworks where the framework maintainer may not even know that a runaway bot exists. Ultimately it comes down to the user (who, yes, may be tired, careless, or forgetful). I would welcome some more clarity on the issue.

Ironholds commented 10 years ago

Clarity would be good, yeah. Perhaps we should open a discussion?

@eldur we tend not to actively block user agents unless the UA is actually absent.

fhocutt commented 10 years ago

@Ironholds, the thread in question is here: https://lists.wikimedia.org/pipermail/mediawiki-api/2014-July/003304.html . Please feel free to raise the question again or re-open it!

eldur commented 10 years ago

I agree with @fhocutt's suggestion. @Ironholds I'm inclined to close this issue, because we follow the documentation. Is that okay with you?

Ironholds commented 10 years ago

Sure! I'll just build a filter in for the generic agent.

eldur commented 10 years ago

Interesting .. do you have a URL to your commit? :wink:

Ironholds commented 10 years ago

Well, not yet; these reports are the result of some exploratory hand-coding.

For the avoidance of doubt, I'm talking about our mechanisms for identifying non-automated edits and read requests. The library will be excluded from these datasets, not from making requests.

GUIpsp commented 10 years ago

I think an exception should be thrown if no user agent is set when connecting to any Wikimedia site.
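One way to read this suggestion, as a rough sketch: check the target host before a request is sent and fail if the User-Agent is empty. Everything here is hypothetical, including the `UserAgentGuard` class, the `requireUserAgent` method, and the deliberately incomplete list of Wikimedia domains; it is not how JWBF behaves.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

/** Hypothetical guard for illustration only; not part of JWBF. */
final class UserAgentGuard {

    // Assumed, incomplete sample of Wikimedia-operated domains.
    private static final List<String> WIKIMEDIA_DOMAINS = List.of(
            "wikipedia.org", "wikimedia.org", "mediawiki.org", "wikidata.org");

    private UserAgentGuard() {}

    static boolean isWikimediaHost(URI api) {
        String host = api.getHost();
        if (host == null) {
            return false;
        }
        String lower = host.toLowerCase(Locale.ROOT);
        for (String domain : WIKIMEDIA_DOMAINS) {
            // Match the bare domain or any subdomain such as en.wikipedia.org.
            if (lower.equals(domain) || lower.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }

    /** Throws if the target is a Wikimedia host and no User-Agent is set. */
    static void requireUserAgent(URI api, String userAgent) {
        boolean missing = userAgent == null || userAgent.isBlank();
        if (missing && isWikimediaHost(api)) {
            throw new IllegalStateException(
                    "a User-Agent must be set in your client for "
                            + api.getHost());
        }
    }

    public static void main(String[] args) {
        URI api = URI.create("https://en.wikipedia.org/w/api.php");
        requireUserAgent(api, "MyCoolTool/1.1 (MyCoolTool@example.com)");
        System.out.println("User-Agent accepted for " + api.getHost());
    }
}
```

The guard fails fast on the client side; the alternative raised in this thread is to leave enforcement to the server and only log a warning.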

eldur commented 10 years ago

I would prefer: If a MediaWiki installation requires a special format for user agents, it should reject unqualified requests with a 4xx status code and/or an informative error message.

Currently a warning is logged:

a User-Agent must be set in your client

Ironholds commented 10 years ago

It does. The requirement is "a user agent", however ;p.

eldur commented 10 years ago

Okay, thank you all. If we get more details about the user agent format, open a new issue or reopen this one.