goldsmith / Wikipedia

A Pythonic wrapper for the Wikipedia API
https://wikipedia.readthedocs.org/
MIT License

Fetches wrong page (ignores capitalization) #85

Open seewhydee opened 9 years ago

seewhydee commented 9 years ago

>>> import wikipedia
>>> p = wikipedia.page("Red dwarf")
>>> p.title
'Red Dwarf'

This fetches the page https://en.wikipedia.org/wiki/Red_Dwarf, not https://en.wikipedia.org/wiki/Red_dwarf

Spaxe commented 9 years ago

+1

The case-insensitive search is really annoying as well. What's the rationale behind the design?

geekpradd commented 9 years ago

There are many issues in this module, ranging from Python 3 incompatibility to the usage problems above. The core developer, Goldsmith, appears to have taken leave from GitHub and PyPI (his last commit was in November), and there are many outstanding issues. So I'm thinking of creating a fork that will solve this module's problems, remain separate from it, and add more features (kinda like how LibreOffice forked off from OpenOffice).

More details coming soon. I'll try to fix the above error.

geekpradd commented 9 years ago

Here is a simple solution: pass auto_suggest=False to wikipedia.page. That way, the module won't automatically change "Red dwarf" to "Red Dwarf". With auto_suggest=True, it uses the Wikipedia API's search to guess the correct page title.

Demo usage:

import wikipedia
p = wikipedia.page("Red dwarf", auto_suggest=False)
print(p.title)  # This will be 'Red dwarf'

Spaxe commented 9 years ago

@geekpradd: I've done something very similar. In another case, I had to append "(video game)" to "Omega" to get the correct page, because the default result is the Greek letter rather than a DisambiguationError. The solution was to use wikipedia.WikipediaPage and replace spaces with underscores (`_`) in all my search strings.
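
The underscore trick works because MediaWiki treats spaces and underscores in a title as equivalent. On English Wikipedia, it also upper-cases the first letter of a title, while the rest of the title is case-sensitive. That is why "red dwarf" and "Red dwarf" name the same article, but "Red Dwarf" names a different one.

Below is a minimal sketch of that normalization. It assumes English Wikipedia's default settings and main-namespace titles only. `normalize_title` and `title_to_url` are hypothetical helpers and are not part of the wikipedia package.

```python
import re
from urllib.parse import quote


def normalize_title(title: str) -> str:
    # Assumption: approximates English Wikipedia rules (main namespace only).
    # Spaces and underscores are interchangeable; runs collapse to one space.
    title = re.sub(r"[\s_]+", " ", title).strip()
    # Only the first character is case-insensitive: MediaWiki upper-cases it.
    return title[:1].upper() + title[1:]


def title_to_url(title: str, lang: str = "en") -> str:
    # Article URLs use underscores instead of spaces; other characters are
    # percent-encoded, except a few that Wikipedia URLs leave as-is.
    path = normalize_title(title).replace(" ", "_")
    return f"https://{lang}.wikipedia.org/wiki/{quote(path, safe='/():,')}"


print(title_to_url("red dwarf"))           # https://en.wikipedia.org/wiki/Red_dwarf
print(title_to_url("Red Dwarf"))           # https://en.wikipedia.org/wiki/Red_Dwarf
print(title_to_url("Omega (video game)"))  # https://en.wikipedia.org/wiki/Omega_(video_game)
```

This is only an approximation: it ignores namespaces and special Unicode casing rules.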

geekpradd commented 9 years ago

Yup, that should work because of Wikipedia's automatic redirects. The thing is, this module needs extensive tuning, especially in redirect handling and charmap errors. The original dev should try to fix these.

goldsmith commented 9 years ago

Thanks for the report, and sorry for the delay. I've been busy with school and only now have a chance to catch up on wikipedia development.

I think this might just be an issue of defaults. Currently, as @geekpradd noted, the auto_suggest parameter of the page method defaults to True. Auto-suggest was originally built to match the behavior of the search bar on the Wikipedia site, which corrects a query to the commonly used title of the intended page. However, a power user (or one with a specific page in mind) should feel free to turn auto_suggest off and pass in an exact title, e.g.

>>> print(wikipedia.page("Omega (video game)", auto_suggest=False).summary)
Omega is a computer game developed by Stuart Marks and published by Origin Systems in 1989. The original game came on 5¼" floppy disks.

I'm not sure there's actually an issue here, as long as users are deliberate about choosing between the convenience features and the lower-level wikipedia functionality?
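
One way to be deliberate about this is a wrapper that checks the result. The sketch below is a hypothetical `page_exact` helper. It calls `wikipedia.page` with `auto_suggest=False` by default and warns if the returned page's title differs from the requested one. A mismatch can still happen legitimately through redirects, since `redirect=True` is the library's default.

The `fetch` parameter is an assumption added so the check can run without network access. If it is omitted, the helper imports the `wikipedia` package and uses `wikipedia.page`.

```python
import warnings


def _normalize(title):
    # Approximates English Wikipedia title rules: spaces == underscores,
    # first letter case-insensitive, the rest case-sensitive.
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def page_exact(title, fetch=None, auto_suggest=False, **kwargs):
    """Fetch a page and warn if the returned title differs from the request.

    `fetch` is a hypothetical injection point; it defaults to wikipedia.page.
    """
    if fetch is None:
        import wikipedia  # third-party package: pip install wikipedia
        fetch = wikipedia.page
    page = fetch(title, auto_suggest=auto_suggest, **kwargs)
    if _normalize(page.title) != _normalize(title):
        warnings.warn(
            f"requested {title!r} but got {page.title!r} "
            "(auto-suggest or a redirect changed the page)",
            stacklevel=2,
        )
    return page
```

With the real library, `page_exact("Red dwarf")` should return the "Red dwarf" article without a warning. Passing `auto_suggest=True` reproduces the behavior in this issue and emits a warning when the title changes to "Red Dwarf".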

Spaxe commented 9 years ago

@goldsmith The issue here is a mismatch of expectations, not of functionality. Programming languages tend to be case-sensitive. It might be too late to change the API behaviour now.

May I suggest highlighting this in the documentation? Something like:

By default, wikipedia.page() lookup is case-insensitive, matching the behaviour of Wikipedia's search bar. If you want a case-sensitive lookup, pass auto_suggest=False in your call. Example:

>>> print(wikipedia.page("Omega (video game)", auto_suggest=False).summary)
Omega is a computer game developed by Stuart Marks and published by Origin Systems in 1989. The original game came on 5¼" floppy disks.

This could spare future users from jumping through the same hoops.