goldsmith / Wikipedia

A Pythonic wrapper for the Wikipedia API
https://wikipedia.readthedocs.org/
MIT License

Questioning result of page - returns wrong page #199

Open vsoch opened 6 years ago

vsoch commented 6 years ago

I'm parsing a list of statistical methods, and one of them is "Lumpability", specifically this page:

https://en.wikipedia.org/wiki/Lumpability

But when I use the wikipedia module, it seems to truncate the word and give me a list of (pretty wrong) alternatives:

from wikipedia import page
method = "Lumpability"
page(method)

DisambiguationError: "Lump" may refer to: 
"Lump" (song)
Lump (compilation album)
Lump (dog)
The Lump
Lump hammer
lumped capacitance model
Swelling (medical)
Globus pharyngis
mudbrick
Lump sum

I think this is probably a bug - it makes me question the other disambiguation errors I'm getting. Do you have a sense of what's going on?

vsoch commented 6 years ago

Okay, I think I know the issue, and apologies, I should have looked at the code before posting. It looks like page() does some kind of auto-suggestion on the title, and this leads to the wrong page. Instead, I get the correct result if I do:

from wikipedia import WikipediaPage
WikipediaPage(method)

Although I need to try this on a disambiguation page; it could be that the page() function exists to handle that case.
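A rough sketch of what the auto-suggest step appears to do, assuming page() asks the search endpoint for one result plus a spelling suggestion and prefers the suggestion whenever there is one. `resolve_title` and `fake_search` are hypothetical names made up for illustration, and the "lump" suggestion is an assumed value, not a recorded API response. The real call being imitated is `wikipedia.search(title, results=1, suggestion=True)`, which returns a `(results, suggestion)` tuple.

```python
def resolve_title(title, search_fn):
    """Return the title an auto-suggesting lookup would load (hypothetical helper).

    search_fn must behave like wikipedia.search(title, results=1, suggestion=True),
    i.e. return a (results, suggestion) tuple.
    """
    results, suggestion = search_fn(title, results=1, suggestion=True)
    if suggestion:
        # A suggestion takes priority, even when an exact hit exists.
        return suggestion
    if results:
        return results[0]
    return None


def fake_search(title, results=1, suggestion=True):
    # Stand-in for the network call; the values are assumed for illustration.
    return ["Lumpability"], "lump"


print(resolve_title("Lumpability", fake_search))  # prints: lump
```

With the library installed, passing `wikipedia.search` as `search_fn` should show which title a given query would be rewritten to.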

vsoch commented 6 years ago

Nope, it looks like the DisambiguationError is still thrown (which is what I'd want, so I can manually select the correct one). Could you comment on why the page() function behaves this way?
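One possible way to do that manual selection, as a minimal sketch: catch the DisambiguationError, read its `options` list, and retry with the chosen entry. `choose_option` and `load_page` are hypothetical helper names, and matching on a keyword is just one assumed selection rule.

```python
def choose_option(options, keyword):
    """Pick the first disambiguation option containing keyword (case-insensitive)."""
    keyword = keyword.lower()
    for option in options:
        if keyword in option.lower():
            return option
    return None


def load_page(title, keyword):
    """Load title without auto-suggest; on ambiguity, retry with the chosen option."""
    import wikipedia  # third-party package: pip install wikipedia

    try:
        return wikipedia.page(title, auto_suggest=False)
    except wikipedia.exceptions.DisambiguationError as err:
        # err.options holds the candidate titles listed in the error message.
        choice = choose_option(err.options, keyword)
        if choice is None:
            raise
        return wikipedia.page(choice, auto_suggest=False)


# Offline demonstration using a few options from the error shown earlier.
options = ['"Lump" (song)', "Lump (dog)", "Lump hammer", "Lump sum"]
print(choose_option(options, "hammer"))  # prints: Lump hammer
```

Note that the retried title could itself be ambiguous, in which case the second call raises again and the caller has to decide what to do.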

martin-majlis commented 5 years ago

@vsoch: If you can use a different library, you can try Wikipedia-API. It returns only the pages that you have specified.

import wikipediaapi
wiki = wikipediaapi.Wikipedia(language='en')

lumpability = wiki.page('Lumpability')
print(lumpability.summary)
print(lumpability.links)

Documentation for Wikipedia-API is here.

AlmasM commented 5 years ago

I am also wondering about the same issue. For example, for the page "Web Bot":

w = wikipedia.page("Web bot")
<WikipediaPage 'Internet bot'>
w.title
output is ==> 'Internet bot'

Similarly:

w = wikipedia.page("web bot")
<WikipediaPage 'Internet bot'>
w.title
output is ==> 'Internet bot'

However:

w = wikipedia.page("web_bot")
<WikipediaPage 'Web Bot'>
w.title
output is ==> 'Web Bot'

I was also experimenting with "auto_suggest" parameter and after disabling it:

wikipedia.page("Web Bot", auto_suggest=False)
output is ==> 'Web Bot'

It seems that when auto_suggest is turned off, it returns the correct page. So I was wondering why auto_suggest returns the wrong page when it is enabled.
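A small sketch of how one might guard against this: request the page with `auto_suggest=False` and compare the returned title with the requested one. `normalize_title`, `same_page`, and `fetch_exact` are hypothetical helpers, and the normalization (underscores to spaces, first letter upper-cased) is only an approximation of how Wikipedia treats titles.

```python
def normalize_title(title):
    """Approximate title normalization: underscores to spaces, first letter upper-cased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def same_page(requested, returned):
    """True if both titles normalize to the same string."""
    return normalize_title(requested) == normalize_title(returned)


def fetch_exact(title):
    """Fetch a page with auto-suggest disabled and warn if the title changed."""
    import wikipedia  # third-party package: pip install wikipedia

    page = wikipedia.page(title, auto_suggest=False)
    if not same_page(title, page.title):
        # With suggestions off, a mismatch most likely comes from a redirect.
        print(f"requested {title!r} but got {page.title!r}")
    return page


# Offline check using the titles reported above.
print(same_page("Web bot", "Internet bot"))  # prints: False
print(same_page("web_bot", "Web bot"))       # prints: True
```

The comparison is deliberately strict about capitalization after the first letter, so "Web bot" and "Web Bot" count as different titles here.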

Thanks,

Almas M.