goldsmith / Wikipedia

A Pythonic wrapper for the Wikipedia API
https://wikipedia.readthedocs.org/
MIT License
2.89k stars, 518 forks

wikipedia.summary("Kale") searches for "Kal" [without 'e'] and returns DisambiguationError #295

Open ajz0ne opened 3 years ago

ajz0ne commented 3 years ago

wikipedia.summary("Kale")
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/aj/Library/Python/3.7/lib/python/site-packages/wikipedia/util.py", line 28, in __call__
    ret = self._cache[key] = self.fn(*args, **kwargs)
  File "/Users/aj/Library/Python/3.7/lib/python/site-packages/wikipedia/wikipedia.py", line 231, in summary
    page_info = page(title, auto_suggest=auto_suggest, redirect=redirect)
  File "/Users/aj/Library/Python/3.7/lib/python/site-packages/wikipedia/wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/Users/aj/Library/Python/3.7/lib/python/site-packages/wikipedia/wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "/Users/aj/Library/Python/3.7/lib/python/site-packages/wikipedia/wikipedia.py", line 393, in __load
    raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
wikipedia.exceptions.DisambiguationError: "kal" may refer to:
Kal (name)
Kevin Kallaugher
Käl
Kal, Fars
Kal, Susan
Kal, Kurdistan
Kal Rural District
Kal, Poland
Kal, Hrastnik
Kal, Ivančna Gorica
Kal, Pivka
Kal, Semič
Kal, Tolmin
Kal, Zagorje ob Savi
Kal (band)
Kal Online
Kal (Doctor Who)
Kal: Yesterday and Tomorrow
Kal Tire
Kerala Automobiles Limited
Korala Associates Limited
Korean Air
Kaltag Airport
Greenlandic language
Kalamazoo Transportation Center
Kadıköy Anadolu Lisesi
Kurdish Academy of Language
Kall (disambiguation)
KALS (disambiguation)
All Wikipedia pages beginning with Kal
All Wikipedia pages beginning with Kal-e

itsbravestone commented 3 years ago

Just set the auto_suggest kwarg of wikipedia.summary to False to solve this issue :)

wikipedia.summary("Kale", auto_suggest=False)
>>> print(wikipedia.summary("Kale", auto_suggest=False))
Kale (), or leaf cabbage, belongs to a group of cabbage (Brassica oleracea) cultivars grown for their edible leaves, although some are used as ornamentals. Kale plants have green or purple leaves, and the central leaves do not form a head (as with headed cabbage). Kales are considered to be closer to wild cabbage than most of the many domesticated forms of Brassica oleracea.

Jehan commented 3 years ago

OK, so apparently this auto_suggest must be a new feature. It should either be deactivated or, more likely, fixed so that it doesn't suggest weird stuff when a page exists!

In [1]: import wikipedia                                                                                                                                      

In [2]: page = wikipedia.page('marmot')                                                                                                                       
---------------------------------------------------------------------------
PageError                                 Traceback (most recent call last)
<ipython-input-2-9a96c9d7cffb> in <module>
----> 1 page = wikipedia.page('marmot')

~/.local/lib/python3.9/site-packages/wikipedia/wikipedia.py in page(title, pageid, auto_suggest, redirect, preload)
    274         # if there is no suggestion or search results, the page doesn't exist
    275         raise PageError(title)
--> 276     return WikipediaPage(title, redirect=redirect, preload=preload)
    277   elif pageid is not None:
    278     return WikipediaPage(pageid=pageid, preload=preload)

~/.local/lib/python3.9/site-packages/wikipedia/wikipedia.py in __init__(self, title, pageid, redirect, preload, original_title)
    297       raise ValueError("Either a title or a pageid must be specified")
    298 
--> 299     self.__load(redirect=redirect, preload=preload)
    300 
    301     if preload:

~/.local/lib/python3.9/site-packages/wikipedia/wikipedia.py in __load(self, redirect, preload)
    343     if 'missing' in page:
    344       if hasattr(self, 'title'):
--> 345         raise PageError(self.title)
    346       else:
    347         raise PageError(pageid=self.pageid)

PageError: Page id "mar ot" does not match any pages. Try another id!

In [3]: page = wikipedia.page('Groundhog')                                                                                                                    
---------------------------------------------------------------------------
PageError                                 Traceback (most recent call last)
<ipython-input-3-bed2d68a3e94> in <module>
----> 1 page = wikipedia.page('Groundhog')

~/.local/lib/python3.9/site-packages/wikipedia/wikipedia.py in page(title, pageid, auto_suggest, redirect, preload)
    274         # if there is no suggestion or search results, the page doesn't exist
    275         raise PageError(title)
--> 276     return WikipediaPage(title, redirect=redirect, preload=preload)
    277   elif pageid is not None:
    278     return WikipediaPage(pageid=pageid, preload=preload)

~/.local/lib/python3.9/site-packages/wikipedia/wikipedia.py in __init__(self, title, pageid, redirect, preload, original_title)
    297       raise ValueError("Either a title or a pageid must be specified")
    298 
--> 299     self.__load(redirect=redirect, preload=preload)
    300 
    301     if preload:

~/.local/lib/python3.9/site-packages/wikipedia/wikipedia.py in __load(self, redirect, preload)
    343     if 'missing' in page:
    344       if hasattr(self, 'title'):
--> 345         raise PageError(self.title)
    346       else:
    347         raise PageError(pageid=self.pageid)

PageError: Page id "ground hug" does not match any pages. Try another id!

I mean, it's very funny that the API searches for "ground hug" when I look up "Groundhog", but it's a bit problematic. 😛

More seriously, I had a script which used to work fine. Today I ran it again after a few months and it suddenly broke on half the pages I called it on, so I had to look into what was happening. Passing auto_suggest=False to the wikipedia.page() calls fixes them.

I don't mind auto_suggest being True by default, as long as the suggestion is only used when the requested page doesn't exist.
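
For anyone who wants that behaviour today, here is a minimal caller-side sketch, not a patch to the library. It assumes only what the tracebacks above show: that the lookup function accepts an auto_suggest keyword and raises a PageError-style exception for a missing page. The names page_exact_first, fake_lookup and FakePageError are hypothetical; the fake lookup and its data are made up purely so the example runs offline. DisambiguationError is deliberately not handled here.

```python
def page_exact_first(title, lookup, not_found_error):
    """Try the exact title first; fall back to auto_suggest only if it is missing.

    lookup          -- callable such as wikipedia.page (assumed to accept
                       lookup(title, auto_suggest=...))
    not_found_error -- exception type raised for a missing page, e.g.
                       wikipedia.exceptions.PageError
    """
    try:
        return lookup(title, auto_suggest=False)
    except not_found_error:
        # The exact title does not exist, so a suggestion is now worth trying.
        return lookup(title, auto_suggest=True)


# --- Offline stand-in for wikipedia.page (hypothetical, illustrative data) ---
class FakePageError(Exception):
    pass


def fake_lookup(title, auto_suggest=True):
    pages = {"Groundhog": "Groundhog article"}
    # Mimics the reported bug: a bad suggestion replaces an existing title.
    suggestions = {"Groundhog": "ground hug", "Grondhog": "Groundhog"}
    if auto_suggest:
        title = suggestions.get(title, title)
    if title not in pages:
        raise FakePageError(title)
    return pages[title]


# Existing page: the exact lookup succeeds, the suggestion is never consulted.
print(page_exact_first("Groundhog", fake_lookup, FakePageError))
# Misspelled title: the exact lookup fails, so the suggestion is used.
print(page_exact_first("Grondhog", fake_lookup, FakePageError))
```

With the real library the call would presumably look like page_exact_first("Groundhog", wikipedia.page, wikipedia.exceptions.PageError). The cost is a second request whenever the exact title is missing.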

varenc commented 2 years ago

I'm also seeing issues with this. It's pretty hilarious. Cat and Hat are swapped...

>>> wikipedia.summary("Cat")
u"A hat is a head covering which is worn for various reasons..."
>>>
>>>
>>> wikipedia.summary("Hat")
u'The cat (Felis catus) is a domestic species of small carnivorous mammal....'

Passing auto_suggest=False fixes it. Since this code hasn't changed in a while, I assume this is Wikipedia's fault: their "suggestion" feature over the API has gotten bad.