alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.24k stars 654 forks

wanted_dict hanging? #38

Closed ws1088 closed 3 years ago

ws1088 commented 3 years ago

I tried to use wanted_dict in scraper.build() and it is hanging.

scraper.build(url, wanted_dict=wanted_dict)

If I use wanted_list, it is fine. With wanted_dict it hangs here (traceback after interrupting it):

Traceback (most recent call last):
  File "test1.py", line 28, in <module>
    result = scraper.build(url, wanted_dict=wanted_dict, request_args=request_args)
  File "site-packages\autoscraper\auto_scraper.py", line 222, in build
    result, stack = self._get_result_for_child(child, soup, url)
  File "site-packages\autoscraper\auto_scraper.py", line 268, in _get_result_for_child
    result = self._get_result_with_stack(stack, soup, url, 1.0)
  File "site-packages\autoscraper\auto_scraper.py", line 326, in _get_result_with_stack
    getattr(i, 'child_index', 0)) for i in parents]
  File "site-packages\autoscraper\auto_scraper.py", line 326, in <listcomp>
    getattr(i, 'child_index', 0)) for i in parents]
  File "site-packages\bs4\element.py", line 1441, in __getattr__
    if len(tag) > 3 and tag.endswith('Tag'):
KeyboardInterrupt

ws1088 commented 3 years ago

It turns out I have to use a list as each dictionary value:

wanted_dict = {
    "a": ['1', '2', '3'],
    "b": ['4', '5', '6']
}

and cannot use a string:

wanted_dict = {
    "a": '1',
    "b": '2'
}

Please confirm. If so, please add a check that raises an error when a value in the dict is not a list. Thanks!

alirezamika commented 3 years ago

Yeah, the values are treated as iterables.
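
For illustration: in Python a plain string is itself an iterable of its characters, so a string value is accepted wherever an iterable is expected. The snippet below is a minimal sketch, assuming the values are simply looped over; it is not autoscraper's actual code, and the dictionary contents are made up.

```python
# Hypothetical illustration, not autoscraper's actual code.
wanted_dict = {"a": "12", "b": ["12"]}

# Loop over each value the way code expecting an iterable would.
expanded = {
    alias: [item for item in value]
    for alias, value in wanted_dict.items()
}

print(expanded["a"])  # the string is split into single characters
print(expanded["b"])  # the list keeps its one item intact
```

The string "12" ends up as two separate one-character items, while the list keeps "12" as a single item.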

ws1088 commented 3 years ago

I think the API should at least not hang. Maybe it could throw an exception, or detect that a value is not a list and still do something sensible with it?
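
Until something like that exists in the library, a caller-side guard is one option. The following is only a sketch: normalize_wanted_dict is a hypothetical helper, not part of autoscraper, and it assumes every wanted value should end up as a list of strings.

```python
from collections.abc import Iterable


def normalize_wanted_dict(wanted_dict):
    """Return a copy of wanted_dict whose values are all lists of strings.

    Hypothetical helper, not part of autoscraper.
    """
    normalized = {}
    for alias, value in wanted_dict.items():
        if isinstance(value, str):
            # Wrap a bare string so it is not iterated character by character.
            normalized[alias] = [value]
        elif isinstance(value, Iterable):
            items = list(value)
            if not all(isinstance(item, str) for item in items):
                raise TypeError(
                    f"wanted_dict[{alias!r}] must contain only strings"
                )
            normalized[alias] = items
        else:
            # Fail fast instead of passing an unusable value on.
            raise TypeError(
                f"wanted_dict[{alias!r}] must be a string or a list of strings, "
                f"got {type(value).__name__}"
            )
    return normalized
```

It could then be applied before the call from the first comment, for example scraper.build(url, wanted_dict=normalize_wanted_dict(wanted_dict)). Wrapping a bare string rather than rejecting it is a design choice; raising for strings as well would be the stricter alternative.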