biglocalnews / civic-scraper

Tools for downloading agendas, minutes and other documents produced by local government
https://civic-scraper.readthedocs.io

Errors using command line, library to scrape without knowing the platform #175

Status: Open. stucka opened this issue 9 months ago

stucka commented 9 months ago

On both Windows and Linux, I get the same TypeError whenever I try to scrape a site whose platform isn't known in advance. It happens from the regular command line, from the command line through `pipenv run civic-scraper`, and in Jupyter using `Runner`. As far as I know, there is no other way to scrape a site whose platform is unknown. Scraping works for at least CivicPlus sites, and I understand other platforms are supported too.

The output looks something like this:

```
C:\data\agenda-watch-speedrun\scrapes>civic-scraper scrape --url https://www.roanokeva.gov/agendacenter
01-20 19:33 - civic_scraper.runner - Scraping 1 site(s) from 2024-01-20 to 2024-01-20...
Traceback (most recent call last):
  File "C:\Python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Python\Scripts\civic-scraper.exe\__main__.py", line 7, in <module>
  File "C:\Python\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Python\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Python\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Python\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Python\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Python\lib\site-packages\civic_scraper\cli.py", line 90, in scrape
    runner.scrape(**kwargs)
  File "C:\Python\lib\site-packages\civic_scraper\runner.py", line 67, in scrape
    SiteClass = self._get_site_class(url)
  File "C:\Python\lib\site-packages\civic_scraper\runner.py", line 97, in _get_site_class
    return getattr(mod, class_name)
TypeError: getattr(): attribute name must be string
```
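The last frame shows `getattr(mod, class_name)` receiving a `class_name` that is not a string, for example `None`. One plausible reading, not confirmed by the traceback alone, is that the platform lookup found no match for these URLs and returned `None`. The sketch below is a hypothetical stand-in, not civic-scraper's actual code: the `platforms` namespace, the `PATTERNS` table, the `example.civicplus.com` URL, and the `guess_class_name`, `get_site_class_unsafe` and `get_site_class` helpers are all invented for illustration. It reproduces the same kind of `TypeError` and shows one way a guard could replace it with a readable error.

```python
import re
import types

# Hypothetical stand-in for a platforms module; NOT civic-scraper's real module.
platforms = types.SimpleNamespace(CivicPlusSite=type("CivicPlusSite", (), {}))

# Hypothetical URL patterns; the library's real detection rules may differ.
PATTERNS = {
    r"civicplus\.com": "CivicPlusSite",
}


def guess_class_name(url):
    """Return the class name for the first matching pattern, else None."""
    for pattern, name in PATTERNS.items():
        if re.search(pattern, url, re.IGNORECASE):
            return name
    return None


def get_site_class_unsafe(url):
    # Mirrors the failing line in the traceback: a None name reaches getattr(),
    # which raises TypeError because attribute names must be strings.
    return getattr(platforms, guess_class_name(url))


def get_site_class(url):
    # Guarded version: report the unrecognized URL instead of a bare TypeError.
    name = guess_class_name(url)
    if name is None:
        raise ValueError(f"Could not determine the platform for {url!r}")
    return getattr(platforms, name)
```

With these stand-ins, `get_site_class_unsafe("https://www.roanokeva.gov/agendacenter")` raises `TypeError` (the exact message varies by Python version), while `get_site_class` on the same URL raises a `ValueError` that names the URL.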
joffemd commented 3 days ago

I'm getting the same error when scraping other sites, including Palo Alto, CA. Here is the command I used:

```
pipenv run civic-scraper scrape --download --start-date=2024-07-01 --url https://www.cityofpaloalto.org/Departments/City-Clerk/City-Meeting-Groups/Meeting-Agendas-and-Minutes
```