Open lyrixderaven opened 8 years ago
Still getting quite a few exceptions:
```
INFO:celery.redirected:2015-12-06 16:05:59 [scrapy] ERROR: Spider error processing <GET http://www.parlament.gv.at/PAKT/VHG/XXV/BI/BI_00003/index.shtml> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/vagrant/offenesparlament/op_scraper/scraper/parlament/spiders/petitions.py", line 127, in parse
    reference = self.parse_reference(response)
  File "/vagrant/offenesparlament/op_scraper/scraper/parlament/spiders/petitions.py", line 462, in parse_reference
    law__legislative_period=llp, law__parl_id=reference[1])
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 679, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 697, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1301, in add_q
    clause, require_inner = self._add_q(where_part, self.used_aliases)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1328, in _add_q
    current_negated=current_negated, connector=connector, allow_joins=allow_joins)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1144, in build_filter
    lookups, parts, reffed_aggregate = self.solve_lookup_type(arg)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1030, in solve_lookup_type
    _, field, _, lookup_parts = self.names_to_path(lookup_splitted, self.get_meta())
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1386, in names_to_path
    "Choices are: %s" % (name, ", ".join(available)))
FieldError: Cannot resolve keyword 'law' into field. Choices are: _slug, category, category_id, creators, description, documents, id, keywords, law_ptr, law_ptr_id, laws, legislative_period, legislative_period_id, opinions, parl_id, petition_signatures, press_releases, redistribution, reference, reference_id, references, references_id, signable, signature_count, signing_url, source_link, status, steps, title
```
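A minimal, hypothetical sketch of why this lookup fails. Django splits the filter keyword on `__` and checks the first segment against the model's field names; `solve_lookup` below is a stand-in for Django's internal `solve_lookup_type()`/`names_to_path()`, and the field set is an abridged copy of the choices listed in the error message:

```python
LOOKUP_SEP = "__"

# Abridged field names taken from the FieldError message above:
available = {"law_ptr", "laws", "legislative_period", "parl_id", "reference"}

def solve_lookup(keyword):
    """Stand-in for Django's internal lookup resolution."""
    parts = keyword.split(LOOKUP_SEP)
    if parts[0] not in available:
        raise ValueError(
            "Cannot resolve keyword '%s' into field. Choices are: %s"
            % (parts[0], ", ".join(sorted(available))))
    return parts

# 'law' is not among the model's fields -- only 'law_ptr'/'laws' are,
# which suggests the model inherits from Law via multi-table inheritance,
# so the inherited fields can be filtered on directly:
try:
    solve_lookup("law__legislative_period")
except ValueError as e:
    print(e)
print(solve_lookup("legislative_period"))
```

Under that assumption, dropping the `law__` prefix (i.e. filtering on `legislative_period=llp, parl_id=reference[1]` directly) would resolve cleanly.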
Found another one when scraping through the admin-scraper:
```
ERROR:celery.worker.job:Task op_scraper.tasks.scrape[2bf9be25-7142-4ddd-88a7-675dce4c370c] raised unexpected: TypeError('coercing to Unicode: need string or buffer, dict found',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/vagrant/offenesparlament/op_scraper/tasks.py", line 24, in scrape
    process.start()
  File "/usr/local/lib/python2.7/dist-packages/reversion/revisions.py", line 290, in __exit__
    self._context_manager.end()
  File "/usr/local/lib/python2.7/dist-packages/reversion/revisions.py", line 176, in end
    in manager_context.items()
  File "/usr/local/lib/python2.7/dist-packages/reversion/revisions.py", line 175, in <genexpr>
    for obj, data
  File "/usr/local/lib/python2.7/dist-packages/reversion/revisions.py", line 616, in <lambda>
    version_data = lambda: adapter.get_version_data(instance, self._revision_context_manager._db)
  File "/usr/local/lib/python2.7/dist-packages/reversion/revisions.py", line 109, in get_version_data
    "object_repr": force_text(obj),
  File "/usr/local/lib/python2.7/dist-packages/django/utils/encoding.py", line 92, in force_text
    s = six.text_type(s)
TypeError: coercing to Unicode: need string or buffer, dict found
```
Seems to me that one of the objects' representations (`__unicode__` or `__repr__`) returns a dictionary instead of a string. It must be one of 'your' objects, though, since this does not occur with the other scrapers.
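That diagnosis can be reproduced in isolation. The sketch below (class names are made up) shows what happens when a model's string representation returns a dict: `force_text()` boils down to `six.text_type(obj)`, which invokes that representation and blows up with the same kind of `TypeError`:

```python
class BadPetition(object):
    # Bug: the string representation returns a dict instead of a string.
    # (On the Python 2 codebase this would be __unicode__; str() here
    # demonstrates the same failure mode.)
    def __str__(self):
        return {"title": "Some petition"}

class GoodPetition(object):
    def __str__(self):
        return "Some petition"

try:
    str(BadPetition())  # what force_text(obj) effectively does
except TypeError as e:
    print("TypeError: %s" % e)
print(str(GoodPetition()))
```

Auditing the scraped models' `__unicode__` methods for one that returns a non-string would be the place to start.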
Please try running your scraper from the admin interface against an empty/pristine database. If everything worked, the petitions should have been saved. If you run your scraper that way and there are no petitions in the DB afterwards, check the logs (in /var/log/celery_worker.*) for stacktraces that prevent Django Reversion, or the scraper itself, from properly saving the petitions.
Just found that now. My pull request fixes the FieldError; I'm looking into the second error.
Are you sure the second error is only related to Petitions? I just scraped laws_initatives (or pre_laws) and hit the same error at the end. It also seems to be related to the kwargs of the scraper rather than to the individual scraped objects.
Currently, the petitions scraper still throws the occasional exception, for instance:
While it's fine that some things don't work out during scraping, we need to catch all exceptions; otherwise Django Reversion stops the database commits and nothing that was scraped ends up saved.
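A minimal sketch of that catch-all pattern (function and item names are hypothetical): failures are logged and skipped per item, so the enclosing revision block can still commit whatever did scrape cleanly.

```python
import logging

logger = logging.getLogger(__name__)

def parse_petition(page):
    """Hypothetical per-page parse step that may blow up on bad input."""
    if page is None:
        raise ValueError("malformed page")
    return {"title": page}

def scrape_all(pages):
    # Catch exceptions per item so one bad page neither aborts the run
    # nor, in the real scraper, the surrounding Django Reversion
    # revision block and its database commit.
    saved = []
    for page in pages:
        try:
            saved.append(parse_petition(page))
        except Exception:
            logger.exception("Failed to scrape %r, skipping", page)
    return saved

print(len(scrape_all(["a", None, "b"])))  # 2 of 3 pages saved, no crash
```

The blanket `except Exception` is deliberate here: any uncaught exception propagates out of the revision context manager and aborts the whole commit, which is exactly the failure mode described above.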