datamade / scrapers-us-municipal

Scrapers for US municipal governments.

Private bills marked as duplicates if legislative session changes? #62

Open hancush opened 3 years ago

hancush commented 3 years ago

We started seeing duplicate item errors on a private bill that was previously imported into the database in the 2020 legislative session, but was more recently scraped as part of the 2018 legislative session.

We use the matter intro date to determine legislative session.

https://github.com/datamade/scrapers-us-municipal/blob/d8b92072def4301deb836922bdd4a9f8f2ff3a83/lametro/bills.py#L175-L186
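As a rough illustration of why a changed intro date matters, here is a minimal sketch of date-based session assignment. It is not the code at the permalink above: the `SESSIONS` table, its July-to-June boundaries, and the `session_for` helper are all assumptions made for the example.

```python
from datetime import date

# Hypothetical session table (identifier -> inclusive start/end dates).
# The fiscal-year boundaries are an assumption for illustration only.
SESSIONS = {
    "2018": (date(2018, 7, 1), date(2019, 6, 30)),
    "2019": (date(2019, 7, 1), date(2020, 6, 30)),
    "2020": (date(2020, 7, 1), date(2021, 6, 30)),
}


def session_for(intro_date):
    """Return the identifier of the session whose range contains intro_date."""
    for identifier, (start, end) in SESSIONS.items():
        if start <= intro_date <= end:
            return identifier
    raise ValueError(f"No legislative session covers {intro_date}")


# The same bill lands in a different session if its intro date is edited upstream.
original_session = session_for(date(2021, 4, 1))
edited_session = session_for(date(2018, 9, 1))
```

Under these assumptions, `original_session` is `"2020"` and `edited_session` is `"2018"`, so an upstream edit to the intro date alone would be enough to move the bill between sessions.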

I can see via an authenticated request to the Legistar API that this bill was updated on 4/12/2021, but I can't see what was changed. I'll check with Metro to see whether the intro date was, in fact, the information that was updated.

Meanwhile, I've removed the private bill with the incorrect legislative session from the databases. I expect that it will be rescraped and reimported correctly overnight; I'll update this issue if that turns out not to be the case.

Stacktrace:

UniqueViolation: duplicate key value violates unique constraint "councilmatic_core_bill_slug_ecb9ca6b_uniq"
DETAIL:  Key (slug)=(2021-0211) already exists.

  File "django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
IntegrityError: duplicate key value violates unique constraint "councilmatic_core_bill_slug_ecb9ca6b_uniq"
DETAIL:  Key (slug)=(2021-0211) already exists.

  File "pupa/importers/base.py", line 291, in import_item
    obj = self.model_class.objects.create(**data)
  File "django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "django/db/models/query.py", line 422, in create
    obj.save(force_insert=True, using=self.db)
  File "django/db/models/base.py", line 744, in save
    force_update=force_update, update_fields=update_fields)
  File "django/db/models/base.py", line 793, in save_base
    update_fields=update_fields, raw=raw, using=using,
  File "django/dispatch/dispatcher.py", line 175, in send
    for receiver in self._live_receivers(sender)
  File "django/dispatch/dispatcher.py", line 175, in <listcomp>
    for receiver in self._live_receivers(sender)
  File "councilmatic_core/signals/handlers.py", line 76, in create_councilmatic_bill
    cb.save_base(raw=True)
  File "django/db/models/base.py", line 782, in save_base
    force_update, using, update_fields,
  File "django/db/models/base.py", line 873, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "django/db/models/base.py", line 911, in _do_insert
    using=using, raw=raw)
  File "django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "django/db/models/query.py", line 1186, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "django/db/models/sql/compiler.py", line 1377, in execute_sql
    cursor.execute(sql, params)
  File "raven/contrib/django/client.py", line 127, in execute
    return real_execute(self, sql, params)
  File "django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
DataImportError: duplicate key value violates unique constraint "councilmatic_core_bill_slug_ecb9ca6b_uniq"
DETAIL:  Key (slug)=(2021-0211) already exists.
 while importing {'identifier': '2021-0211', 'title': 'Restricted View', 'classification': ['bill'], 'subject': [], 'extras': {'restrict_view': True, 'plain_text': '', 'rtf_text': ''}, 'legislative_session_id': UUID('0d8323e4-b757-45b6-a61e-2f52da1bd59f'), 'from_organization_id': 'ocd-organization/46e47853-7856-4ae1-9901-8596e27f01bb'} as <class 'opencivicdata.legislative.models.bill.Bill'>
  File "bin/pupa", line 8, in <module>
    sys.exit(main())
  File "pupa/cli/__main__.py", line 68, in main
    subcommands[args.subcommand].handle(args, other)
  File "pupa/cli/commands/update.py", line 278, in handle
    return self.do_handle(args, other, juris)
  File "pupa/cli/commands/update.py", line 329, in do_handle
    report['import'] = self.do_import(juris, args)
  File "pupa/cli/commands/update.py", line 219, in do_import
    report.update(bill_importer.import_directory(datadir))
  File "pupa/importers/base.py", line 197, in import_directory
    return self.import_data(json_stream())
  File "pupa/importers/base.py", line 234, in import_data
    obj_id, what = self.import_item(data)
  File "pupa/importers/base.py", line 294, in import_item
    self.model_class))
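
The trace above is consistent with the importer treating the rescraped bill as a new record (it reaches `objects.create`) while the slug stays the same. The toy model below sketches that interaction. It assumes, without confirmation from the source, that existing bills are looked up by identifier plus legislative session and that the slug is derived from the identifier alone; `FakeStore`, `DuplicateSlug`, and `import_bill` are hypothetical names, not pupa or councilmatic APIs.

```python
class DuplicateSlug(Exception):
    """Stands in for the database's unique-constraint violation."""


class FakeStore:
    """Toy stand-in for the bill tables; not the real importer."""

    def __init__(self):
        self.bills = {}     # (identifier, session) -> stored record
        self.slugs = set()  # slugs must be unique across all sessions

    def import_bill(self, identifier, session):
        key = (identifier, session)
        if key in self.bills:
            # Same identifier and session: treated as an update.
            return "update"
        # Assumption: the slug depends only on the identifier.
        slug = identifier.lower()
        if slug in self.slugs:
            raise DuplicateSlug(f"Key (slug)=({slug}) already exists.")
        self.bills[key] = {"identifier": identifier, "session": session}
        self.slugs.add(slug)
        return "insert"


store = FakeStore()
first = store.import_bill("2021-0211", "2020")   # initial import
second = store.import_bill("2021-0211", "2020")  # rescrape, same session
try:
    store.import_bill("2021-0211", "2018")       # rescrape, session changed
    third = "insert"
except DuplicateSlug:
    third = "duplicate"
```

In this sketch the session change makes the lookup miss, so the import falls through to an insert that collides on the slug. If the real lookup works the same way, deleting the stale record (as done above) clears the collision but does not stop it from recurring the next time an intro date changes.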