CouncilDataProject / cdp-scrapers

Scratchpad for scraper development and general utilities.
https://councildataproject.org/cdp-scrapers
Mozilla Public License 2.0

Clerks are entering duplicate Legistar MatterSponsors #74

Open dphoria opened 2 years ago

dphoria commented 2 years ago

Describe the Bug

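The Legistar Web API returns duplicate MatterSponsors entries for some matters, apparently because clerks entered the same sponsor more than once. For King County matter 22552, each sponsor appears twice: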

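import requests
# Fetch the sponsor records for King County matter 22552
resp = requests.get("http://webapi.legistar.com/v1/kingcounty/Matters/22552/Sponsors")
print(resp.status_code)
sponsors = resp.json()  # the endpoint returns a JSON list of sponsor records
for sponsor in sponsors:
    print(sponsor["MatterSponsorName"])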

# Joe McDermott
# Jeanne Kohl-Welles
# Jeanne Kohl-Welles
# Joe McDermott

Expected Behavior

Each sponsor should appear only once:

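import requests
# Same request as above; only the output should differ
resp = requests.get("http://webapi.legistar.com/v1/kingcounty/Matters/22552/Sponsors")
print(resp.status_code)
sponsors = resp.json()  # the endpoint returns a JSON list of sponsor records
for sponsor in sponsors:
    print(sponsor["MatterSponsorName"])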

# Joe McDermott
# Jeanne Kohl-Welles

Reproduction

Run the snippet under "Describe the Bug" above.

Possible solutions

Remove duplicates after querying MatterSponsors from the Legistar API.
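A minimal sketch of that post-query deduplication, assuming we key on `MatterSponsorName` (the field printed in the examples above). `dedupe_sponsors` is a hypothetical helper, not an existing cdp-scrapers function, and the sample records below only mimic the shape of the issue's output rather than being real API data. It keeps the first occurrence of each sponsor and preserves the original order.

```python
from typing import Any, Dict, Iterable, List


def dedupe_sponsors(
    sponsors: Iterable[Dict[str, Any]],
    key: str = "MatterSponsorName",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: drop repeated sponsor records, keeping the first one.

    Keying on MatterSponsorName is an assumption; a numeric person ID would be
    more robust against spelling differences if the API record provides one.
    """
    seen = set()
    unique = []
    for sponsor in sponsors:
        value = sponsor[key]
        if value in seen:
            continue  # a clerk entered this sponsor again; skip the repeat
        seen.add(value)
        unique.append(sponsor)
    return unique


# Sample records shaped like the issue's output (not real API data)
raw = [
    {"MatterSponsorName": "Joe McDermott"},
    {"MatterSponsorName": "Jeanne Kohl-Welles"},
    {"MatterSponsorName": "Jeanne Kohl-Welles"},
    {"MatterSponsorName": "Joe McDermott"},
]

for sponsor in dedupe_sponsors(raw):
    print(sponsor["MatterSponsorName"])
# Joe McDermott
# Jeanne Kohl-Welles
```

With real data, `raw` would be replaced by `resp.json()` from the request above.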

evamaxfield commented 2 years ago

Agree we should solve this on the data ingestion side. But maybe leave this for someone's first contribution to CDP.

CDP data ingestion should handle this naturally, because the primary keys already cover this case.

evamaxfield commented 2 years ago

By that I mean: I think the primary keys (the keys we use for hashing) are the person ID and the matter ID.
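To illustrate why hashing on (person ID, matter ID) would absorb duplicate sponsor rows, here is a minimal sketch. It is not CDP's actual ingestion code: `sponsor_key` is a hypothetical helper, SHA-256 is an assumed hash choice, and the person IDs 1 and 2 are placeholders. Only the matter ID 22552 comes from the request URL in the issue. Two identical pairs hash to the same key, so the second write overwrites the first instead of creating a new record.

```python
import hashlib
from typing import Dict, Tuple


def sponsor_key(person_id: int, matter_id: int) -> str:
    """Hypothetical: hash the (person id, matter id) pair into a stable key."""
    return hashlib.sha256(f"{person_id}:{matter_id}".encode("utf-8")).hexdigest()


# Placeholder person IDs; each sponsor is entered twice, as in the bug report
rows = [(1, 22552), (2, 22552), (2, 22552), (1, 22552)]

stored: Dict[str, Tuple[int, int]] = {}
for person_id, matter_id in rows:
    # Writing the same key twice overwrites rather than duplicating
    stored[sponsor_key(person_id, matter_id)] = (person_id, matter_id)

print(len(stored))  # 2
```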

dphoria commented 2 years ago

Quoting evamaxfield: "Agree we should solve this on the data ingestion side. But maybe leave this for someone's first contribution to CDP."

Was thinking the same; that's why I haven't assigned it to myself yet, LOL. I want to really dive into the metadata API ASAP.