datamade / court-scrapers

Rescrape script is updating all unchanged cases with incorrect data #50

Closed antidipyramid closed 3 months ago

antidipyramid commented 3 months ago

When we find that any case has been updated since the last scrape, the script to update rescraped cases is inserting the wrong data for unchanged cases.

This section is causing the problem:

UPDATE cases.court_case
SET
    calendar = r.calendar,
    filing_date = r.filing_date,
    division = r.division,
    case_type = r.case_type,
    ad_damnum = r.ad_damnum,
    court = r.court,
    hash = r.hash,
    scraped_at = CURRENT_TIMESTAMP,
    updated_at = CURRENT_TIMESTAMP
FROM court_case as r
WHERE
    court_case.case_number IN (SELECT * FROM updated_case);

This results in court_case looking like this:

[Image: Screenshot 2024-04-03 at 1 46 36 PM]

After the fix is in, we'll have to do a major manual rescrape of civil and chancery cases to fix the errors.
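
One plausible reading of the query above is that the FROM clause is never tied to the row being updated: nothing compares the target's case_number to r.case_number, so each matching row can be paired with an arbitrary row of r. The sketch below is a minimal reproduction under that assumption, not the project's actual fix. It uses Python's sqlite3 module with a trimmed, hypothetical schema (only `court` and `hash`), and the table name `rescraped` stands in for the freshly scraped table aliased as `r`. It needs SQLite 3.33 or newer for UPDATE ... FROM.

```python
import sqlite3

# Hypothetical, trimmed-down schema and data for illustration only.
SETUP = """
    CREATE TABLE court_case   (case_number TEXT PRIMARY KEY, court TEXT, hash TEXT);
    CREATE TABLE rescraped    (case_number TEXT PRIMARY KEY, court TEXT, hash TEXT);
    CREATE TABLE updated_case (case_number TEXT PRIMARY KEY);

    INSERT INTO court_case VALUES
        ('A', 'civil', 'a1'), ('B', 'chancery', 'b1'), ('C', 'civil', 'c1');
    -- Fresh scrape: A and B changed (new hashes), C did not.
    INSERT INTO rescraped VALUES
        ('A', 'civil', 'a2'), ('B', 'chancery', 'b2'), ('C', 'civil', 'c1');
    INSERT INTO updated_case VALUES ('A'), ('B');
"""

# Same shape as the query in this issue: r is never joined to the target row,
# so which r row supplies the values for a given case is arbitrary.
BUGGY_UPDATE = """
    UPDATE court_case
    SET court = r.court, hash = r.hash
    FROM rescraped AS r
    WHERE court_case.case_number IN (SELECT case_number FROM updated_case)
"""

# Assumed fix: tie each target row to the scraped row with the same case_number.
FIXED_UPDATE = """
    UPDATE court_case
    SET court = r.court, hash = r.hash
    FROM rescraped AS r
    WHERE court_case.case_number = r.case_number
      AND court_case.case_number IN (SELECT case_number FROM updated_case)
"""


def run_update(sql):
    """Build a fresh in-memory database, apply one UPDATE, return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(sql)
    rows = conn.execute(
        "SELECT case_number, court, hash FROM court_case ORDER BY case_number"
    ).fetchall()
    conn.close()
    return rows


buggy_rows = run_update(BUGGY_UPDATE)
fixed_rows = run_update(FIXED_UPDATE)

print("buggy:", buggy_rows)  # rows A and B may carry another case's data
print("fixed:", fixed_rows)
```

With the join condition in place, each updated case takes its values from its own scraped row, and case C, which is not in updated_case, is left alone. The same extra condition could be added to the WHERE clause of the query above, with the column list unchanged.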