ProjectSidewalk / SidewalkWebpage

Project Sidewalk web page
http://projectsidewalk.org
MIT License
84 stars 25 forks source link

Corrupted data from broken CSS zoom after Chrome browser update #3633

Closed misaugstad closed 3 months ago

misaugstad commented 3 months ago

Last week, Chrome rolled out a new version that changed their support for CSS-based zoom. We were relying on their earlier implementation to automatically zoom the Explore/Validate pages, and this was broken after the update (#3626). A hotfix was made (#3627) shortly thereafter to remove the automatic CSS zoom on Chrome.

@jonfroehlich brought up that we should be checking if any data was corrupted in the time between the new Chrome version rolling out and our hotfix. I did such an investigation today!

It seems that labels that were added while the CSS zoom was broken would be shown at an incorrect location on images in all of our downstream applications (in addition to showing up in the wrong place in Explore). The canvas_x and canvas_y are incorrect in the database, as are the pano_x and pano_y values that are derived from the former. Labels will show up in the wrong place in Validate, Gallery, in our automated crops, and in attempts to pin the labels on to full size panoramas.

The question is how much data was effected? I ran the query below in each city to determine how many labels were effected. I found that only 56 labels were effected, all of those being on our Teaneck server, making up 0.3% of the labels on that server. Given how little data was effected, it's not worth the time investment to figure out how to back-calculate what the correct values would have been and fix them in the database (if this is even possible).

SELECT COUNT(DISTINCT(label_id))
FROM label
INNER JOIN audit_task_environment ON label.audit_task_id = audit_task_environment.audit_task_id
WHERE label.deleted = FALSE
    AND label.tutorial = FALSE
    AND browser = 'chrome' AND browser_version = '128.0.0.0'
    AND label.time_created < (SELECT version_start_time FROM version WHERE version_id = '7.20.4');

I think that the best course of action would be to mark all effected labels as "deleted" and mark the associated streets as unaudited so that another user will go over those streets again. This removes the corrupted data from our system and I don't think that replacing the 56 labels is that big of a time investment for our users. It's sad that any data is lost, but I think that we caught it early enough that the amount of lost data is close to trivial!

misaugstad commented 3 months ago

And this is now done!

Used this query to mark the streets for re-audit:

UPDATE audit_task
SET incomplete = TRUE
FROM label
INNER JOIN audit_task_environment ON label.audit_task_id = audit_task_environment.audit_task_id
WHERE audit_task.audit_task_id = label.audit_task_id
    AND label.deleted = FALSE
    AND label.tutorial = FALSE
    AND browser = 'chrome' AND browser_version = '128.0.0.0'
    AND label.time_created < (SELECT version_start_time FROM version WHERE version_id = '7.20.4');

And used this query to then mark the labels as "deleted":

UPDATE label
SET deleted = TRUE
FROM audit_task_environment
WHERE label.audit_task_id = audit_task_environment.audit_task_id
    AND label.deleted = FALSE
    AND label.tutorial = FALSE
    AND browser = 'chrome' AND browser_version = '128.0.0.0'
    AND label.time_created < (SELECT version_start_time FROM version WHERE version_id = '7.20.4');