ProjectSidewalk / SidewalkWebpage

Project Sidewalk web page
http://projectsidewalk.org
MIT License

Labels have tags for the wrong label type #3222

Closed: misaugstad closed this issue 1 year ago

misaugstad commented 1 year ago
Brief description of problem/feature

I just encountered a label in Seattle via LabelMap and noticed that it was a Surface Problem label with a 'parked car' tag, which shouldn't be possible. I did a quick analysis to get a better idea of the scope of the issue.

I ran a quick SQL query and found that in Seattle there are 726 applied tags that don't correspond to the label type of their label. That's 0.5% of the 142,633 tags in Seattle. I also found that 440 of the 726 mismatched tags came from a single user, and the remaining 286 were spread across just 21 other users.

More investigation needed!

Here's the query to count the tags that are attached to the wrong label type; it also computes the average creation time of the affected labels.

SELECT COUNT(*), to_timestamp(avg(extract(epoch FROM time_created)))
FROM tag
INNER JOIN label_tag ON tag.tag_id = label_tag.tag_id
INNER JOIN label ON label_tag.label_id = label.label_id
WHERE tag.label_type_id <> label.label_type_id;

This one counts by user.

SELECT user_id, COUNT(*)
FROM tag
INNER JOIN label_tag ON tag.tag_id = label_tag.tag_id
INNER JOIN label ON label_tag.label_id = label.label_id
INNER JOIN audit_task ON label.audit_task_id = audit_task.audit_task_id
WHERE tag.label_type_id <> label.label_type_id
GROUP BY user_id
ORDER BY count DESC;
System info (OS, browser, city, and local/prod/test)

Haven't checked other servers, only checked in Seattle so far. It mostly happened in Chrome on Windows, but it happens on other OSs and browsers as well, and those two are already our most commonly used anyway.

misaugstad commented 1 year ago

Oh, I forgot to add the part where I checked the average age of the labels where this has happened. The average label with this issue was created at the end of January 2023. The earliest is from November 2020, and the most recent is from April 25 (three days ago).

misaugstad commented 1 year ago

I was wondering if this was a race condition, or if the tags for a label are actually being associated with the next/previous label...

I just looked at label 243580 in particular, which is a Pedestrian Signal label with the "shared pedestrian/car space" and "street has no sidewalks" tags (which are for No Sidewalk). But the labels directly before and after this label are Crosswalk labels!

In fact, labels 243572 through 243581 all have the exact same tags, and none of them are No Sidewalk labels!
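
For anyone who wants to spot-check a run of labels like this, a query along these lines (same tables as the counting queries above) shows the mismatches directly:

SELECT label.label_id, label.label_type_id AS label_type,
       tag.tag_id, tag.label_type_id AS tag_label_type
FROM label
INNER JOIN label_tag ON label.label_id = label_tag.label_id
INNER JOIN tag ON label_tag.tag_id = tag.tag_id
WHERE label.label_id BETWEEN 243572 AND 243581
ORDER BY label.label_id, tag.tag_id;

Every row where tag_label_type differs from label_type is one of these impossible tags.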

Hopefully that's more info to go on when debugging next week.

misaugstad commented 1 year ago

OH and I am hoping that we can fix the data in the database using our logs, but I haven't looked into the logs yet.

misaugstad commented 1 year ago

I believe I know what's happening. Our system assumes that you are auditing on one computer at a time. I think the issue here is that a user will have the Explore page open on multiple computers and alternate between auditing on each of them, without closing the Explore page between those sessions.

Some options for resolution:

  1. Think critically about what it means for a user to have the Explore page open in two different browsers, and make sure that our code is not assuming the opposite
  2. Add some detection on the back-end to notice when this issue is occurring, and have it tell the front-end to do a page refresh, which would mitigate nearly all of the damage (see the sketch below)
  3. Have a sort of "timeout" on the front-end, so that if the user goes back to the page again after a certain length of inactivity (30 minutes? an hour?), it forces a refresh straight away.

I lean towards 2/3, and maybe even both!
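
For option 2, here's a minimal sketch of the back-end check, assuming a temporary_label_id is only legitimately reused within the same audit task (:user_id, :temp_label_id, and :audit_task_id are placeholder bind parameters for the incoming submission, not names from our actual code):

SELECT label.label_id
FROM label
INNER JOIN audit_task ON label.audit_task_id = audit_task.audit_task_id
WHERE audit_task.user_id = :user_id                -- user making the submission
    AND label.temporary_label_id = :temp_label_id  -- temp id the front-end sent
    AND label.audit_task_id <> :audit_task_id;     -- but from a different session

If that returns a row, the submission is reusing a temporary_label_id from another session, and the back-end could tell the front-end to refresh instead of overwriting the existing label.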

jonfroehlich commented 1 year ago

Interesting!

Such a theory makes me wonder how many of these cases were due to me?! 😭

misaugstad commented 1 year ago

I want to better summarize the damages and possible solutions here for the future.

Background

The first thing to understand is that the issues fall into two buckets. When multiple labels were given the same temporary_label_id, the first one that the back-end heard about was added to the database. All later labels with that same temporary_label_id were not added to the database; instead, the first label's metadata was overwritten with the newer label's metadata. So the first bucket is the initial labels, which ended up with the wrong metadata. The second bucket is the labels that were never added to the database at all.
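
To see the effect concretely, here's a sketch of a query (same tables as the counting query below) that pairs each surviving label with the number of "finish labeling" events logged for its user and temporary_label_id. Any finish_count above 1 means finish_count - 1 labels were dropped and the survivor's metadata was overwritten:

SELECT label.label_id, label.temporary_label_id, finishes.finish_count
FROM label
INNER JOIN audit_task ON label.audit_task_id = audit_task.audit_task_id
INNER JOIN (
    SELECT user_id, temporary_label_id, COUNT(*) AS finish_count
    FROM audit_task_interaction
    INNER JOIN audit_task ON audit_task_interaction.audit_task_id = audit_task.audit_task_id
    WHERE action = 'LabelingCanvas_FinishLabeling'
    GROUP BY user_id, temporary_label_id
    HAVING COUNT(*) > 1
) finishes
    ON audit_task.user_id = finishes.user_id
    AND label.temporary_label_id = finishes.temporary_label_id;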

Types of data that are messed up

I'll start by describing what data is messed up, ordered from least important to most important:

  1. If an open-ended description was provided for a label after the first, the description column was overwritten for the first label. I'm not super worried about fixing this because only 1.22% of all labels have a user-provided description, and the description was only overwritten if a description was added for a future label (with the same temp_label_id).
  2. Whether or not a label is "temporary" was overwritten. I don't think that this is super important to fix, because we haven't really made use of this data ever, and only 0.75% of all labels have been marked as temporary anyway.
  3. The severity rating for the first label was overwritten by future labels. I don't think it's a deal breaker, but I consider severity much more important than the prior two pieces of data.
  4. The tags for the first label were overwritten by the last label with the temp_label_id. This is the worst piece of data to have overwritten, since you can have the tags for a completely different label type listed.
  5. Any labels after the first with the same temporary_label_id were never added to the database.

Quantifying the issues

Below is a table that shows the number of labels that have had their data overwritten and the number of labels that were never recorded due to this bug, per city. There are more labels never recorded than overwritten because you could open the Explore page in 3 locations and add a label in each of those, which would overwrite one label and never record the other two.

Ultimately, 0.36% of labels were never recorded and 0.33% of labels have their metadata messed up by this bug. I don't think that any city is missing more than 1% of their labels aside from Taipei, though they have only 602 labels to begin with.

City               Overwritten labels   Labels never recorded   Labels in db
Seattle            1,130                1,328                   248,233
Pittsburgh         140                  140                     18,980
CDMX               122                  124                     59,834
Chicago            119                  122                     26,201
Amsterdam          115                  115                     31,077
SPGG               104                  106                     112,077
La Piedad          101                  101                     4,663
Validation Study   55                   55                      7,867
Columbus           46                   46                      37,947
Oradell            31                   48                      13,400
Cuenca             20                   20                      16,808
Taipei             8                    8                       602
Newberg            6                    6                       18,365
Zurich             0                    0                       974
Total              1,997                2,128                   597,028

Here's the query I used to calculate the table above:

SELECT COUNT(*) AS labels_with_overwritten_data, SUM(label_count) - COUNT(*) AS missing_labels
FROM (
    -- (user_id, temporary_label_id) pairs with more than one "finish labeling" event.
    SELECT user_id, temporary_label_id, COUNT(*) AS label_count
    FROM audit_task_interaction
    INNER JOIN audit_task ON audit_task_interaction.audit_task_id = audit_task.audit_task_id
    WHERE action = 'LabelingCanvas_FinishLabeling'
    GROUP BY user_id, temporary_label_id
    HAVING COUNT(*) > 1
) x;

How I plan to address this now

Here's how I'm planning to address each of these:

  1. description column: not addressing
  2. temporary column: not addressing
  3. severity column: I am at least partially fixing this by checking for the first severity rating that was added for the label. If the user changed their mind and updated the severity rating, this won't be included. But at least the severity rating for a completely different label won't be there!
  4. tags: I am partially dealing with this by simply deleting the associated tags for any impacted label.
  5. missing labels: not planning to address

Here is my query for fixing severity

UPDATE label
SET severity = severity_interactions.new_severity
FROM audit_task
-- The (user_id, temporary_label_id) pairs that were affected by the bug.
INNER JOIN (
    SELECT user_id, temporary_label_id, COUNT(*) AS label_count
    FROM audit_task_interaction
    INNER JOIN audit_task ON audit_task_interaction.audit_task_id = audit_task.audit_task_id
    WHERE action = 'LabelingCanvas_FinishLabeling'
    GROUP BY user_id, temporary_label_id
    HAVING COUNT(*) > 1
) duplicated_ids
    ON audit_task.user_id = duplicated_ids.user_id
-- The earliest severity interaction logged for each (user_id, temporary_label_id),
-- whether it came from the context menu radio buttons or a keyboard shortcut.
LEFT JOIN (
    SELECT DISTINCT ON (user_id, temporary_label_id) user_id, temporary_label_id,
           CAST(COALESCE(SUBSTRING(note FROM 'RadioValue:(\d)'), SUBSTRING(action FROM 'Severity_(\d)')) AS INT) AS new_severity
    FROM audit_task_interaction
    INNER JOIN audit_task ON audit_task_interaction.audit_task_id = audit_task.audit_task_id
    WHERE action = 'ContextMenu_RadioChange'
        OR action LIKE 'KeyboardShortcut_Severity__'
    -- DISTINCT ON keeps the first row per (user_id, temporary_label_id) in this ordering.
    ORDER BY user_id, temporary_label_id, timestamp
) severity_interactions
    ON audit_task.user_id = severity_interactions.user_id
WHERE audit_task.audit_task_id = label.audit_task_id
    AND label.temporary_label_id = duplicated_ids.temporary_label_id
    AND label.temporary_label_id = severity_interactions.temporary_label_id;

Here is my query for deleting tags

DELETE FROM label_tag
USING label
INNER JOIN audit_task ON label.audit_task_id = audit_task.audit_task_id
-- The (user_id, temporary_label_id) pairs that were affected by the bug.
INNER JOIN (
    SELECT user_id, temporary_label_id
    FROM audit_task_interaction
    INNER JOIN audit_task ON audit_task_interaction.audit_task_id = audit_task.audit_task_id
    WHERE action = 'LabelingCanvas_FinishLabeling'
    GROUP BY user_id, temporary_label_id
    HAVING COUNT(*) > 1
) duplicated_ids
    ON label.temporary_label_id = duplicated_ids.temporary_label_id
    AND audit_task.user_id = duplicated_ids.user_id
WHERE label_tag.label_id = label.label_id;

How we could address this in the future

When it comes to updating the description/temporary/severity/tags of labels that have been overwritten, I am fairly confident that we have all the data needed to do this. You'd likely want to write a Python script to help sift through the logs more effectively. You would start by getting all the logs relevant to your task (those pertaining to updating tags/severity/etc). You would get those associated with the given user_id and temporary_label_id, like I do in my queries above.

The difference is that you then need to figure out where logging for the first label ends and the next label begins. I am assuming that this would be done via timestamps, or maybe just using the timestamp of the logged creation of the next label as a cutoff. Then you would have to make sure the script can record added tags, removed tags, etc., so that you can find the final state of the tags for that label.

For labels that are completely missing, you will want to do essentially the same thing. You'll have to group the interaction logs based on timestamp and/or the log of when the user added the new labels. I am certain that this can be done for most metadata, like the label type, severity, timestamp, etc., basically everything in the label table. I'm less sure that you would be able to recover the data in the label_point table, like heading, pitch, and zoom. But I do think that it's likely that we are logging all of this!
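
As a starting point for such a script, something like the query below (using only tables and columns already referenced in this thread) would dump every logged interaction for the affected (user_id, temporary_label_id) pairs in timestamp order. The script would then walk each group, split it at label-creation boundaries, and replay the tag/severity interactions to reconstruct each label's final state:

SELECT audit_task.user_id, audit_task_interaction.temporary_label_id,
       action, note, timestamp
FROM audit_task_interaction
INNER JOIN audit_task ON audit_task_interaction.audit_task_id = audit_task.audit_task_id
WHERE (audit_task.user_id, audit_task_interaction.temporary_label_id) IN (
    -- The affected pairs, same subquery as in the queries above.
    SELECT user_id, temporary_label_id
    FROM audit_task_interaction
    INNER JOIN audit_task ON audit_task_interaction.audit_task_id = audit_task.audit_task_id
    WHERE action = 'LabelingCanvas_FinishLabeling'
    GROUP BY user_id, temporary_label_id
    HAVING COUNT(*) > 1
)
ORDER BY audit_task.user_id, audit_task_interaction.temporary_label_id, timestamp;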

Final takeaway

Though I believe it's possible to recover the data, my take is that my time is better spent elsewhere. I wrote fixes up to the point where further fixes would require a big jump in complexity.