misaugstad opened 3 years ago
Thanks Mikey. Yah, wonder if this is just one individual's user labeling idiosyncrasy or a tech issue (e.g., Safari doing something weird again).
On Tue, Mar 2, 2021 at 11:56 AM Mikey Saugstad notifications@github.com wrote:
I suspect that this is just coming from user behavior (maybe they want to avoid covering up the thing that they are evaluating), but we haven't ruled out that it is a bug of ours yet. I think I want to collect a bunch of examples of this and then check to see if there are any patterns in the browser, OS, or user(s) who are contributing data that look like this. That should inform how we deal with this issue. Here are a few examples from a quick look through Gallery. But I've seen much more egregious examples recently.
Screenshots:
https://user-images.githubusercontent.com/6518824/109706273-97897b80-7b4d-11eb-8015-cf62e7343b65.png
https://user-images.githubusercontent.com/6518824/109706278-98221200-7b4d-11eb-9c3b-26b7cd81c540.png
https://user-images.githubusercontent.com/6518824/109706279-98baa880-7b4d-11eb-87fa-9374e657f654.png
https://user-images.githubusercontent.com/6518824/109706280-98baa880-7b4d-11eb-8e2b-67a68ba3f210.png
Label IDs: 9067, 29691, 49109, 3367 in SPGG.
Seattle Label IDs
SPGG Label IDs
Hi @misaugstad,
I ran the cropping tool on ~2,500 dropped curb labels. I did a manual review of the cropped images and noted down a sample of label_ids where I felt the labels were poorly positioned (floating too high, sometimes too low), as in your examples above.
The list comprises 100 labels; hopefully this is a decent enough sample size for you to be able to dig a little deeper into the issue.
Sample - 20136
If you need more information, or want a larger sample size, let me know and that's something I can work on.
This is incredibly helpful, thank you! I expect this to be a large enough sample size. I'll get back to you when I have more info!
I'm probably not going to be doing this analysis today, but as a note to self for the future, probably using a query like this:
```sql
SELECT label_id, sidewalk_user.user_id, username,
    browser, browser_version, browser_width, browser_height, avail_width, avail_height,
    screen_width, screen_height, operating_system, language
FROM label
INNER JOIN audit_task ON label.audit_task_id = audit_task.audit_task_id
INNER JOIN audit_task_environment ON audit_task.audit_task_id = audit_task_environment.audit_task_id
INNER JOIN sidewalk_user ON audit_task.user_id = sidewalk_user.user_id
WHERE label_id IN ()
GROUP BY label_id, sidewalk_user.user_id, username,
    browser, browser_version, browser_width, browser_height, avail_width, avail_height,
    screen_width, screen_height, operating_system, language;
```
Here's a video of the problem, @uditpatwal and @misaugstad:
https://user-images.githubusercontent.com/1621749/138569837-fbc67b83-eb7d-4f22-83f9-2d0ef4e89b69.mp4
More examples:
Another recent example from St. Louis
Found another extreme example of this. In Seattle, user 38f27bb3-eb84-44ea-8284-780f1675d621; looking at their labels placed on May 6th, 2024. What's interesting is that they seem to be off in the horizontal direction as well as the vertical, whereas I most commonly see labels off in the vertical direction only.
I tried to label in the same places on the same panos and it worked fine on my computer (Linux, Chrome).
A quick look at the audit_task_environment table shows me:
I also tried doing this, shortening my browser window like they did, and I still got good results from my labels... But again, this was on Linux.
Another weird piece is that it looks like they were auditing outside of where they needed to be (pic below)... It's definitely possible that this was just user error (they're a new HS volunteer), but it's also possible that it's related?
This happened to the data that I added to Walla Walla when first setting up the city. I recorded a few pano IDs where I found that my labels were placed in the wrong spot: `ZaS_D67N2iAEgEEcMcFnzw`, `2Nn0aR2CfUhZ9oTYORAADg`
Huge breakthrough!! I've found a big reason why labels aren't placed where they're expected! The issue has to do with the zoom level of GSV being set to a non-integer value (typically 1.999 or 2.999 from what I've seen). It seems that when sending the zoom data to the back end, it's automatically converted to an integer by truncation, so 1.999 becomes 1 and 2.999 becomes 2.
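A minimal JS sketch of the failure mode, assuming the fractional zoom hits a truncating integer conversion somewhere on its way to the db (the exact call site isn't shown in this thread):

```javascript
// GSV sometimes reports a fractional zoom like 2.999 instead of 3.
const reportedZoom = 2.999;

// Any truncating conversion drops the label a full zoom level:
console.log(Math.trunc(reportedZoom));   // 2 (wrong: the user was at zoom 3)
console.log(parseInt(reportedZoom, 10)); // 2 (same truncation)

// Rounding recovers the intended integer zoom:
console.log(Math.round(reportedZoom));   // 3
```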
Here's an example where I place a label after manually setting zoom to 3 and then setting it to 2.999, then viewing it on the Admin page. The first screenshot is from the Explore page, the second is the Admin page with zoom level 3, and the third is with zoom level 2.999.
All the data in the db looks identical for those two labels, except the one I added at zoom level 2.999 has its zoom level set to 2 instead of 3.
Another example! From earlier in this thread, I noted some cases I found in Walla Walla from my own labeling where the labels were clearly not where I had placed them (I noticed this from auditing and then looking at them in Gallery shortly after). The first pic is what it looks like on the admin page, the second pic is what it looks like there after I manually changed the zoom level in the db from 1 to 2.
So we should be able to fix this issue going forward by adding a bunch of additional checks to ensure that the zoom level is actually set to an integer before we send anything to the back end. In fact, for anything that requires an int to be sent back, we should be calling `Math.round()` on it just to be safe, since I don't believe that we have control over how integer parsing happens.
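A minimal sketch of that kind of defensive check (the helper name is hypothetical, not actual Project Sidewalk code; the 1-3 clamp follows the zoom levels enforced on the Explore page):

```javascript
// Hypothetical helper: coerce a possibly-fractional GSV zoom to the
// nearest valid integer zoom before sending anything to the back end.
function sanitizeZoom(rawZoom) {
    // Round first (2.999 -> 3), then clamp to the allowed range 1-3.
    return Math.min(3, Math.max(1, Math.round(rawZoom)));
}

console.log(sanitizeZoom(2.999)); // 3 (truncation would have given 2)
console.log(sanitizeZoom(1.999)); // 2
console.log(sanitizeZoom(0.999)); // 1
```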
Note that this does not affect the automated cropping that @hoominchu added, since that just takes a screenshot of the canvas as it appears to the user and uses the canvas x/y coordinates, which are correct. And the pano x/y values in the db are identical between labels with the zoom set correctly or not, so this won't help with crops created from full panoramas either. This would just fix how we show labels on the PS website; though I don't remember all the ways that we've chosen to create image crops for CV, and maybe some of them used the zoom level..?
The bad news: we have no way of programmatically knowing where this error occurred in the past in order to fix it. I've looked through the logs, and the zoom level is always stored as an integer; we don't have it written out in text somewhere to be able to pull from. And there isn't a correctly rounded zoom stored somewhere that we can use instead.
We can certainly discuss methods of manually fixing the zoom level for labels that we've already collected and that we notice to be offset for this reason. I could imagine some admin tools we might want. And on the new Validate page, I added an "unsure" reason called "Label placement looks incorrect"; we could pull from this list to manually review labels and fix them :eyes: We probably also have lists of labels that look off the mark from our previous attempts at training CV and documenting these errors.
I'll start by working on cleaning up the zoom level as it comes in going forward! @srihariKrishnaswamy did start to do this in #3660, though that only fixes the zoom level when someone tries to zoom in/out. I think that if the zoom level starts out incorrect and they never try to zoom, it would stay incorrect, so this is still necessary!
Oh, you know what labels we can fix though? Labels where the zoom level is set to 0 in the db! We enforce a zoom level of 1, 2, or 3 on the Explore page, so if the zoom is set to 0 in the db, we can safely assume that the true zoom level was ~0.999 and we can set it to 1 now! I just tested this with a few labels and it improved their locations!
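A sketch of that one-line data fix, assuming the zoom level is stored in a `zoom` column on the `label` table (table/column names here are assumptions; the real schema may differ):

```sql
-- Sanity check first: how many labels would this touch?
SELECT COUNT(*) FROM label WHERE zoom = 0;

-- Zoom 0 can only come from a truncated ~0.999, so the true zoom was 1.
UPDATE label SET zoom = 1 WHERE zoom = 0;
```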
Hmm. I'm wondering if we can leverage the 0-zoom labels to our advantage to find other labels that are wrong... Up until last week, we weren't fixing the zoom level as the user zoomed in/out. So if a label is on the same pano and applied by the same user, we might be able to assume that those other labels should be bumped up a zoom level too... Will need to do some more digging to see how far I can expand the number of labels we know to be incorrect based on this initial set :eyes:
Nice. This is huge. Thank you @misaugstad for your careful work on this! Yipee!
> I'm wondering if we can leverage the 0-zoom labels to our advantage to find other labels that are wrong... Up until last week, we weren't fixing the zoom level as the user zoomed in/out. So if a label is on the same pano and applied by the same user, we might be able to assume that those other labels should be bumped up a zoom level too...
Looks like this is not the case! Maybe it worked like that on the Validate page, but on the Explore page it seems like clicking the zoom button would fix the zoom level. The issue that was fixed was that zooming in would take the zoom level from 1.999 to 2; now it goes from 1.999 to 3. All that to say, I don't think that we can use a zoom level of 0 for one label to infer that other labels are zoomed incorrectly (thankfully!).
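A sketch of the before/after behavior being described, as hypothetical handlers (not the actual #3660 diff):

```javascript
// Old behavior: step the possibly-fractional zoom, then truncate.
// 1.999 + 1 = 2.999, which truncates to 2 -- the zoom barely moves.
function zoomInOld(currentZoom) {
    return Math.trunc(currentZoom + 1);
}

// Fixed behavior: round to the intended integer first, then step.
// round(1.999) = 2, then + 1 = 3.
function zoomInFixed(currentZoom) {
    return Math.min(3, Math.round(currentZoom) + 1);
}

console.log(zoomInOld(1.999));   // 2
console.log(zoomInFixed(1.999)); // 3
```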
But I did find that if you don't change your zoom level, the zoom level stays the same when switching between panos. So when making admin tooling to deal with this, we could use that info: if we find a label that has its zoom set incorrectly, we could query for labels placed during that same session since the last zoom level change and before the next zoom level change. Not ready to automatically update all of them yet, but I'll try to investigate some examples I find going forward.
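A rough sketch of that admin query, assuming an interaction log table named `audit_task_interaction` with `action` and `timestamp` columns, and a `time_created` column on `label` (all of these names are assumptions for illustration):

```sql
-- Given one label known to have a bad zoom, find other labels placed in
-- the same audit task between the surrounding zoom-change interactions.
WITH bad_label AS (
    SELECT audit_task_id, time_created
    FROM label
    WHERE label_id = 12345  -- hypothetical bad label
)
SELECT l.label_id
FROM label l
JOIN bad_label b ON l.audit_task_id = b.audit_task_id
WHERE l.time_created >= COALESCE(
        (SELECT MAX(i.timestamp) FROM audit_task_interaction i
         WHERE i.audit_task_id = b.audit_task_id
           AND i.action LIKE '%Zoom%'
           AND i.timestamp <= b.time_created),
        '-infinity')
  AND l.time_created <= COALESCE(
        (SELECT MIN(i.timestamp) FROM audit_task_interaction i
         WHERE i.audit_task_id = b.audit_task_id
           AND i.action LIKE '%Zoom%'
           AND i.timestamp >= b.time_created),
        'infinity');
```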
Is there a way to tell how many labels are affected?
> Is there a way to tell how many labels are affected?
Unfortunately not, for the same reason that we can't tell which ones are affected... What I can say is that there are 1,307 labels where the zoom level is set to 0 (something like 0.15% of labels), and we know that those are incorrect. But those are also the ones that we can fix easily! No idea how many others are messed up. :pensive:
Here are the label_ids in each city where the zoom is set to 0. Wanted to have this data downloaded before I change the zoom level in the dbs. zoom-0.csv
Recording examples I find where this is an issue:
Found an instance in Teaneck (label ID 11560) where zoom was set to 1, but it doesn't really look right unless I update the zoom all the way to 3... Screenshots below at zoom 1, zoom 2, and zoom 3.
I haven't investigated any other labels by that user in the same session yet.