ProjectSidewalk / SidewalkWebpage

Project Sidewalk web page
http://projectsidewalk.org
MIT License

For users inferred as low quality or manually marked as low quality, do we include them in our API? #2967

Open jonfroehlich opened 2 years ago

jonfroehlich commented 2 years ago

Chu and I were talking today. She is doing an analysis of our Seattle data. One question that came up is whether our APIs return data from "low quality" users (that is, users that our algorithm inferred as low quality or that were manually marked as low quality). We've recently been shifting our visualizations and how we compute "city completion" to ignore data from low quality users, so I was curious how our APIs handle this. Perhaps we could have a flag in the API, something like includeLowQualityData=true.

Related to https://github.com/ProjectSidewalk/SidewalkWebpage/issues/2966 and https://github.com/ProjectSidewalk/SidewalkWebpage/issues/2952.

misaugstad commented 2 years ago

We are not currently including that low quality data. Adding a flag for that is a bit more complicated than it initially seems, because we need to decide whether or not to include the low quality data early in the clustering process. When we do nightly clustering, we could potentially run the second stage of clustering (multi-user clustering) twice (once with and once without low quality data) and store both of those in the database in order to support adding that flag.
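As a rough illustration of the "run the second stage twice" idea, here is a minimal Python sketch. It is not the actual Project Sidewalk clustering code: the `SingleUserCluster` type, the `low_quality_user` field, the toy distance-threshold grouping, and the `api_results` helper are all hypothetical names made up for this example. It only shows the shape of the approach: cluster once without and once with low quality users, store both results, and let a flag like includeLowQualityData pick between them at request time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SingleUserCluster:
    """Output of the first (per-user) clustering stage. Hypothetical shape."""
    user_id: str
    label_type: str
    lat: float
    lng: float
    low_quality_user: bool  # assumed flag: inferred or manually marked


def multi_user_cluster(
    clusters: List[SingleUserCluster], threshold_deg: float = 0.0001
) -> List[List[SingleUserCluster]]:
    """Toy stand-in for the second stage: greedily group clusters of the
    same label type whose coordinates fall within threshold_deg of the
    first member of an existing group."""
    groups: List[List[SingleUserCluster]] = []
    for c in clusters:
        for g in groups:
            head = g[0]
            if (
                head.label_type == c.label_type
                and abs(head.lat - c.lat) <= threshold_deg
                and abs(head.lng - c.lng) <= threshold_deg
            ):
                g.append(c)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([c])
    return groups


def nightly_clustering(
    single_user_clusters: List[SingleUserCluster],
) -> Dict[bool, List[List[SingleUserCluster]]]:
    """Run the second stage twice and key the stored results by whether
    low quality users' data was included."""
    high_quality_only = [c for c in single_user_clusters if not c.low_quality_user]
    return {
        False: multi_user_cluster(high_quality_only),
        True: multi_user_cluster(single_user_clusters),
    }


def api_results(
    stored: Dict[bool, List[List[SingleUserCluster]]],
    include_low_quality_data: bool = False,
) -> List[List[SingleUserCluster]]:
    """Pick the precomputed result matching the (hypothetical) API flag.
    Defaults to excluding low quality data, matching current behavior."""
    return stored[include_low_quality_data]
```

The point of the sketch is that the filter is applied before the multi-user stage, which is why the flag cannot simply be a filter on the final output: each flag value needs its own precomputed set of clusters.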

jonfroehlich commented 2 years ago

Gotcha. Useful to know regardless.

We should include this information in the API docs themselves (though this goes back to generally improving those docs anyway: https://github.com/ProjectSidewalk/SidewalkWebpage/issues/2754).