Open jonfroehlich opened 2 years ago
We are not currently including low quality data. Adding a flag for it is more complicated than it initially seems, because the decision to include or exclude low quality data has to be made early in the clustering process. To support the flag, our nightly clustering could run the second stage (multi-user clustering) twice, once with and once without low quality data, and store both results in the database.
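A rough sketch of that "run the second stage twice" idea, in Python purely for illustration. Everything here is hypothetical: the `low_quality` field, the `cluster_fn` callback standing in for multi-user clustering, and the dict keyed by the flag value are assumptions, not the actual Project Sidewalk code or schema.

```python
from typing import Callable, Dict, List


def run_multi_user_clustering_twice(
    single_user_clusters: List[dict],
    cluster_fn: Callable[[List[dict]], list],
) -> Dict[bool, list]:
    """Run the (hypothetical) multi-user stage once per flag value.

    Each input dict is assumed to carry a boolean "low_quality" field
    marking clusters that came from low quality users.
    """
    results: Dict[bool, list] = {}
    for include_low_quality in (False, True):
        # Filter before clustering, since the decision has to be made
        # early in the process rather than applied to the final output.
        inputs = [
            c for c in single_user_clusters
            if include_low_quality or not c["low_quality"]
        ]
        results[include_low_quality] = cluster_fn(inputs)
    return results
```

Both result sets would then be stored, so the API could serve whichever one matches the flag without reclustering at request time. The trade-off is roughly double the second-stage work and storage each night.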
Gotcha. Useful to know regardless.
We should include this information in the API docs themselves (though this goes back to generally improving those docs anyway: https://github.com/ProjectSidewalk/SidewalkWebpage/issues/2754).
Chu and I were talking today. She is doing an analysis of our Seattle data. One question that came up is whether our APIs return data from "low quality" users (that is, users that our algorithm inferred to be low quality or that were manually marked as such). We've recently been shifting our visualizations and our "city completion" computation to ignore data from low quality users, so I was curious how this is handled in our APIs. Perhaps we could have a flag in the API, something like
includeLowQualityData=true
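For illustration, a client might attach the proposed flag like this. This is only a sketch: `includeLowQualityData` is the parameter suggested above and does not exist yet, and the base URL and other parameters in the usage line are placeholders, not real endpoints.

```python
from urllib.parse import urlencode


def build_api_url(base_url: str, params: dict, include_low_quality: bool = False) -> str:
    """Append the proposed (hypothetical) includeLowQualityData flag to a query."""
    query = dict(params)  # copy so the caller's dict is not mutated
    query["includeLowQualityData"] = "true" if include_low_quality else "false"
    return f"{base_url}?{urlencode(query)}"


# Placeholder endpoint and parameter, for demonstration only.
url = build_api_url("https://example.org/api", {"city": "seattle"}, include_low_quality=True)
```

Defaulting the flag to false would match the current behavior of excluding low quality data.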
Related to https://github.com/ProjectSidewalk/SidewalkWebpage/issues/2966 and https://github.com/ProjectSidewalk/SidewalkWebpage/issues/2952.