codeforsanjose / OSM-SouthBay

Making the best possible map of San José and the South Bay
https://www.openstreetmap.org/#map=12/37.3358/-121.8906
MIT License

Explore the topic of coverage % #11

Open 3vivekb opened 7 years ago

3vivekb commented 7 years ago

One of the holy grails of OSM is to figure out how many of the places in a city have been mapped.

Check out the exploratory research Minh has done. Can you think of a different way to estimate how many of the South Bay's places have been mapped? Something scalable?

Remember to consider Terms of Use. Scraping Google APIs or Yelp for these purposes is not okay.

/cc @1ec5

1ec5 commented 7 years ago

To clarify, the approach I took was to count entries in the phone book and query OSM for features; these are hard numbers rather than estimates. I've largely wrapped up my counts for the 408 area code, but I welcome feedback about the approach and conclusions.

I'd also welcome a more scalable approach to facilitate an expansion into other area codes such as 650. My stunt of manually paging through the phone book could be streamlined using OCR. (The corresponding OSM queries are effortless to repeat, so no problem there.)
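To make the comparison concrete, here is a minimal Python sketch of how per-category coverage could be tabulated once both counts exist. The function name `coverage_percent`, the category names, and the numbers are all placeholders for illustration; none of them come from the 408 counts.

```python
def coverage_percent(directory_counts, osm_counts):
    """Return {category: OSM count as a percentage of the directory count}.

    Both arguments map a category name to a count. Categories with no
    directory entries are skipped to avoid dividing by zero.
    """
    result = {}
    for category, listed in directory_counts.items():
        if listed <= 0:
            continue
        mapped = osm_counts.get(category, 0)
        result[category] = round(100.0 * mapped / listed, 1)
    return result


# Placeholder numbers for illustration only.
directory = {"dentist": 200, "bakery": 50, "florist": 0}
osm = {"dentist": 50, "bakery": 45}
print(coverage_percent(directory, osm))
```

A value above 100 is possible whenever OSM has more features in a category than the directory lists, which would itself be a hint that the directory is incomplete for that category.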

Beyond that, there's a question of whether the phone book is comprehensive enough to treat as a gold standard, especially given that some yellow pages directories demand payment for inclusion. (I don't know if this is the case with Valley Yellow Pages.) I think we can test its comprehensiveness by comparing the coverage of certain categories or chains to official sources. For example, compare the list of dentists to a government database of licensed dentists.
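One way such a comprehensiveness test might look is sketched below in Python, assuming both lists are available as plain business names. `normalize` and `directory_recall` are hypothetical helpers, the names are invented, and exact matching on normalized names is a deliberately crude stand-in for real record matching.

```python
import re


def normalize(name):
    """Lowercase a name and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def directory_recall(official_names, directory_names):
    """Fraction of official entries that also appear in the directory.

    Returns None when the official list is empty.
    """
    official = {normalize(n) for n in official_names}
    listed = {normalize(n) for n in directory_names}
    if not official:
        return None
    return len(official & listed) / len(official)


# Invented names for illustration only.
official = ["Smith Dental, Inc.", "Bright Smiles", "Lee Family Dentistry"]
directory = ["SMITH DENTAL INC", "Lee Family Dentistry"]
print(directory_recall(official, directory))
```

A low fraction for a category with a trustworthy official list would suggest that the phone book undercounts that category.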

An alternative to taking a census would be to sample an area's coverage, but we'd need to control for a different set of factors, such as development patterns that affect POI density or the focus areas of individual mappers.
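For the sampling route, a rough Python sketch of the arithmetic follows: given a random sample of real-world places and the number of them found in OSM, it reports a Wilson score interval for the coverage proportion. The function name and the sample numbers are assumptions, and the sketch does not address the control factors mentioned above; it only covers a single simple random sample.

```python
import math


def wilson_interval(found, sampled, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = found / sampled
    denom = 1 + z * z / sampled
    center = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical sample: 30 of 50 surveyed places were found in OSM.
low, high = wilson_interval(30, 50)
print(f"coverage is plausibly between {low:.0%} and {high:.0%}")
```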

d-wasserman commented 6 years ago

Not sure if this is related, but in terms of coverage, one of the things I have done for SCC is develop a tag availability index for the key transportation-related variables used to derive an LTS score. In theory, something similar could be done for a broad spectrum of tags that are determined to be critical. However, this would apply only to a network.

Alongside Yellow Pages, there are sales leads sites that are very often overlooked as a source of geocoded data. While I don't know whether using them directly for contributions is against their Terms of Service, they might offer a scalable way to compare POI densities, along the lines of what @1ec5 is suggesting. The difference in densities between the two sources would, at the very least, be a potential indicator of an area in need of attention.

I do know Mapbox has a project that tries to examine OSM quality more broadly. This might be a helpful resource.
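As a rough idea of what a tag availability index could compute, here is a minimal Python sketch that reports, for each key of interest, the share of features carrying a non-empty value. The function name is hypothetical, and the keys `maxspeed`, `lanes`, and `cycleway` are only guesses at LTS-relevant tags, not the actual list used for SCC.

```python
def tag_availability(features, keys):
    """Return {key: fraction of features that have a non-empty value for key}.

    `features` is a list of tag dictionaries, one per OSM way.
    """
    total = len(features)
    if total == 0:
        return {key: 0.0 for key in keys}
    return {
        key: sum(1 for tags in features if tags.get(key)) / total
        for key in keys
    }


# Invented sample ways for illustration only.
ways = [
    {"highway": "residential", "maxspeed": "25 mph"},
    {"highway": "primary", "maxspeed": "40 mph", "lanes": "4"},
    {"highway": "tertiary"},
    {"highway": "secondary", "lanes": "2"},
]
print(tag_availability(ways, ["maxspeed", "lanes", "cycleway"]))
```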
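A minimal Python sketch of that density comparison, assuming both sources can be reduced to lists of (latitude, longitude) pairs, might look like the following. The function names, the 0.01-degree cell size, and the coordinates are all arbitrary choices for illustration.

```python
import math
from collections import Counter


def cell_counts(points, cell_deg=0.01):
    """Count points per grid cell, where a cell is cell_deg degrees on a side."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )


def density_gaps(reference_points, osm_points, cell_deg=0.01):
    """List (cell, shortfall) pairs where OSM has fewer points, largest first."""
    ref = cell_counts(reference_points, cell_deg)
    osm = cell_counts(osm_points, cell_deg)
    gaps = [(cell, n - osm.get(cell, 0)) for cell, n in ref.items()]
    return sorted(
        (item for item in gaps if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )


# Invented coordinates for illustration only.
reference = [(37.335, -121.895), (37.336, -121.894), (37.337, -121.893), (37.355, -121.875)]
osm = [(37.3351, -121.8951)]
for cell, shortfall in density_gaps(reference, osm):
    print(cell, shortfall)
```

Cells near the top of the list are candidates for attention. Only aggregate counts are compared, so nothing from the reference source would be copied into OSM.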