codeforsanjose / OSM-SouthBay

Making the best possible map of San José and the South Bay
https://www.openstreetmap.org/#map=12/37.3358/-121.8906
MIT License

Clean up Santa Clara County street import #51

Open 1ec5 opened 4 months ago

1ec5 commented 4 months ago

In 2020, Stanford Libraries republished a public domain dataset of streets throughout Santa Clara County that the Santa Clara County Planning Office used to publish on its open data portal.[^centerlines] Last August, ahead of a presentation at Stanford, @jeffreyameyer imported an extract of this dataset, 1,886 features in all, into OpenHistoricalMap (OHM). The import covers the Stanford campus, downtown Mountain View, and some major streets in that part of the San Francisco Peninsula. This issue tracks cleaning up the import to follow OHM norms.

A map of Stanford and vicinity with 1,886 street features highlighted.

The dataset has a date_creat field, but this only indicates when the feature was added to the database in ArcGIS, generally between 2004 and 2008. By contrast, the import tagged every street as if it started on March 1 in various years in the 19th and 20th centuries.[^leap] These seem to be estimates based on some old maps, but the placeholder month and day leave me a bit uncertain about that.
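The February 29 to March 1 shift in the footnote is what happens when a lenient date parser meets a date that does not exist. A strict parser rejects February 29 in a non-leap year. A lenient one carries the extra day over into March (JavaScript's `Date` constructor behaves this way, for instance). The Python sketch below contrasts the two behaviors. It is only an illustration and does not reproduce any particular tool in the OHM stack. `parse_strict` and `parse_lenient` are hypothetical helper names.

```python
import calendar
from datetime import date, timedelta


def parse_strict(value: str) -> date:
    """Parse YYYY-MM-DD, raising ValueError for impossible dates."""
    return date.fromisoformat(value)


def parse_lenient(value: str) -> date:
    """Parse YYYY-MM-DD, rolling overflow days into the following month."""
    year, month, day = (int(part) for part in value.split("-"))
    # Start from the first of the month and add the remaining days, so
    # February 29 in a 28-day February lands on March 1.
    return date(year, month, 1) + timedelta(days=day - 1)


for value in ("1900-02-29", "1923-02-29", "1928-02-29"):
    year = int(value[:4])
    try:
        strict = parse_strict(value).isoformat()
    except ValueError:
        strict = "rejected"
    print(value, "leap" if calendar.isleap(year) else "not leap",
          strict, parse_lenient(value).isoformat())
```

Here 1900 and 1923 are rejected by the strict parser and become March 1 under the lenient one, while 1928, a leap year, keeps February 29.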

Aside from dates, most of the other attributes need to be cleaned up. For example, on this stretch of San Antonio Road:

[^centerlines]: This dataset has been superseded by a continuously updated Road Centerlines dataset, also in the public domain.
[^leap]: The import tagged February 29 of each year. In a year that was not a leap year, that date does not exist, so every software package in our stack interprets start_date=YYYY-02-29 as March 1 of that year.
[^metadata]: Unfortunately, the original dataset is no longer available online, and although it came with an FGDC metadata file, this file says nothing about the individual attributes.

1ec5 commented 4 months ago

> These seem to be estimates based on some old maps, but the placeholder month and day leave me a bit uncertain about that.

By the way, #47 has an idea for dating streets back to 1992 with more certainty. But if we want to stick with this outdated county dataset, we should replace the start_date=* + start_date:source=arbitrary tags with a more pessimistic start_date=* and start_date:edtf=* based on the date_creat field. Then mappers can selectively work their way backward through time, choosing between this import and a different source. (For example, it should be possible to source state-maintained highways more rigorously without relying on this import.)
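A minimal sketch of that replacement, under a few assumptions that go beyond this thread: `date_creat` is read as an ISO 8601 string (the real field's format may differ), "pessimistic" means the street existed no later than the day it entered the database, and the uncertainty is recorded with EDTF's set notation `[..DATE]` ("this date or some earlier date"). Whether OHM tooling accepts that exact EDTF form is also an assumption. `pessimistic_start_tags` is a hypothetical helper, and dates that were not marked `arbitrary` are left alone so other sources win.

```python
from datetime import datetime


def pessimistic_start_tags(tags: dict[str, str], date_creat: str) -> dict[str, str]:
    """Replace an arbitrary start date with one bounded by date_creat.

    Assumes date_creat is an ISO 8601 string such as "2006-05-12" or
    "2006-05-12T00:00:00"; the real field's format may differ.
    """
    updated = dict(tags)  # never mutate the caller's tags
    # Leave dates that a mapper has already sourced some other way.
    if updated.get("start_date:source") != "arbitrary":
        return updated
    del updated["start_date:source"]
    created = datetime.fromisoformat(date_creat).date().isoformat()
    # Pessimistic: the street existed no later than its database entry.
    updated["start_date"] = created
    # EDTF set notation: "this date or some earlier date".
    updated["start_date:edtf"] = f"[..{created}]"
    return updated


example = {
    "highway": "residential",
    "start_date": "1910-03-01",
    "start_date:source": "arbitrary",
}
print(pessimistic_start_tags(example, "2006-05-12"))
```

With this example, the arbitrary 1910 date becomes `start_date=2006-05-12` plus `start_date:edtf=[..2006-05-12]`, leaving room for mappers to push the date earlier with better evidence.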

1ec5 commented 4 months ago

@jeffreyameyer do you remember how the start dates came about? Were the years for real but with placeholder months and days?

jeffreyameyer commented 4 months ago

OK, clearly I've left some incomplete work. My apologies! But I do think things can be cleaned up quickly. Please see my notes and comments below.

The years were largely set by choosing an arbitrary (sorry!) old year, then slowly comparing against old maps and adjusting backward as the maps got older. Roads that stopped showing up as you went back in time didn't get older years; those that still showed up continued to get older years. This is not a foolproof method, but it is directionally useful, and EDTF tags are indeed a better solution than the "arbitrary" markings.
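The map comparison described above can be sketched as a walk from the newest map to the oldest, stopping at the first map where the street is missing. This is an illustration, not the procedure actually used for the import. It assumes each map has a known year and that someone can answer "does this street appear?" for it (the hypothetical `appears_on` callback stands in for that judgment). The result is written in EDTF set notation, which is one way to record the uncertainty instead of a single arbitrary date. Real maps can lag construction or omit streets, so the bracket is only as good as the maps.

```python
from typing import Callable, Optional, Sequence


def bracket_start(map_years: Sequence[int],
                  appears_on: Callable[[int], bool]) -> Optional[str]:
    """Estimate a street's start date by stepping back through old maps.

    Returns an EDTF set string, or None if the street is missing even from
    the newest map.
    """
    oldest_seen: Optional[int] = None
    first_absent: Optional[int] = None
    for year in sorted(map_years, reverse=True):  # newest map first
        if appears_on(year):
            oldest_seen = year  # still there: push the estimate back
        else:
            first_absent = year  # gone: stop going further back
            break
    if oldest_seen is None:
        return None
    if first_absent is None:
        # On every map we checked: only an upper bound is known.
        return f"[..{oldest_seen}]"
    # Built no earlier than the newest map that lacks it (possibly later
    # that same year) and no later than the oldest map that shows it.
    return f"[{first_absent}..{oldest_seen}]"


shown_on = {1961, 1939, 1910}  # hypothetical map years showing one street
print(bracket_start([1899, 1910, 1939, 1961], lambda y: y in shown_on))
```

For this hypothetical street, the walk finds it on the 1961, 1939, and 1910 maps but not the 1899 map, giving `[1899..1910]`.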

1ec5 commented 4 months ago

> (Better yet, delete oneway=tf and reverse those ways.) and then tag with oneway=yes?

Yes, both the TF and FT values appear to indicate one-way streets. The dataset represents a two-way street by setting the field to null.
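Here is a minimal sketch of that conversion. It assumes the common GIS convention that FT means traffic flows in the digitized direction (from the first node to the last) and TF means the opposite. This thread only establishes that both values mean one-way and that null means two-way. `convert_oneway` and the node IDs are hypothetical. Reversing a real way may also require flipping other direction-dependent tags, which this sketch does not handle.

```python
from typing import Optional


def convert_oneway(value: Optional[str],
                   node_ids: list[int]) -> tuple[dict[str, str], list[int]]:
    """Map the county's one-way code to OSM-style tags and node order.

    Assumes FT means traffic flows in the digitized direction (from the
    first node to the last) and TF means the opposite direction.
    """
    if value is None or not value.strip():
        return {}, list(node_ids)  # null means two-way: no oneway tag needed
    code = value.strip().upper()  # the import used lowercase, e.g. oneway=tf
    if code == "FT":
        return {"oneway": "yes"}, list(node_ids)
    if code == "TF":
        # Reverse the way so that oneway=yes points the right way.
        return {"oneway": "yes"}, list(reversed(node_ids))
    raise ValueError(f"unexpected one-way code: {value!r}")


print(convert_oneway("tf", [101, 102, 103]))
```

For a `tf` way, the sketch returns `oneway=yes` with the node order reversed, so the one-way direction follows the geometry and no `oneway=-1` is needed.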