Provide methods for normalizing the names of fields provided by ChangesetSource

jpolchlo commented 5 years ago

Pursuant to the conversation here, there are two different conventions for naming the fields provided by a changeset source. The streaming source (accessed by spark.read.format(Source.Changesets)) uses snake case for its field names (e.g., created_at), while changeset ORC files tend to use camel case (createdAt). If one intends to use .as[Changeset] to convert to a Dataset[Changeset], the column names must follow the latter convention.

We should provide the means to convert from one case structure to the other.
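
As a rough illustration of what such a conversion could look like, here is a minimal sketch of the two name mappings as plain string functions. The object name FieldNames and both method names are hypothetical and are not part of vectorpipe. The sketch only handles simple top-level names such as created_at and createdAt, and it does not look inside nested struct fields.

```scala
import java.util.Locale

// Hypothetical helper, not part of vectorpipe: converts individual field
// names between snake case and camel case.
object FieldNames {

  /** created_at -> createdAt. Leading or repeated underscores are dropped. */
  def snakeToCamel(name: String): String = {
    val parts = name.split("_").filter(_.nonEmpty)
    if (parts.isEmpty) name
    else parts.head + parts.tail.map(_.capitalize).mkString
  }

  /** createdAt -> created_at. Names already in snake case pass through unchanged. */
  def camelToSnake(name: String): String =
    name
      // Insert an underscore wherever a lowercase letter or digit is
      // followed directly by an uppercase letter.
      .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
      .toLowerCase(Locale.ROOT)
}
```

Assuming a Spark DataFrame df, one way to apply this to every top-level column would be df.toDF(df.columns.map(FieldNames.snakeToCamel): _*), using camelToSnake for the opposite direction. Whether this belongs on the source itself or in a separate helper is left open here.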

jpolchlo commented 5 years ago

Make sure to address https://github.com/azavea/osmesa/blob/baca909e376116350fbb0cf60e32889a9194f0b3/src/analytics/src/main/scala/osmesa/analytics/oneoffs/MergeChangesets.scala#L99 after providing this change.

mojodna commented 5 years ago

Clarifying: streaming sources use camel case, ORC files typically use snake case (per https://github.com/mojodna/osm2orc).

geotrellis / vectorpipe #113