Open jpolchlo opened 5 years ago
Make sure to address https://github.com/azavea/osmesa/blob/baca909e376116350fbb0cf60e32889a9194f0b3/src/analytics/src/main/scala/osmesa/analytics/oneoffs/MergeChangesets.scala#L99 after providing this change.
Clarifying: streaming sources use camel case, ORC files typically use snake case (per https://github.com/mojodna/osm2orc).
Pursuant to the conversation here, there are two different ways to name the fields provided by a changeset source. The streaming source (accessed by `spark.read.format(Source.Changesets)`) uses snake case for its variable names (e.g., `created_at`), while changeset ORC files tend to use camel case (`createdAt`). If one intends to use `.as[Changeset]` to convert to a `Dataset`, it will be necessary to use the latter convention.

We should provide the means to convert from one case structure to the other.
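
As a starting point for discussion, here is a minimal sketch of what such a conversion could look like at the level of field names. The object name `FieldCase` and both function names are hypothetical (nothing like this exists in osmesa as far as this issue states), and the sketch deliberately avoids any Spark dependency so it covers the renaming logic only.

```scala
// Hypothetical helper, not part of osmesa: converts field names between
// snake_case and camelCase in either direction.
object FieldCase {

  /** snake_case to camelCase, e.g. "created_at" becomes "createdAt".
    * Empty segments (leading or doubled underscores) are dropped.
    */
  def snakeToCamel(name: String): String = {
    val parts = name.split("_").filter(_.nonEmpty)
    if (parts.isEmpty) name
    else parts.head + parts.tail.map(_.capitalize).mkString
  }

  /** camelCase to snake_case, e.g. "createdAt" becomes "created_at".
    * An underscore is inserted wherever a lowercase letter or digit
    * is directly followed by an uppercase letter.
    */
  def camelToSnake(name: String): String =
    name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase
}
```

Assuming a DataFrame `df` with flat, top-level columns, the renaming could then be applied with something like `df.toDF(df.columns.map(FieldCase.snakeToCamel): _*)` before calling `.as[Changeset]`. Nested struct fields would not be renamed by this approach and would need separate handling.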