Azbesciak opened 1 week ago
Release 24-11-13.0; it was also present in the previous release.
Discussed in the Schema TF meeting today; it seems to be a central data pipeline bug. The pipeline currently merges the top-level properties of every feature type in a theme into a single schema at the theme level.
The best outcome would be a bug fix in the data pipeline: Parquet files should only define columns for the types they contain.
Shared this with @ibnt1 and #tf-data-platform.
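To make the difference concrete, here is a minimal sketch in plain Python, with dicts standing in for Parquet rows and no Parquet library involved. The per-type property sets in `DECLARED` are hypothetical, illustrative subsets and not the actual Overture divisions schema; `theme_level_columns` mimics the theme-level merge described above, while `per_type_columns` gives each type only its own columns.

```python
from typing import Dict, List, Mapping, Set

# Hypothetical declared top-level properties per feature type in one theme.
# Illustrative subsets only; NOT the actual Overture divisions schema.
DECLARED: Dict[str, Set[str]] = {
    "division": {"id", "names", "perspectives"},
    "division_area": {"id", "names", "class"},
    "division_boundary": {"id", "division_ids", "is_disputed"},
}


def theme_level_columns(declared: Mapping[str, Set[str]]) -> List[str]:
    """Current behavior: one column set for the whole theme, i.e. the union
    of every type's top-level properties."""
    union: Set[str] = set()
    for props in declared.values():
        union |= props
    return sorted(union)


def per_type_columns(declared: Mapping[str, Set[str]], feature_type: str) -> List[str]:
    """Desired behavior: a type's output defines only that type's properties."""
    return sorted(declared[feature_type])


def project(rows: List[dict], columns: List[str]) -> List[dict]:
    """Emit each row with exactly `columns`; columns a row lacks become None."""
    return [{col: row.get(col) for col in columns} for row in rows]


if __name__ == "__main__":
    boundaries = [{"id": "b1", "division_ids": ["d1", "d2"], "is_disputed": False}]
    # Merged schema: the boundary row also gets class, names, perspectives (all None).
    print(project(boundaries, theme_level_columns(DECLARED)))
    # Per-type schema: only the boundary's own columns.
    print(project(boundaries, per_type_columns(DECLARED, "division_boundary")))
```

With the merged schema, a `division_boundary` row is written with empty `class`, `names` and `perspectives` columns that its type never declares; with per-type columns those columns simply do not exist in the file.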
I checked, and the Parquet files arrive at the theme promote step like this, that is, with the same schema for all types.
The current schema validation in the central pipeline only checks that, for each theme-type pair in the Parquet files to be promoted, the column data types match those in the reference Parquet schema, regardless of whether a column is defined for that type. In effect, it keeps every column from the upstream theme Parquet as long as its data type is correct.
We should revisit that and perhaps enforce the allowed columns per type, probably by maintaining a separate reference schema for each type. As I remember, there was at least one other time in the past when this would have caught a similar issue, but it would be a large-ish work item. Created a work item for this: https://github.com/OvertureMaps/tf-data-platform/issues/780
Independent of that, it should first be fixed in the upstream divisions pipeline. @DavidKarlas, can you please look into that?
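The per-type check proposed in the work item above could look roughly like the following sketch. It is not the central pipeline's actual code: the reference schemas, the type strings, and the function names `check_current` and `check_per_type` are all hypothetical, and a real implementation would presumably read the reference schemas from Parquet metadata rather than from hard-coded dicts.

```python
from typing import Dict, List, Mapping

# Hypothetical reference schemas: column name -> data type name.
# Column names and type strings are illustrative assumptions only.
THEME_REFERENCE: Dict[str, str] = {
    "id": "string",
    "class": "string",
    "division_ids": "list<string>",
    "is_disputed": "bool",
}
PER_TYPE_REFERENCE: Dict[str, Dict[str, str]] = {
    "division_area": {"id": "string", "class": "string"},
    "division_boundary": {
        "id": "string",
        "division_ids": "list<string>",
        "is_disputed": "bool",
    },
}


def type_mismatches(actual: Mapping[str, str], reference: Mapping[str, str]) -> List[str]:
    """Columns present in both schemas whose data types differ."""
    return [
        f"{col}: expected {reference[col]}, got {dtype}"
        for col, dtype in actual.items()
        if col in reference and reference[col] != dtype
    ]


def check_current(actual: Mapping[str, str]) -> List[str]:
    """Types only, against one theme-level reference: extra columns pass."""
    return type_mismatches(actual, THEME_REFERENCE)


def check_per_type(feature_type: str, actual: Mapping[str, str]) -> List[str]:
    """Proposed: per-type reference; any column outside it is an error."""
    reference = PER_TYPE_REFERENCE[feature_type]
    errors = [f"{col}: not allowed for {feature_type}" for col in actual if col not in reference]
    return errors + type_mismatches(actual, reference)


if __name__ == "__main__":
    # A division_area file that carries the merged theme-level columns.
    area_file = dict(THEME_REFERENCE)
    print(check_current(area_file))
    print(check_per_type("division_area", area_file))
```

For a `division_area` file carrying the merged theme-level columns, `check_current` reports nothing, while `check_per_type` flags `division_ids` and `is_disputed`. Keeping the type check as a shared helper means the stricter mode only adds the allowed-columns test on top of what already exists.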
Hello, the mentioned fields `division_ids` and `is_disputed` are declared only for type `division_boundary`:
https://github.com/OvertureMaps/schema/blob/5b33a0236e389793e2dc5a8cd55725a1feb2ec2f/schema/divisions/division_boundary.yaml#L33
Also, it looks like there is a `division_id` field, but it is not declared in the `divisions.yml` schema. However, it turns out it is also present in `division` and `division_area`, even though it is not declared there. Please fix the schema. As you can see in the edit history, there are other cases (as I noticed later, `perspectives` is missing from the area definition, for instance).
Runtime schema for Division: [schema listing not preserved]
For division area, the difference compared to division is the `class` field.
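A simple audit comparing the fields declared per type with the columns actually observed in the data would surface both kinds of mismatch mentioned here. This is a sketch with hypothetical field sets loosely modeled on the comment above; `schema_drift` and the example sets are illustrative and not part of any Overture tooling.

```python
from typing import Dict, Mapping, Set

# Hypothetical field sets, loosely modeled on the comment above.
# Illustrative only; NOT the real divisions schema or data.
EXAMPLE_DECLARED: Dict[str, Set[str]] = {
    "division": {"id", "perspectives"},
    "division_area": {"id", "class"},
}
EXAMPLE_OBSERVED: Dict[str, Set[str]] = {
    "division": {"id", "perspectives", "division_id"},
    "division_area": {"id", "class", "division_id", "perspectives"},
}


def schema_drift(
    declared: Mapping[str, Set[str]], observed: Mapping[str, Set[str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Per type: fields in the data but not declared ('undeclared') and
    fields declared but absent from the data ('missing')."""
    report: Dict[str, Dict[str, Set[str]]] = {}
    for feature_type in sorted(set(declared) | set(observed)):
        decl = declared.get(feature_type, set())
        obs = observed.get(feature_type, set())
        undeclared, missing = obs - decl, decl - obs
        if undeclared or missing:
            report[feature_type] = {"undeclared": undeclared, "missing": missing}
    return report


if __name__ == "__main__":
    for feature_type, diff in schema_drift(EXAMPLE_DECLARED, EXAMPLE_OBSERVED).items():
        print(feature_type, sorted(diff["undeclared"]), sorted(diff["missing"]))
```

In this example, the "undeclared" side lists fields such as `division_id` that appear in the data without a schema declaration, which is exactly what either a schema fix or a pipeline fix would need to resolve.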