Open nadove-ucsc opened 4 months ago
This would be complicated by the fact that we're indexing a diverse set of schema versions in a single catalog. We would have to compile all versions of all schemas during indexing, aggregate the schemas and generate the PFB schema from that aggregate. The schema aggregation would have to consider every property and every type of every property. Renamed properties would show up under both their old and their new names, and removed properties would have to be retained. As an invariant, this semi-static process would need to produce the same PFB schema as the current dynamic schema generation does for a manifest without filters.
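To make the aggregation step concrete, here is a minimal sketch, not what Azul does today. It assumes flat JSON schemas with a single primitive `type` per property, whereas the real HCA schemas are nested and use `$ref`, arrays and enums. The two schema versions, the `JSON_TO_AVRO` mapping and both function names are hypothetical and only serve as illustration.

```python
from collections import defaultdict

# Hypothetical, heavily simplified stand-ins for two versions of one entity
# schema. The property `title` was renamed to `project_title` in the newer
# version, and `size` changed its type.
PROJECT_V1 = {'properties': {'title': {'type': 'string'},
                             'size': {'type': 'integer'}}}
PROJECT_V2 = {'properties': {'project_title': {'type': 'string'},
                             'size': {'type': 'number'}}}

# Assumed mapping from JSON Schema primitive types to Avro primitive types
JSON_TO_AVRO = {'string': 'string',
                'integer': 'long',
                'number': 'double',
                'boolean': 'boolean'}


def aggregate_properties(schema_versions):
    """
    Collect every property name ever used, along with every type it ever had.
    Renamed properties appear under both names, removed ones are retained.
    """
    types_by_property = defaultdict(set)
    for schema in schema_versions:
        for name, spec in schema.get('properties', {}).items():
            types_by_property[name].add(spec['type'])
    return types_by_property


def to_avro_record(entity_type, types_by_property):
    """
    Build an Avro record schema in which each field is a nullable union of
    all types observed for that property. Sorting keeps the output stable.
    """
    fields = []
    for name in sorted(types_by_property):
        avro_types = sorted(JSON_TO_AVRO[t] for t in types_by_property[name])
        fields.append({'name': name,
                       'type': ['null', *avro_types],
                       'default': None})
    return {'type': 'record', 'name': entity_type, 'fields': fields}


aggregate = aggregate_properties([PROJECT_V1, PROJECT_V2])
avro_schema = to_avro_record('project', aggregate)
```

Every field is nullable because a document indexed under one schema version will lack the properties that only exist in other versions. The deterministic ordering matters for the invariant: the result has to be comparable against the dynamically generated schema.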
As with AnVIL, we could create the Avro schema for HCA verbatim PFB manifests using the published entity schemas, e.g. https://schema.humancellatlas.org/type/project/14.0.0/project