westurner opened 1 year ago
FWIW, re: data validation these days: pydantic_schemaorg validates with the schema.org schema, which includes QuantitativeValue and Distribution. CSVW (CSV on the Web) is a standard for describing CSV in RDF. RDF has many representations: RDF/XML, Turtle (`.ttl`), JSON-LD (`.json`, `application/ld+json`), and RDFa (RDF-in-(HTML)-Attributes). Some applications, including search engines, work with at least bibliographic linked data, e.g. for subtypes of https://schema.org/CreativeWork such as https://schema.org/ScholarlyArticle, :Dataset, and :DataCatalog.
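As a concrete illustration (a hand-written sketch, not output from any of the tools above; the URL and values are hypothetical), a minimal schema.org `Dataset` description in JSON-LD can be built as a plain dict and serialized:

```python
import json

# Minimal schema.org Dataset in JSON-LD (hand-written example values)
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example measurements",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data.csv",  # hypothetical URL
    },
}
print(json.dumps(dataset, indent=2))
```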
Other existing standards for data schemas and/or validation: SDMX (pandaSDMX), W3C Data Cubes (pandas-datacube), JSON Schema (pydantic, react-jsonschema-form), and W3C SHACL (Schema.org).
https://github.com/lexiq-legal/pydantic_schemaorg generates templated pydantic `.py` source files containing validators for every `rdfs:Class` and `rdfs:Property` defined in a release of the https://schema.org/ meta-vocabulary.
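The generated classes are essentially pydantic models; a hand-written sketch in the same spirit (not the actual pydantic_schemaorg output, whose field sets are much larger) for schema.org `QuantitativeValue` might look like:

```python
from typing import Optional
from pydantic import BaseModel

class QuantitativeValue(BaseModel):
    # Simplified subset of schema.org/QuantitativeValue properties
    value: Optional[float] = None
    unitCode: Optional[str] = None  # e.g. a UN/CEFACT code like "KGM"
    unitText: Optional[str] = None

qv = QuantitativeValue(value="3.5", unitText="kg")  # "3.5" is coerced to float
print(qv.value, qv.unitText)
```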
For example: W3C SHACL.
- [ ] https://github.com/pandas-dev/pandas/issues/3402
`DataFrame.attrs` is a dict that anything can modify upon read, transformation, or write; and it may not be persisted by file formats that do not support an auxiliary metadata file.

```python
class DataFrameWithNonAttrsMetadata(pd.DataFrame):
    _metadata = ["additional_attrs", "prov"]
```
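A sketch of how `_metadata` attributes can survive operations that `attrs` might not, using pandas' documented subclassing hooks (`prov` and its value here are illustrative):

```python
import pandas as pd

class DataFrameWithNonAttrsMetadata(pd.DataFrame):
    # Attribute names listed in _metadata are propagated by __finalize__
    _metadata = ["additional_attrs", "prov"]

    @property
    def _constructor(self):
        # Make operations return this subclass so the _metadata
        # attributes are carried over to the result
        return DataFrameWithNonAttrsMetadata

df = DataFrameWithNonAttrsMetadata({"a": [1, 2, 3]})
df.prov = {"prov:wasAttributedTo": "ex:someAgent"}  # hypothetical provenance
print(df.copy().prov)
```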
W3C PROV is a Linked Data specification for describing data provenance: who, what, when, how, etc.
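For reference, a minimal provenance record in the PROV-JSON serialization (the `ex:` names are hypothetical) links an entity to the activity that generated it:

```python
import json

prov_doc = {
    "prefix": {"ex": "https://example.org/"},
    "entity": {"ex:cleaned_dataset": {}},
    "activity": {"ex:cleaning_run": {}},
    # who/what/how: the entity was generated by the activity
    "wasGeneratedBy": {
        "_:gen1": {"prov:entity": "ex:cleaned_dataset",
                   "prov:activity": "ex:cleaning_run"}
    },
}
print(json.dumps(prov_doc, indent=2))
```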
What does that mean for pandas, dataclasses, pyarrow, and optionally pydantic?
What should the API be for working with pandas, pyarrow, and dataclasses and/or pydantic?
Pandas 2.0 supports pyarrow for so many things now (e.g. `pd.read_*(..., dtype_backend="pyarrow")`), and pydantic does data validation with a drop-in `dataclasses.dataclass` replacement at `pydantic.dataclasses.dataclass`.
https://github.com/pydantic/pydantic/blob/main/docs/usage/dataclasses.md
`@pydantic.dataclasses.dataclass`
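A minimal sketch of the drop-in replacement in action (the field names are illustrative):

```python
from pydantic import ValidationError
from pydantic.dataclasses import dataclass

@dataclass
class Measurement:
    value: float
    unit: str

m = Measurement(value="3.5", unit="kg")  # the string is coerced to float
print(m.value)

try:
    Measurement(value="not a number", unit="kg")
except ValidationError:
    print("validation failed")
```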