astropenguin / pandas-dataclasses

:zap: pandas data creation by data classes
https://pypi.org/project/pandas-dataclasses/
MIT License

ENH: pyarrow and optionally pydantic #166

Open westurner opened 1 year ago

westurner commented 1 year ago

What should be the API for working with pandas, pyarrow, and dataclasses and/or pydantic?
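To make the question concrete, here is a minimal stdlib-only sketch of one possible starting point, not an existing pandas-dataclasses API. It pivots dataclass instances into a dict of column lists and reads the column types from the annotations. The names `Measurement`, `to_columns`, and `column_types` are hypothetical. The assumption is that a column dict like this is a shape both `pandas.DataFrame(...)` and `pyarrow.table(...)` accept, so the same dataclass could drive either backend.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Measurement:
    """Hypothetical record type used only for illustration."""

    station: str
    temperature: float
    count: int


def to_columns(records):
    """Pivot a list of dataclass instances into a dict of column lists."""
    if not records:
        return {}
    names = [f.name for f in fields(records[0])]
    return {name: [getattr(r, name) for r in records] for name in names}


def column_types(cls):
    """Map each dataclass field name to its annotated Python type."""
    return get_type_hints(cls)


records = [Measurement("A", 1.5, 3), Measurement("B", 2.0, 4)]
columns = to_columns(records)
schema = column_types(Measurement)
```

Under that assumption, `columns` could be handed to `pandas.DataFrame(columns)` or `pyarrow.table(columns)`, and `schema` could be translated into backend dtypes. Whether that translation belongs in this package is the open question.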

westurner commented 1 year ago

FWIW, re: data validation these days: pydantic_schemaorg validates against schema.org schemas, and there's QuantitativeValue[Distribution]. CSVW (CSV on the Web) is a standard for CSV in RDF. RDF has many representations: RDF/XML, Turtle (.ttl), JSON-LD (.json, application/ld+json), and RDFa (RDF-in-(HTML)-Attributes). Some applications, including search engines, work with at least bibliographic linked data, such as subtypes of https://schema.org/CreativeWork like https://schema.org/ScholarlyArticle, :Dataset, and :DataCatalog.

Other existing standards for data schema and/or validation: SDMX (pandaSDMX), W3C Data Cubes (pandas-datacube), JSON Schema (pydantic, react-jsonschema-form), and W3C SHACL (Schema.org).
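As a rough illustration of the JSON-LD representation mentioned above, this sketch serializes a dataclass as a schema.org `Dataset` description using only the standard library. The `DatasetInfo` class, the `to_jsonld` helper, and the example values are hypothetical. It performs no validation, which is the part a tool like pydantic or SHACL would cover.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetInfo:
    """Hypothetical metadata record; fields mirror schema.org property names."""

    name: str
    description: str
    license: str


def to_jsonld(info):
    """Wrap the dataclass fields in a schema.org Dataset JSON-LD document."""
    doc = {"@context": "https://schema.org", "@type": "Dataset"}
    doc.update(asdict(info))
    return doc


info = DatasetInfo(
    name="Example measurements",
    description="Illustrative dataset metadata, not real data.",
    license="https://opensource.org/licenses/MIT",
)
jsonld_text = json.dumps(to_jsonld(info), indent=2)
```

The resulting text is the kind of document served as `application/ld+json`. Attaching it to a DataFrame or Arrow table as metadata would be a separate design decision.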

What does that mean for pandas and dataclasses and pyarrow and optionally pydantic?