frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License

Enhanced support for metadata in SAV Files #1553

Open ipimpat opened 1 year ago

ipimpat commented 1 year ago

In our research group, we store a significant amount of data in SPSS's SAV format. The format is attractive because of its comprehensive metadata support, which includes variable labels, value labels, missing value definitions, and multiple response sets, among others. However, the format is difficult to work with when we want to use other programming languages or tools for data analysis and manipulation.

To improve interoperability and efficiency within our group, we're exploring open-source, platform- and language-agnostic formats that, like SAV, can store complex metadata.

We have been testing Frictionless, but we find the standard somewhat lacking in its support for the complex metadata available in SPSS's SAV format (as well as in SAS's sas7bdat/sas7bcat formats).

Currently, we manually extract all this information and store it in CSV files, but this seems like a task that a framework like Frictionless should handle seamlessly.

We would greatly appreciate it if Frictionless could support reading all of the metadata available in the SAV format. In particular, it would be helpful if it could apply the formatting options specified in the metadata to the data.

This includes, but is not limited to:

This link provides a comprehensive guide to the different types of metadata possible to specify in the "Variable view" in SPSS.
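Applying the metadata when reading could look roughly like the following: value labels take precedence, and otherwise an SPSS print format such as `F8.2` controls the number of decimals shown. This is a hypothetical sketch (`format_value` is not part of Frictionless); it only handles the numeric `F` format and falls back to `str()` for everything else, such as string, date, or currency formats.

```python
from __future__ import annotations

import re
from typing import Any, Optional

# Hypothetical parser for SPSS-style print formats such as "F8.2" or "A20":
# a format letter code, a width, and optional decimals.
_FORMAT_RE = re.compile(r"^([A-Z]+)(\d+)(?:\.(\d+))?$")


def format_value(
    value: Any, spss_format: str, labels: Optional[dict[Any, str]] = None
) -> str:
    """Render a value using its value label, else its SPSS print format."""
    # Value labels win: e.g. 1 -> "Low" for a coded answer.
    if labels and value in labels:
        return labels[value]
    match = _FORMAT_RE.match(spss_format)
    if match is None:
        return str(value)
    kind, _width, decimals = match.groups()
    # Only the numeric "F" format is handled; the width is ignored here.
    if kind == "F" and isinstance(value, (int, float)):
        return f"{value:.{int(decimals or 0)}f}"
    return str(value)


if __name__ == "__main__":
    print(format_value(2, "F1.0", {1: "Low", 2: "High"}))
    print(format_value(3.14159, "F8.2"))
```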

We believe such enhancements would benefit not only our research group but also other users who work with similar data formats. We look forward to seeing these improvements in future versions of Frictionless.

roll commented 1 year ago

Hi @ipimpat,

Thanks a lot for such a well-described feature request!