Eugene-Mark / bigdata-file-viewer

A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
GNU General Public License v2.0
282 stars, 54 forks

Request custom schema-metadata #27

Closed mbode-asys closed 1 year ago

mbode-asys commented 2 years ago

Hi, we are using Parquet's ability to enrich files with custom schema metadata. It would be great if you could show those additional metadata keys as extra tabs below "Basic Information" and "Schema Information".

There is other useful information, such as the creator, that might also be of interest.

An example of the metadata we add:

{b'custom_metadata': b'{"main_cols": ["BKPF_SAPF15STATUS", "BKPF_AEDAT"], "translations": "[{\"columnname\":\"BKPF\\/SAPF15\\/STATUS\",\"lang_isocode\":\"DE\",\"lang_value\":\"Belegstatus\"},{\"columnname\":\"BKPF\\/SAPF15\\/STATUS\",\"lang_isocode\":\"EN\",\"lang_value\":\"Document Status\"},{\"column_name\":\"BKPF_ACC_PRINCIPLE\",\"lang_isocode\":\"DE\",\"lang_value\":\"Rechnungslegungsvorschrift\"},{\"column_name\":\"BKPF_ACC_PRINCIPLE\",\"lang_isocode\":\"EN\",\"lang_value\":\"Accounting Principle\"},{\"column_name\":\"BKPF_AEDAT\",\"lang_isocode\":\"DE\",\"lang_value\":\"Datum der letzten Belegänderung per Transaktion\"},{\"column_name\":\"BKPF_AEDAT\",\"lang_isocode\":\"EN\",\"lang_value\":\"Date of the Last Document Change by Transaction\"}]"}', b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "BKPF_SAPF15STATUS", "field_name": "BKPF_SAPF15STATUS", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "BKPF_ADIS", "field_name": "BKPF_ADIS", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "BKPF_AEDAT", "field_name": "BKPF_AEDAT", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "BKPF_ACC_PRINCIPLE", "field_name": "BKPF_ACC_PRINCIPLE", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}], "creator": {"library": "pyarrow", "version": "4.0.0"}, "pandas_version": "1.2.5"}'}

I hope this is possible!

Thanks, Meikel
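
For readers following along: the dump in the comment above is a Python mapping of bytes keys to bytes values, which is the shape pyarrow exposes as `schema.metadata`. Below is a minimal sketch of how such a mapping could be decoded into one displayable entry per key. The function name `decode_schema_metadata` and the shortened sample are hypothetical and are not part of bigdata-file-viewer. Note that in the original dump the `translations` value is itself a JSON string nested inside the JSON value, so it needs a second decode.

```python
import json


def decode_schema_metadata(raw):
    """Decode a {bytes: bytes} metadata mapping into Python objects.

    Values that parse as JSON are decoded; anything else is kept as text.
    (Hypothetical helper for illustration only.)
    """
    decoded = {}
    for key, value in raw.items():
        name = key.decode("utf-8")
        text = value.decode("utf-8")
        try:
            decoded[name] = json.loads(text)
        except json.JSONDecodeError:
            decoded[name] = text
    return decoded


# Shortened, hypothetical sample mirroring the structure of the dump above.
sample = {
    b"custom_metadata": json.dumps({
        "main_cols": ["BKPF_SAPF15STATUS", "BKPF_AEDAT"],
        # The translations list is stored as a JSON string inside the JSON value.
        "translations": json.dumps([
            {"column_name": "BKPF_AEDAT", "lang_isocode": "EN",
             "lang_value": "Date of the Last Document Change by Transaction"},
        ]),
    }).encode("utf-8"),
    b"pandas": b'{"creator": {"library": "pyarrow", "version": "4.0.0"}, '
               b'"pandas_version": "1.2.5"}',
}

metadata = decode_schema_metadata(sample)
custom = metadata["custom_metadata"]
# Second decode for the nested JSON string.
translations = json.loads(custom["translations"])

# Each top-level key could back one tab in a viewer.
for tab_name in metadata:
    print(tab_name)
```

Each top-level key (`custom_metadata`, `pandas`) would then map naturally to one tab, with the creator information available under `metadata["pandas"]["creator"]`.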

Eugene-Mark commented 2 years ago

@mbode-asys Hi Meikel, thanks for your interest in this project. Could you provide the Parquet file you mentioned (the one with the custom metadata)? I need it to evaluate whether this feature is feasible to implement.

Eugene-Mark commented 1 year ago

Closing since the requirement is not clear.