finopsfoundation / focus_converters

Parent repository to hold all common documentation and code samples for all FOCUS Converter projects
MIT License
83 stars 44 forks

Specify name of parquet output file #338

Closed oll-davidschneider closed 7 months ago

oll-davidschneider commented 7 months ago

Problem: I would like to control the name of the output file so that results load more consistently into downstream systems. Currently, the tool names the output with a pseudorandom GUID, e.g. 7ceb804f84d64c81a46b388b51400572-0.parquet

Proposed solution: Control the output name through the keyword argument basename_template, which the pyarrow.parquet.write_to_dataset method already accepts; that method is called by the Data Exporter __writer_process__. Add a new optional string command-line argument named --basename_template to the focus_converter convert command and pass its value through to this keyword argument.
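A rough sketch of what I have in mind, not the actual focus_converter code. The parser, validate_template, and write_parquet are hypothetical names used only for illustration, and argparse stands in for whatever CLI framework the tool uses. The only real API relied on is the basename_template keyword of pyarrow.parquet.write_to_dataset, where pyarrow documents "{i}" as the token replaced by an incrementing integer.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the convert command's argument handling.
    parser = argparse.ArgumentParser(prog="convert-sketch")
    parser.add_argument(
        "--basename_template",
        type=str,
        default=None,
        help="Template for output file names, e.g. focus-{i}.parquet",
    )
    return parser


def validate_template(template):
    # Assumption: reject templates without the "{i}" token up front,
    # since pyarrow substitutes an incrementing integer for it.
    if template is not None and "{i}" not in template:
        raise ValueError("basename_template must contain the token '{i}'")
    return template


def write_parquet(table, root_path, basename_template=None):
    # pyarrow is imported here so the argument handling above can be
    # exercised without it installed.
    import pyarrow.parquet as pq

    kwargs = {}
    if basename_template is not None:
        # Only forward the keyword when the user supplied a value, so the
        # current GUID-based naming stays the default.
        kwargs["basename_template"] = basename_template
    pq.write_to_dataset(table, root_path=root_path, **kwargs)
```

With something like this, passing --basename_template "focus-{i}.parquet" should yield file names such as focus-0.parquet, while omitting the option keeps today's behavior.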

Alternatives: The value could instead be supplied through some kind of configuration or environment variable and parsed when available. This seems unintuitive given the current usage style of the tool.