VizierDB / vizier-scala

The Vizier kernel-free notebook programming environment

Export formats with single files should output a single file on the local FS #169

Closed okennedy closed 1 year ago

okennedy commented 2 years ago

What pain point is this feature intended to address? Please describe.

Spark's default export behavior targets HDFS: it creates an entire output folder containing one output file per partition, plus a "_SUCCESS" file to indicate that output is complete. This is annoying when all you want is a single CSV or JSON file.

Describe the solution you'd like

Although we already collapse data down to a single partition, it would be helpful if the system wrote only a single file whenever the format is a single-file format (CSV, JSON, XML, etc.) and the user is outputting to the local filesystem or to a file artifact. For example, we could output to a temporary directory and then copy the lone CSV, JSON, etc. file out to the target path.
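
A minimal sketch of the temporary-directory approach, to make the proposal concrete. It is not Vizier's actual implementation: `SingleFileExport`, `exportSingleFile`, and `collapseToSingleFile` are hypothetical names, and the code assumes Scala 2.13 (on 2.12, `scala.collection.JavaConverters` would replace `scala.jdk.CollectionConverters`). It also assumes the writer leaves exactly one file whose name starts with `part-` in the output directory, which is what Spark produces for a single-partition write.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._

// Hypothetical helper object; none of these names come from vizier-scala.
object SingleFileExport {

  /** Copy the lone `part-*` file in a Spark-style output directory to `target`.
    * Marker and checksum files (`_SUCCESS`, `.part-*.crc`) are ignored. */
  def collapseToSingleFile(sparkOutputDir: Path, target: Path): Path = {
    val stream = Files.list(sparkOutputDir)
    val parts =
      try stream.iterator().asScala
        .filter(_.getFileName.toString.startsWith("part-"))
        .toList
      finally stream.close()

    parts match {
      case single :: Nil =>
        Files.copy(single, target, StandardCopyOption.REPLACE_EXISTING)
      case other =>
        throw new IllegalStateException(
          s"Expected exactly one part file in $sparkOutputDir, found ${other.size}")
    }
  }

  /** Run `writeTo` against a fresh directory path inside a temporary folder,
    * copy the single resulting file to `target`, then clean up. */
  def exportSingleFile(target: Path)(writeTo: Path => Unit): Path = {
    val tempDir = Files.createTempDirectory("export-")
    // Spark refuses to write into an existing directory by default,
    // so hand the writer a path that does not exist yet.
    val outDir = tempDir.resolve("out")
    try {
      writeTo(outDir)
      collapseToSingleFile(outDir, target)
    } finally {
      deleteRecursively(tempDir)
    }
  }

  // Delete children before parents: Files.walk lists parents first, so reverse it.
  private def deleteRecursively(root: Path): Unit =
    if (Files.exists(root)) {
      val walk = Files.walk(root)
      try walk.iterator().asScala.toList.reverse.foreach(Files.deleteIfExists(_))
      finally walk.close()
    }
}
```

With Spark on the classpath, a call might look like `SingleFileExport.exportSingleFile(target) { dir => df.coalesce(1).write.csv(dir.toString) }`, where `df` is an existing DataFrame. The writer is passed in as a function so that the copy-and-clean-up logic stays independent of the export format.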