datahq / dataflows

DataFlows is a simple, intuitive lightweight framework for building data processing flows in python.
https://dataflows.org
MIT License
194 stars 39 forks source link

Allow custom temporal formats for dump_to_path #100

Closed roll closed 4 years ago

roll commented 5 years ago
roll commented 5 years ago

@akariv @cschloer It's a very first POC for what we have discussed on the call. Both an implementation and API are experimental. Feedback are very welcome because I'm not sure how to make it more eloquent based on the fact the resource.schema.format is used for validation on the dump_to_path stage.

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 382


Changes Missing Coverage Covered Lines Changed/Added Lines %
dataflows/processors/dumpers/file_formats.py 19 20 95.0%
<!-- Total: 29 30 96.67% -->
Totals Coverage Status
Change from base Build 359: 0.2%
Covered Lines: 1629
Relevant Lines: 1930

💛 - Coveralls
coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 339


Changes Missing Coverage Covered Lines Changed/Added Lines %
dataflows/processors/dumpers/file_formats.py 13 14 92.86%
<!-- Total: 21 22 95.45% -->
Totals Coverage Status
Change from base Build 332: 0.1%
Covered Lines: 1579
Relevant Lines: 1872

💛 - Coveralls
akariv commented 5 years ago

@roll, I think that a possibly better approach here would be to allow the user to provide their own dialect & serializer, allowing them to dictate exactly how data is saved to the output file.

wdyt?

roll commented 4 years ago

@akariv @cschloer Here is my second attempt. Please take a look at the PR. Full discussion - https://github.com/BCODMO/frictionless-usecases/issues/19


My first try felt more like a hack but the current version I think is better:


The option for dump_to_path is:

  • temporal-format-property - Specifies a property to be used for temporal values serialization. For example, if some field has a property outputFormat: %d/%m/%y setting temporal-format-property to outputFormat will lead to using this format for this field serialization.

And, for example, withing a DPP pipelines (tested):

temporal:
  title: temporal
  description: "temporal format"
  pipeline:

  - run: load
    parameters:
      from: 'temporal.csv'
      override_fields:
        date:
          outputFormat: '%m/%d/%Y'

  - run: dump_to_path
    parameters:
      out-path: 'output'
      pretty-descriptor: true
      temporal_format_property: outputFormat
cschloer commented 4 years ago

This solution seems like it would work pretty well for us!

roll commented 4 years ago

@akariv Could you please re-review?