frictionlessdata / datapackage-pipelines

Framework for processing data packages in pipelines of modular components.
https://frictionlessdata.io/
MIT License

Default datetime format is incorrect #132

Closed cschloer closed 6 years ago

cschloer commented 6 years ago

Hi, I'm using Python 3.6.5 on Ubuntu 16.04.

The documentation (https://frictionlessdata.io/specs/table-schema/#date) says that the default datetime format is YYYY-MM-DDThh:mm:ssZ. However, the default datetime format specified in extended_json.py (https://github.com/frictionlessdata/datapackage-pipelines/blob/master/datapackage_pipelines/utilities/extended_json.py#L11) is '%Y-%m-%d %H:%M:%S', which is missing the T and the Z. As a result, when I use the dump.to_path processor, datetimes get dumped in the '%Y-%m-%d %H:%M:%S' format. Is this intentional? I'm happy to make a quick PR if it isn't.
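To make the difference concrete, here is a minimal sketch that formats one timestamp with both format strings quoted above, using only the standard library. The constant names and the sample timestamp are illustrative assumptions, not taken from the repository.

```python
from datetime import datetime

# Format string reported in this issue as the current default.
CURRENT_DEFAULT = '%Y-%m-%d %H:%M:%S'
# strftime rendering of the spec's YYYY-MM-DDThh:mm:ssZ pattern.
SPEC_DEFAULT = '%Y-%m-%dT%H:%M:%SZ'

# Arbitrary sample value chosen for illustration.
moment = datetime(2018, 7, 31, 21, 56, 0)

current_output = moment.strftime(CURRENT_DEFAULT)
spec_output = moment.strftime(SPEC_DEFAULT)

print(current_output)  # 2018-07-31 21:56:00
print(spec_output)     # 2018-07-31T21:56:00Z
```

Note that the trailing Z in the second format string is a literal character: strftime does not convert the value to UTC, so the caller has to make sure the datetime already represents UTC.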

Relatedly, I'm having difficulty overriding that default value. In my pipeline I have a custom processor that updates the datapackage to look like

...
{'format': '%Y-%m-%dT%H:%M:%SZ', 'name': 'TestDateConverted', 'type': 'datetime'}
...

When I run the dump.to_path processor the outputted datapackage.json gets changed to

...
{'format': '%Y-%m-%d %H:%M:%S', 'name': 'TestDateConverted', 'type': 'datetime'}
...

Is this a bug in the dump.to_path processor? It seems like it's updating the datapackage.json file on its own (using default values).

Thanks!

akariv commented 6 years ago

Hey @cschloer

So, you're right that dump.to_path normalises formats when dumping, but that's on purpose: we want dumped CSVs to use the 'best-practice' format, the one that's easiest for Python to parse and understand later. You're also right that I chose a bad format for datetime, and '%Y-%m-%dT%H:%M:%SZ' is probably more correct. Would you like to send a PR to fix that? The file that needs to be changed is https://github.com/frictionlessdata/datapackage-pipelines/blob/master/datapackage_pipelines/utilities/extended_json.py
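The normalisation described here can be pictured with a small sketch. This is a hypothetical illustration only, not the actual dump.to_path code: the `PREFERRED_FORMATS` mapping and the `normalise_field` helper are invented names, and the mapping holds only the datetime format reported in this issue.

```python
# Hypothetical illustration; not the real dump.to_path implementation.
# Maps a field type to the format that dumped output is normalised to.
PREFERRED_FORMATS = {
    'datetime': '%Y-%m-%d %H:%M:%S',  # the default reported in this issue
}


def normalise_field(field, preferred=PREFERRED_FORMATS):
    """Return a copy of a field descriptor with its format normalised."""
    normalised = dict(field)  # copy so the caller's descriptor is untouched
    target = preferred.get(field.get('type'))
    if target is not None:
        normalised['format'] = target
    return normalised


custom = {'format': '%Y-%m-%dT%H:%M:%SZ',
          'name': 'TestDateConverted',
          'type': 'datetime'}
dumped = normalise_field(custom)
print(dumped['format'])  # %Y-%m-%d %H:%M:%S
```

Under this reading, a format set by an upstream processor is replaced at dump time, which matches the change to datapackage.json reported above. Changing the single preferred value would change what every dumped datetime field uses.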

cschloer commented 6 years ago

Hey @akariv

Just made a branch locally but was unable to push it up to origin to make a PR. Do I need to be added to the contributors list? Sorry, I'm unfamiliar with the process :)

akariv commented 6 years ago

Usually what you would do is make the change on a fork of the repo (under your own GitHub account) and make the PR from there :smile:

cschloer commented 6 years ago

@akariv Got it, thanks! :)

https://github.com/frictionlessdata/datapackage-pipelines/pull/138