datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Why not combine the data resources and the datapackage.json into one JSON file? #92

Closed · SPTKL closed this issue 5 years ago

SPTKL commented 5 years ago

I've been using dataflows to process data and dump it to S3 using a custom dumper I wrote based on the datapackage-pipelines-aws package. Everything works pretty well; however, when it comes to version control I've run into issues. Because the data file (usually a CSV) and the datapackage.json are dumped separately, it's difficult to compare existing versions (using md5 checksums): I might end up creating a new version of the datapackage.json but not of the CSV. With the current structure it's hard to tell whether, when we create a new datapackage.json, we should cache a new CSV too. I was wondering if it would be beneficial to dump the data resources together with the datapackage.json in one big JSON file?
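
For context, a minimal sketch of the current two-file layout (using the stock dump_to_path processor rather than my custom S3 dumper; the data and paths are illustrative):

```python
from dataflows import Flow, dump_to_path

# A tiny flow: dump_to_path writes the descriptor and the data as separate
# files (out/datapackage.json plus a CSV per resource), so their checksums
# can change independently between versions.
data = [{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'b'}]
Flow(data, dump_to_path('out')).process()
```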

akariv commented 5 years ago

Actually, the datapackage standard allows for inline data, so that's definitely possible (and compliant with the spec). In many cases, having a single JSON file is not ideal, as it would be difficult to stream and would usually require loading it entirely into memory, which wouldn't be good for very large datasets. However, for small and medium datasets it could work. Check out the jsondumper class - you can modify it to achieve what you want. If you want to make a PR (e.g. enable this using an inline=True parameter), that would be awesome.
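
A rough sketch of what such a single-file dump could look like, with the rows carried inline under the resource's data property as the spec allows (the structure below is illustrative, not the output of the existing JSON dumper):

```python
import json

# Hypothetical single-file datapackage: the resource holds its rows inline
# under "data" instead of pointing at a separate CSV via "path".
descriptor = {
    'name': 'example',
    'resources': [{
        'name': 'res_1',
        'profile': 'tabular-data-resource',
        'schema': {'fields': [
            {'name': 'id', 'type': 'integer'},
            {'name': 'value', 'type': 'string'},
        ]},
        'data': [
            {'id': 1, 'value': 'a'},
            {'id': 2, 'value': 'b'},
        ],
    }],
}

with open('datapackage.json', 'w') as f:
    json.dump(descriptor, f, indent=2)
```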
