Open robredpath opened 5 years ago
Yes!
There are use cases for rolling up multi-item arrays (i.e. preserve array indices in column headings) but these have not been confirmed by actual users.
Sent from mobile
On May 30, 2019, at 3:52 PM, Amy Guy notifications@github.com wrote:
When flattening, multiple sheets/csv files might be generated if there are nested objects in the JSON. The user doesn't always want extra sheets, but in some cases may want the objects all in the main sheet with their full JSON path as column headings, like what rollup does if you include it in a schema.
"Roll up" columns from subsheets into the main sheet if they are specified in a rollUp attribute in the schema.
(rollup docs)
Use cases I know of so far are PPP (one contract) and Dhangadhi (one award, supplier and contract per contracting process) so they can all go in the same sheet.
Following discussions with @kindly and @duncandewhurst, implementation outline:
User input is a file with a list of paths or a single path on the commandline. Extends existing rollup argument to pass files or paths directly, so the schema doesn't need to be modified. Only works if there is one object in the array. ie. we will not have duplicate data across multiple rows in the output (consistent with current rollup behaviour). Column headings will not have array indices in them. @jpmckinney does that sound like it meets current needs?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Implementation so far works as follows:
--rollup
can be passed with no argument, with one or more JSON path strings as arguments, or with a line-separated file of JSON strings attached.rollUp
attribute, flattening succeeds but no rolling up takes place. (TODO: add a warning)rollUp
, the rollup fields from the schema take precedence; the direct or file inputs are ignored.--use-titles
is also passed but there's no schema, there's a warning, and flattening succeeds with the JSON paths as column headings.The default behaviour of flattening is to include array indices in the column headings (meaning output from flattening can be unflattened again out of the box). Since rollup only works when there's one item, the column headings are always parent/0/child
, and the initial spec for rollup said column headings should not have array indices in them. But stripping the /0/
out would be inconsistent, and I think result in a different unflattened output. So I think now we should keep them in.
Sounds good! And agreed to keep the /0/
in. For the third bullet ("If there's an argument and a schema with rollUp, the rollup fields from the schema take precedence; the direct or file inputs are ignored."), I think a warning should be issued if inputs are ignored.
Is it okay for rollup
input to be restricted to properties on the root object? Eg. if the rolled up object has 5 of its own fields, they will all be included in the main sheet, you can't specify only specific fields of the rolled up object to include by passing them to rollup
. That is (in the testing cafe use case), you can pass dishes
to get all of the properties of the dish, but not dishes/title
to get only the name of the dish included.
I think we don't need to add the possibility to pass more specific JSON paths to rollup
because it can be combined with preserve-fields
or filter-*
to remove unwanted fields at a lower level. You'd pass --rollup=dishes
and preserve-fields={a file containing dishes/title}
to roll up only the name of the dish.
That makes sense. Let's consider a different scenario:
Let's say there's a dataset where every contract has at most one related process. So, we want to roll up the related process into the contracts
sheet. Does the current roll-up feature allow that?
This is an example where the property to roll-up isn't on the root (release) object.
The original rollup feature has never worked on non-root objects, it is only possible to rollup into the main sheet, so unfortunately the extended rollup feature also can't do this.
It will be quite challenging to implement, so is it okay if we log this in an issue for future work? (We can pick it up in the new process if it's a priority.)
Sounds good! Please create the new issue.
@rhiaro the associated Trello card is archived - can this be closed now?
When flattening, multiple sheets/csv files might be generated if there are nested objects in the JSON. The user doesn't always want extra sheets, but in some cases may want the objects all in the main sheet with their full JSON path as column headings, like what
rollup
does if you include it in a schema.(rollup docs)
Use cases I know of so far are PPP (one contract) and Dhangadhi (one award, supplier and contract per contracting process) so they can all go in the same sheet.
Following discussions with @kindly and @duncandewhurst, implementation outline:
rollup
argument to pass files or paths directly, so the schema doesn't need to be modified.rollup
behaviour).@jpmckinney does that sound like it meets current needs?