This is exactly the reason why in DataFrames.jl we have introduced the `cols` kwarg in `push!`. But as @laborg commented on Slack, using it is a bit cumbersome (you have to read the JSON row by row and `push!` each row). Using `append!` or `vcat` from DataFrames.jl does not help.
It would be good to have a better solution here. I think having the `cols=:union` approach as the default is what users typically expect (the other values allowed for `cols` in DataFrames.jl are rarely needed).
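For reference, a minimal sketch of the row-by-row workaround described above. The `rows` vector is made-up illustration data standing in for already-parsed JSON objects (it is not the data from this issue), and the sketch assumes a DataFrames.jl version whose `push!` accepts `cols=:union`:

```julia
using DataFrames

# Hypothetical stand-in for parsed JSON objects: each row has different keys.
rows = [(a = 1, b = "x"), (a = 2, c = 3.0), (b = "y", c = 4.5)]

df = DataFrame()
for row in rows
    # cols=:union adds columns missing from df and fills absent
    # entries with `missing`, promoting column eltypes as needed.
    push!(df, row; cols = :union)
end
```

The loop is what makes this cumbersome: every row has to be pushed individually instead of handing the whole JSON document to a single constructor.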
Ok, I've been thinking about this on and off for a while now, along with the best way to approach a solution (in Tables.jl, maybe TableOperations.jl, or in this package). Take a look at what I came up with here: https://github.com/JuliaData/JSONTables.jl/pull/18. In short, we implement the `cols=:union` behavior from DataFrames by doing an initial pass over the json data to accurately determine all the column names/types we are to expect when treating the json as a "table".
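To illustrate the general idea of such an initial pass (this is not the code from the linked PR), here is a rough sketch in plain Julia. The helpers `union_schema` and `materialize` are hypothetical names, and the rows are NamedTuples standing in for parsed JSON objects:

```julia
# Hypothetical helper: first pass collects every column name and its type.
function union_schema(rows)
    names = Symbol[]                # column names in first-seen order
    types = Dict{Symbol,Type}()     # running element type per column
    counts = Dict{Symbol,Int}()     # number of rows that contain each column
    for row in rows
        for (name, value) in pairs(row)
            if haskey(types, name)
                types[name] = Union{types[name], typeof(value)}
                counts[name] += 1
            else
                push!(names, name)
                types[name] = typeof(value)
                counts[name] = 1
            end
        end
    end
    # A column absent from at least one row must also allow `missing`.
    for name in names
        if counts[name] < length(rows)
            types[name] = Union{types[name], Missing}
        end
    end
    return names, types
end

# Hypothetical helper: second pass fills the columns using the full schema.
function materialize(rows)
    names, types = union_schema(rows)
    cols = map(names) do name
        T = types[name]
        T[get(row, name, missing) for row in rows]
    end
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

rows = [(a = 1, b = "x"), (a = 2, c = 3.0), (b = "y", c = 4.5)]
table = materialize(rows)
```

Because the schema is fixed before any column is filled, the set of columns no longer depends on which row happens to come first; only the column order follows first appearance.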
I think it would be good to have a better story for heterogeneous data. Both of the following results (which are generated from the same data but where entries are ordered differently) are surprising and can cause problems.
What I would have expected `jsontable` to produce:

If this is not possible or desired, at least the documentation should include a clear warning about what to expect.
Thx!