Open haberdashPI opened 5 years ago
Hi David! We are definitely interested in smooth support for JSON data. Currently, it is difficult to convert a Dict to a DataKnot because a DataKnot needs to know the shape of the input data. For a NamedTuple-based structure, it's easy to get the full list of attributes and their types.
julia> typeof((joe = (bob = [1,2,3], bill=[3,4,5]),))
NamedTuple{(:joe,),Tuple{NamedTuple{(:bob, :bill),Tuple{Array{Int64,1},Array{Int64,1}}}}}
But that's not the case for a Dict object.
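To make the difference concrete, here is a minimal sketch in plain Julia (no DataKnots involved). The variable names are illustrative only, and the Dict is built by hand to mimic the `Dict{String,Any}` shape that a JSON parser typically returns.

```julia
# The same data, once as nested NamedTuples and once as nested Dicts.
nt = (joe = (bob = [1, 2, 3], bill = [3, 4, 5]),)
d = Dict{String,Any}(
    "joe" => Dict{String,Any}("bob" => Any[1, 2, 3], "bill" => Any[3, 4, 5]),
)

# For a NamedTuple, attribute names are recoverable from the type alone.
outer_names = fieldnames(typeof(nt))                    # (:joe,)
inner_names = fieldnames(fieldtype(typeof(nt), :joe))   # (:bob, :bill)

# For a Dict, the type records only the key and value types;
# the actual keys and nested structure are known only at run time.
dict_type = typeof(d)                                   # Dict{String, Any}
```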
I'm not sure that generic JSON support is low-hanging fruit. Perhaps a limited approach tailored to a specific data format would work well. That said, we'd love to have more contributors. Please let me know if you have any questions on the codebase. We usually linger on https://gitter.im/rbt-lang/rbt-proto.
Perhaps I am missing something, but a Dict can be readily converted to a NamedTuple.
For example:
as_namedtuple(xs) = xs
function as_namedtuple(xs::Dict{<:AbstractString})
    kt = Tuple((Symbol(x) for x in keys(xs)))
    vt = Tuple(values(xs))
    NamedTuple{kt}(as_namedtuple.(vt))
end
There are obviously cases where this will fail, for particular values of vt, but it shouldn't be hard to handle all of the input you would expect to get from JSON.parse, for example.
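As a rough sketch of what covering more parser output could look like, the variant below also recurses into arrays, so that objects nested inside JSON arrays are converted too. This is only an illustration under the assumption that the parser returns `Dict{String,Any}` and `Vector{Any}` values; the sample `parsed` value is built by hand rather than read from a file.

```julia
# Leaves (numbers, strings, booleans, nothing) pass through unchanged.
as_namedtuple(x) = x

# Arrays: convert each element, so Dicts nested in arrays are handled too.
as_namedtuple(xs::AbstractVector) = map(as_namedtuple, xs)

# Objects: string keys become Symbol field names, values are converted recursively.
function as_namedtuple(xs::AbstractDict{<:AbstractString})
    ks = Tuple(Symbol(k) for k in keys(xs))
    vs = Tuple(as_namedtuple(v) for v in values(xs))
    NamedTuple{ks}(vs)
end

# Hand-built stand-in for parsed JSON.
parsed = Dict{String,Any}(
    "joe" => Dict{String,Any}("bob" => Any[1, 2, 3], "bill" => Any[3, 4, 5]),
    "items" => Any[Dict{String,Any}("id" => 1), Dict{String,Any}("id" => 2)],
)
converted = as_namedtuple(parsed)
```

Field order follows the iteration order of the Dict, so fields should be accessed by name (for example `converted.joe.bob`) rather than by position.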
It occurred to me that what you might be saying is that even if a JSON file was represented as a NamedTuple structure, not all such structures could be handled by DataKnots: that seems okay. Fundamentally the reason would not be related to the Dict type in that case, but rather to the particular structure of the data.
If a JSON file has a format which could be handled by the appropriate NamedTuple representation, it seems nice to make that possible.
This transformation is indeed fragile, but, as you said, it's not really an issue.
However, we'd like to avoid any data transformations in the DataKnot constructor and just let it wrap the input value, whatever it is. Instead, any transformations should preferably be offloaded to the internal query engine. This ensures that values of the same type are treated consistently, regardless of whether they are top-level or nested, or obtained as an intermediate query result.
The problem with Dict values is that the DataKnots query engine, in its current state, doesn't handle transformations well when it cannot statically determine the shape of the output.
I can't say I understand why, fully, for this particular case, it's important to maintain the original format, since the semantics of JSON are the same irrespective of whether they are stored as a Dict or a NamedTuple.
That said, I don't need to get everything about your design choices to appreciate their consequences, which have, so far, resulted in a system I'm excited to try out. And I can easily do what I want, just by using the following.
convert(DataKnot,as_namedtuple(JSON.parse(file)))
So it is not a big deal either way.
David, thanks for this ticket. I'm glad you found a simple work-around. Let's keep this ticket open till we've provided more direct support for JSON values, offloaded to the internal query engine.
We need to support JSON no matter how it is provided or mixed into the query. For example, we could have combinators that fetch external resources via socket requests, or we may want our queries to work with JSON-valued columns stored in a PostgreSQL database. Further, there are other data sources that are not JSON but have the same set of challenges.
Ah hah! That makes sense now. Thank you for explaining it to me. This is really neat!
Hi, thanks for this package, it's super elegant. I think one of the great things about this approach is the ability to handle non-tabular data. So my first thought was to try it with some TopoJSON; however, I ran into this issue of converting a Dict to a DataKnot. I found a solution; hopefully it can be useful to you.
using DataKnots, JSON
file = download("https://raw.githubusercontent.com/deldersveld/topojson/master/countries/china/china-provinces.json","china-province.json")
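# Helpers used below. Their definitions are not shown in this thread,
# so these are an assumption: keys as a tuple of Symbols, values as a tuple.
dictkeys(d::Dict) = Tuple(Symbol(k) for k in keys(d))
dictvalues(d::Dict) = Tuple(values(d))
# Constructs a named tuple from the dict
function namedtuple(d::Dict{A,B}) where {A, B}
    keys = dictkeys(d)
    values = map(dictvalues(d)) do x
        namedtuple(x)
    end
    NamedTuple{keys}(values)
end
namedtuple(a::Array) = map(namedtuple, a)
namedtuple(x::Number) = x
namedtuple(x::AbstractString) = x
namedtuple(x::Any) = error("please define namedtuple for this type")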
I then tried
nt = read(file, String) |> JSON.parse |> namedtuple;
DataKnot(:china => nt)
# Error
# type UnionAll has no field parameters
Interestingly, however, it works if you recursively construct the DataKnot like the named tuple
function dataknot(nt::NamedTuple)
    nt_keys = keys(nt)
    nt_values = map(values(nt)) do v
        dataknot(v)
    end
    DataKnot([k => v for (k,v) in zip(nt_keys, nt_values)]...)
end
dataknot(a::Array) = map(dataknot,a)
dataknot(x::Number) = x
dataknot(x::AbstractString) = x
dataknot(x::Any) = error("please define dataknot for this type")
Then dataknot(nt) works without errors.
A side note that might be useful for tracking down the UnionAll error
# this works
b = DataKnot(:a => nt.objects.CHN_adm1.geometries)
c = DataKnot(:a => nt.objects.CHN_adm1.type)
d = DataKnot(:b => b, :c => c)
# but this errors
DataKnot(:chn => nt.objects.CHN_adm1)
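One possible way to narrow this down further is sketched below. It assumes the error is related to containers whose element type is not concrete (for example, an array of NamedTuples with differing field names has a UnionAll element type), which this thread does not confirm. The helper name `abstract_paths` and the sample data are hypothetical; the code uses only Base Julia.

```julia
# Walk a nested NamedTuple/array structure and record every array
# whose element type is not concrete, together with its path.
function abstract_paths(x, path = "nt", found = String[])
    if x isa NamedTuple
        for (k, v) in pairs(x)
            abstract_paths(v, string(path, ".", k), found)
        end
    elseif x isa AbstractVector
        isconcretetype(eltype(x)) || push!(found, string(path, " :: ", typeof(x)))
        for (i, v) in enumerate(x)
            abstract_paths(v, string(path, "[", i, "]"), found)
        end
    end
    found
end

# Hypothetical sample: `geometries` mixes two NamedTuple shapes,
# so its element type cannot be a single concrete NamedTuple type.
sample = (kind = "GeometryCollection", geometries = [(a = 1,), (b = 2,)], ids = [1, 2, 3])
paths = abstract_paths(sample)
```

Running `abstract_paths(nt.objects.CHN_adm1)` on the real data would list any arrays of this kind, which may help decide whether they are involved in the failure.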
Oops, the above comment is mine; I used the wrong GitHub account.
Hi there! I'm excited by the potential of this library for my day-to-day work: I appreciate the elegant, coherent and highly composable nature of this approach.
I would expect the two following commands to result in the same DataKnot
Since, with using JSON, the latter format is how data is parsed.
Doesn't seem like it would be hard to implement. I'm happy to submit a pull request if there's interest.