Closed: ananyo2012 closed this issue 7 years ago
As always, I propose to think about some real data examples. Just one, yet pretty interesting, option is OpenWeatherMap. Here is an API sample: http://samples.openweathermap.org/data/2.5/forecast?lat=35&lon=139&appid=b1b15e88fa797225412429c1c50c122a1 (docs here: https://openweathermap.org/forecast5). Ideally, we need to be able to do something like...
require 'open-uri'

URL = '...open weather map sample url...'
response = URI.open(URL).read
df = Daru::DataFrame.from_json(response, some: inventive, parameters: of_conversion)
# => DataFrame with timestamp, temperature, wind speed and so on
Ideas?
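To make the idea above concrete, here is a minimal sketch (not actual Daru API) of flattening an OpenWeatherMap-style forecast response into row hashes that a DataFrame could consume, using only the bundled `json` library. The field names (`list`, `dt`, `main`/`temp`, `wind`/`speed`) follow the forecast5 docs linked above; everything else is made up for illustration.

```ruby
require 'json'
require 'time'

# Hypothetical helper: flatten an OpenWeatherMap forecast payload into
# an array of row hashes (one per forecast entry).
def forecast_rows(json_string)
  payload = JSON.parse(json_string)
  (payload['list'] || []).map do |entry|
    {
      timestamp: Time.at(entry['dt']).utc,       # 'dt' is a Unix timestamp
      temperature: entry.dig('main', 'temp'),    # Kelvin in the sample API
      wind_speed: entry.dig('wind', 'speed')
    }
  end
end

sample = '{"list":[{"dt":1485799200,"main":{"temp":261.45},"wind":{"speed":4.77}}]}'
rows = forecast_rows(sample)
# rows.first[:temperature] # => 261.45
```

An array of hashes like this maps naturally onto `Daru::DataFrame.new(rows)`.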
As discussed with @athityakumar in mail, there is time to think more clearly about what is necessary here. Some statements: `to_json_hash` and `to_json_array`, without introduction of tons of options (though we can discuss it, it is just a suggestion).

Daru::IO.from_json("{some: json}",
  index: '$.some.json.path',
  col1: '$.some.other.path',
  col2: '$.even.more.path'
)
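A rough, stdlib-only sketch of what the suggested call could do under the hood. It handles only the simple `$.a.b.c` subset of JSONPath via `Hash#dig`; a real implementation would delegate to the jsonpath gem instead. The column names and paths are illustrative, not an agreed API.

```ruby
require 'json'

# Hypothetical: resolve each '$.dot.separated' path against the parsed
# document and return a column-name => value mapping.
def extract_columns(json_string, paths)
  data = JSON.parse(json_string)
  paths.each_with_object({}) do |(column, path), columns|
    keys = path.sub(/\A\$\./, '').split('.')  # '$.a.b' -> ['a', 'b']
    columns[column] = data.dig(*keys)
  end
end

doc = '{"some":{"json":{"path":"idx"},"other":{"path":42}}}'
extract_columns(doc, index: '$.some.json.path', col1: '$.some.other.path')
# => {:index=>"idx", :col1=>42}
```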
It is, though, just a rough idea for consideration, not an instruction for implementing :)
Agreed - the example especially looks good. :+1:
Giving users an xpath-like option (via the jsonpath gem) will definitely make the from_json module user-friendly for nested JSONs (which most social-media graph APIs provide).
@zverok - Regarding the exporter, should using from_json and to_json recreate the JSON source? That is, should

df = Daru::DataFrame.from_json(source, xpath-opts)
df.to_json(inverse-xpath-opts or block)

recreate the json source x?
Or maybe, if a user wishes to recreate the json source,

df = Daru::DataFrame.from_json(source, xpath-opts)
df.to_json.map { |ele| restructure(ele) }

would be an easier way out?
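The second option can be illustrated with plain Ruby. Here `records` stands in for the flat array a hypothetical `to_json` would emit, and `restructure` is the user-supplied mapping back to a nested shape; all names are invented for the sketch.

```ruby
require 'json'

# Flat rows, as a simple row-wise exporter might produce them.
records = [{ 'name' => 'a', 'temp' => 1.0 }, { 'name' => 'b', 'temp' => 2.0 }]

# User-supplied restructuring back into the original nested layout.
restructure = lambda do |row|
  { 'station' => { 'name' => row['name'] }, 'main' => { 'temp' => row['temp'] } }
end

nested = records.map { |ele| restructure.call(ele) }
JSON.generate(nested.first)
# => '{"station":{"name":"a"},"main":{"temp":1.0}}'
```

So the round trip stays in the user's hands, without the exporter needing inverse-xpath options.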
I am not sure what the scope of your question is. You mean char-by-char correspondence? I don't think it should be the first goal, though it is an interesting side-task (which, by the way, validates the equivalent power of importer and exporter).
Yes, I was asking whether the equivalent power of importer and exporter should (or can) be provided for JSON. Because, for creating a complex nested JSON from a DataFrame (to_json), the missing data have to be provided manually (unless we store them as a class variable, which wouldn't be good either), and there's no other way apart from the user manually mapping and manipulating the hash given by to_json, right?
Yes, seems so. If the user needs structure like
{
metadata: {something},
data: [
//the real DF output, like: {col: value, col2: value2}
]
}
...then the simplest option would probably be to just construct something like

df.to_json(col1: '$.data.col', col2: '$.data.col2')

...and then merge it with some metadata. But probably in this case a useful thing would also be a method like as_json, which will return not a string but plain Ruby structures (hashes and arrays), which will be easier to merge with other data.
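A small sketch of the as_json idea: because the intermediate result is plain hashes and arrays, merging metadata before serializing is a one-liner. `df_as_json` below fakes the DataFrame output; no such method exists in daru at this point.

```ruby
require 'json'

# Stand-in for a hypothetical df.as_json: plain Ruby structures, not a string.
df_as_json = [{ 'col' => 1, 'col2' => 'x' }, { 'col' => 2, 'col2' => 'y' }]

# Wrapping the data with user-supplied metadata is then trivial.
wrapped = {
  metadata: { generated_at: '2017-06-01', rows: df_as_json.size },
  data: df_as_json
}

JSON.generate(wrapped)
```

With only a string-returning `to_json`, the same merge would require a parse/re-serialize round trip.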
@zverok - I've submitted a Pull Request for JSON Importer, with support for parsing from specific x-paths as per this issue. Please review https://github.com/athityakumar/daru-io/pull/21 whenever you're free 😄
OK, let's close this ticket, then, and for any further considerations continue in the daru-io project.
For the record, I am not very happy about the fact that a ton of PRs were merged without me :( Of course, I am guilty myself, because I've been absent for almost 10 days at an important period, but in future feel free to at least ping me and ask about it -- I was "kinda" online, so at least I could say something like "OK, merge it" or "Sorry, wait for me, work on the next task in the meantime", OK?
...and in fact you merged it AFTER I wrote that I am back and reviewing everything. It is pretty weird. I have a lot of considerations about that one, will review it now, and you'll need to plan to fix my notes later, when you have time.
Sorry @zverok, I merged them only after they were approved by at least one of the mentors. I'll definitely take this into consideration in subsequent PRs. I'm genuinely sorry. I'll definitely act on your reviews for all these merged PRs. Please feel free to review. 😄
I think this is a good time to work on this. There is already a to_h method for DataFrames and Vectors, but no such File.write-style method for JSON. API calls mostly follow JSON formats, so the importer should be able to read JSON data from API calls. We can start off with a simple write_to_json method and a from_json method in the Daru::IO module. Since Ruby comes with json, we don't need to add any extra dependency; we just need to require 'json'.
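The starting point described above can be sketched in a few lines with only the bundled json library. This is a toy module, not the daru-io API (whose method names later diverged): from_json just parses into an array of row hashes, and write_to_json dumps such an array back to a file.

```ruby
require 'json'

# Toy stand-in for the proposed Daru::IO entry points.
module MiniIO
  # Parse a JSON string into plain Ruby structures (e.g. array of row hashes).
  def self.from_json(json_string)
    JSON.parse(json_string)
  end

  # Serialize rows and write them to a file in one call.
  def self.write_to_json(path, rows)
    File.write(path, JSON.generate(rows))
  end
end

rows = MiniIO.from_json('[{"a":1},{"a":2}]')
# rows # => [{"a"=>1}, {"a"=>2}]
```

A real importer would then hand the parsed rows to `Daru::DataFrame.new`, and the exporter would start from the DataFrame's `to_h`.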