ckan / ckanext-xloader

Express Loader - quickly load data into DataStore. A replacement for DataPusher.
GNU Affero General Public License v3.0

Json to datastore #230

Open aminumoha opened 1 week ago

aminumoha commented 1 week ago

I tried adding JSON as one of the xloader formats, thinking it would be straightforward to convert the JSON to CSV and then follow the same procedure to send the data to the DataStore. Unfortunately, xloader fails to recognize the format and simply tries to load the file as the default CSV format, which leads to an error.

duttonw commented 1 week ago

JSON data is a hard one because objects can be nested inside objects, and there is no obvious way to map such a schema onto a flat table.

Are you able to attach an example JSON file and the table you expect it to produce?

Also, what do you think the requirements should be for loading it into the DataStore?

I.e. is it a list of key-value items or a list of strings?

aminumoha commented 1 week ago

I.e. is it a list of key-value items or a list of strings?

My case is a list of objects with key-value items, pretty much like [{"id": 1, "name": "abc", ...}, ...]. So I would like to store these objects in a DataStore table just as I would a CSV file. I can see the challenges with nested objects, but for flat, CSV-like JSON records, having that data in the DataStore would be immensely useful.

duttonw commented 6 days ago

So would these test cases be what you're after for importing into the DataStore?

[
  {"Name": "Alice", "Age": 30, "Occupation": "Engineer"},
  {"Name": "Bob", "Age": 25, "Occupation": "Designer"},
  {"Name": "Charlie", "Age": 35, "Occupation": "Manager", "Extra field": "wont be included"}
]

Which would make a csv/table like

Name, Age, Occupation
Alice, 30, Engineer
Bob, 25, Designer
Charlie, 35, Manager
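
A minimal sketch of that conversion in Python, using only the standard library. It is not xloader code: the function name `records_to_csv` is hypothetical, and taking the header from the first record (so that extra keys in later records are dropped, as in the "Extra field" case) is an assumption about the desired behaviour.

```python
import csv
import io
import json


def records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text (hypothetical helper)."""
    records = json.loads(json_text)
    if not isinstance(records, list) or not records:
        raise ValueError("expected a non-empty JSON array of objects")

    # Assumption: the first record defines the columns.
    fieldnames = list(records[0].keys())

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently drop keys not in the header
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return out.getvalue()
```

The resulting CSV text could then, in principle, go through the same path as any other CSV resource.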

I guess we should also handle:

[
  ["Header1", "Header2", "Header3", "Number Header"],
  ["Cell", "Cell", "Cell", 10],
  ["Cell", "Cell", "Cell", 15],
  ["Cell", "Cell", "Cell", 20],
  ["Cell", "Cell", "Cell", 25]
]

Expected output:

Header1, Header2, Header3, Number Header
Cell, Cell, Cell, 10
Cell, Cell, Cell, 15
Cell, Cell, Cell, 20
Cell, Cell, Cell, 25
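
For the list-of-lists case, a sketch along these lines might be enough. The name `rows_to_csv` is hypothetical, and rejecting ragged rows is an assumption rather than an agreed requirement.

```python
import csv
import io
import json


def rows_to_csv(json_text):
    """Convert a JSON array of arrays (first row = header) into CSV text."""
    rows = json.loads(json_text)
    if not rows or not all(isinstance(row, list) for row in rows):
        raise ValueError("expected a non-empty JSON array of arrays")

    # Assumption: every row must have as many cells as the header row.
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of cells")

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```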

I'm unsure how we could handle the case where the array is wrapped in a key, as programming the conversion gets tricky.
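
One possible heuristic for the wrapped case, sketched in Python: accept the wrapper only when exactly one top-level key holds an array, and refuse to guess otherwise. The helper name `find_record_array` and the rule itself are assumptions for discussion, not something xloader does.

```python
def find_record_array(data):
    """Return the array of records from parsed JSON, unwrapping one level.

    Hypothetical helper: a bare list is returned as-is; a dict is accepted
    only if exactly one of its values is a list.
    """
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        candidates = [value for value in data.values() if isinstance(value, list)]
        if len(candidates) == 1:
            return candidates[0]
        raise ValueError(
            "expected exactly one array-valued key, found %d" % len(candidates)
        )
    raise ValueError("expected a JSON array or an object wrapping one")
```

Anything more deeply nested or ambiguous would raise an error rather than be loaded incorrectly.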

wardi commented 6 days ago

Yet another format is the one from CKAN's DataStore JSON dump endpoint:

{
  "fields": [
    {"id": "Name", "type": "text"},
    {"id": "Age", "type": "numeric"},
    {"id": "Occupation", "type": "text"}
  ],
  "records": [
    ["Alice", 30, "Engineer"],
    ["Bob", 35, "Designer"]
  ]
}

Additional information entered into the data dictionary also appears in the "fields" dicts.
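
A sketch of reading that shape in Python, assuming the layout shown above ("fields" as a list of dicts with an "id", "records" as a list of rows). The function name `dump_to_csv` is hypothetical, and the sketch ignores the "type" values and any data dictionary information.

```python
import csv
import io
import json


def dump_to_csv(json_text):
    """Convert a {"fields": [...], "records": [...]} document into CSV text."""
    data = json.loads(json_text)

    # Column names come from the "id" of each field dict.
    header = [field["id"] for field in data["fields"]]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data["records"])
    return out.getvalue()
```

Since the "fields" entries already carry a type, a fuller version could pass those types along instead of re-guessing them from the CSV.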