aggregate datasets into useful structure before returning

katrinleinweber commented 6 years ago

noticed while working on #16

retrieve_data() currently appends multiple downloads into one continuous list, in which the individual datasets can no longer be addressed. We need a data structure that lets the user $-address the datasets and their fields. Ideally, each dataset is referred to by index = bacdive_id. Something like a sparse list-of-lists?!?
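A minimal sketch of the difference, assuming each download has already been parsed into a named R list (the field values below are invented for illustration, not real BacDive records): c() splices the top-level fields of both datasets into one flat list, while list() keyed by the BacDive ID keeps each dataset addressable with $.

```r
# Two hypothetical parsed datasets (illustrative values, not real BacDive data)
ds_1095 <- list(taxonomy_name = list(species = "Species A"),
                culture_growth_condition = list(culture_temp = list(temp = "37")))
ds_1847 <- list(taxonomy_name = list(species = "Species B"),
                culture_growth_condition = list(culture_temp = list(temp = "30")))

# c() concatenates the top-level elements: 4 elements with duplicated names,
# and no record of which dataset each element came from
flat <- c(ds_1095, ds_1847)
length(flat)        # 4
names(flat)         # "taxonomy_name" "culture_growth_condition" "taxonomy_name" "culture_growth_condition"
flat$taxonomy_name  # only the first match (from ds_1095) is returned

# list() with the IDs as names keeps exactly one element per dataset
nested <- list(`1095` = ds_1095, `1847` = ds_1847)
nested$`1847`$culture_growth_condition$culture_temp$temp  # "30"
```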

ideas:

[x] ~~aggregate JSON strings in a character vector, then rjson::fromJSON() them "in-place", or in some other way that creates the nested lists "below" that vector (as lower hierarchy levels)~~
[x] ~~write out each dataset to a file (a kind of local cache), then maybe concatenate the files & re-import them as a useful data structure~~
[x] ~~use jsonlite to create 1 dataframe per bacdive_ID, then add those to a list~~
[x] ~~keep on c()ombining downloads, but~~ aggregate into a higher-level list and use an apply variant to extract a field/element from the resulting "megastructure"
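The approach that was kept can be sketched in base R as follows. This assumes the downloads are already aggregated into one higher-level list named by BacDive ID; the values are illustrative only. vapply() walks that list and pulls the same nested field out of every dataset.

```r
# Hypothetical aggregated "megastructure": one named element per BacDive ID
datasets <- list(
  `1095` = list(culture_growth_condition = list(culture_temp = list(temp = "37"))),
  `1847` = list(culture_growth_condition = list(culture_temp = list(temp = "30")))
)

# Extract the same nested field from every dataset. vapply() keeps the
# BacDive IDs as names of the result and enforces one string per dataset.
temps <- vapply(datasets,
                function(ds) ds$culture_growth_condition$culture_temp$temp,
                FUN.VALUE = character(1))

# temps is the named character vector c(`1095` = "37", `1847` = "30")
print(temps)
```

vapply() is used here instead of sapply() because it fails loudly if a dataset does not yield exactly one string, rather than silently returning a list.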

katrinleinweber commented 6 years ago

jsonlite::fromJSON(…, flatten = TRUE) and simplifyDataFrame = TRUE both still return a list of nested lists with DFs as "leaves". I still need to work out how to extract a field/element (say, culture_growth_condition$culture_temp$temp) from a combination of these list-of-lists :-/
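One possible way to extract such a field when some datasets lack it is a small path-walking helper. get_field() is a hypothetical name, not part of BacDiveR or jsonlite; it follows a vector of names into the nested structure and returns NA when any level is missing, so sapply() can still produce one value per dataset. The data below is illustrative only.

```r
# Hypothetical helper: follow a path of names into a nested list,
# returning NA as soon as a level is missing or is not a list
get_field <- function(x, path) {
  for (key in path) {
    if (!is.list(x) || is.null(x[[key]])) return(NA)
    x <- x[[key]]
  }
  x
}

datasets <- list(
  `1095` = list(culture_growth_condition = list(culture_temp = list(temp = "37"))),
  `1847` = list(taxonomy_name = list(species = "Species B"))  # no growth data
)

path <- c("culture_growth_condition", "culture_temp", "temp")
temps <- sapply(datasets, get_field, path = path)
# temps is c(`1095` = "37", `1847` = NA)
print(temps)
```

Because data frames are lists in R, the helper also steps into DF "leaves" by column name. If such a leaf has several rows, the extracted value has length > 1 and sapply() returns a list instead of a vector. When every level is known to exist, base R's recursive indexing x[[path]] performs the same walk, without the NA fallback.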

[Screenshot, 2018-03-12 at 16:09:58]

katrinleinweber commented 6 years ago

@sckott: Hello, and thanks for your advice! I got over this data structure problem :-)

katrinleinweber commented 6 years ago

For comparison with the screenshot above: between a) the top-level object (data above, Bac_hal_data in this example) and c) the lists within the datasets (taxonomy_name, morphology_physiology, …, environment_sampli…, etc.), there is now b) a list-of-lists for each dataset, named by its numeric BacDive ID (1095 & 1847).
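A sketch of how such a structure could be built during retrieval. fetch_dataset() is a hypothetical stand-in for downloading and parsing one dataset, not the package's actual internals; each result is stored under its BacDive ID converted to a character string.

```r
# Hypothetical stand-in for downloading and parsing one dataset;
# the real package fetches JSON from the BacDive API instead
fetch_dataset <- function(id) {
  list(taxonomy_name = list(bacdive_id = id),
       morphology_physiology = list())
}

ids <- c(1095, 1847)

# Pre-allocate one slot per dataset, named by its BacDive ID
aggregated <- vector("list", length(ids))
names(aggregated) <- as.character(ids)

for (id in ids) {
  # Index by name (character), not by position (numeric)
  aggregated[[as.character(id)]] <- fetch_dataset(id)
}

print(names(aggregated))                      # "1095" "1847"
print(aggregated$`1095`$taxonomy_name$bacdive_id)  # 1095
```

Converting the ID to character matters: aggregated[[1095]] would be read as the 1095th position rather than the element named "1095". A shorter equivalent is setNames(lapply(ids, fetch_dataset), ids).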

[Screenshot, 2018-04-18 at 16:44:09]

TIBHannover / BacDiveR, issue #31