Billingegroup / ml4ms

Python package for facilitating machine learning tasks on collections of materials data, especially including measured spectra.
BSD 3-Clause "New" or "Revised" License

fsclient now can read valid json. Also added tests #13

Closed sbillinge closed 8 months ago

sbillinge commented 8 months ago

Tests are passing so I think this can be merged. It means that we should be able to read the json that came from matplotlib into the standard style that we discussed.

tinatn29 commented 8 months ago

just to make sure I get this right

sbillinge commented 8 months ago

> just to make sure I get this right
>
> * load_json_collection loads a collection downloaded from MP [{'id_': ID1, 'first': val1}, {'id_': ID2, 'first': val2}] into something fsclient/pymongo can deal with (has 'id_' as keys) -> {ID1: {'id_': ID1, 'first': val1}, ID2: {'id_': ID2, 'first': val2}}
>
> * and dump_json_collection does this in reverse? (the json output file has one entry {'id_': ID1, 'first': val1} per line)

Not quite. MP yields pure JSON in its payload, which is not in the form of a collection (a list of docs). The function load_json() in io.py should be able to read the MP payload (I haven't checked).

We will then dump this to a collection in a database. This "database" will be, by default, a set of either yml or "json-collection" files in the directory ../db, relative to the ml4msrc.json file in the directory where we run the script. It sounds a bit complicated, but you get used to it quickly.
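For illustration only, here is a rough sketch of the round trip discussed in this thread: a list of docs is keyed by 'id_', written out as one JSON document per line, and read back. The function names (list_to_collection, dump_collection_sketch, load_collection_sketch) and the file name are hypothetical, and this is not the actual fsclient or io.py code; it only assumes the "one entry per line" layout and the 'id_' key mentioned above.

```python
import json
import tempfile
from pathlib import Path


def list_to_collection(doc_list, key="id_"):
    """Turn a list of docs into a dict keyed by each doc's id field."""
    return {doc[key]: doc for doc in doc_list}


def dump_collection_sketch(docs, path):
    """Write a dict of docs as one JSON document per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs.values():
            f.write(json.dumps(doc, sort_keys=True) + "\n")


def load_collection_sketch(path, key="id_"):
    """Read one JSON document per line into a dict keyed by doc[key]."""
    docs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            doc = json.loads(line)
            docs[doc[key]] = doc
    return docs


# Made-up example docs, shaped like the ones in the question above.
payload = [{"id_": "ID1", "first": 1}, {"id_": "ID2", "first": 2}]
collection = list_to_collection(payload)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"  # hypothetical file name
    dump_collection_sketch(collection, path)
    n_lines = len(path.read_text(encoding="utf-8").splitlines())
    roundtrip = load_collection_sketch(path)
```

In this sketch the file on disk is not itself valid JSON as a whole (each line is), which is why a plain JSON payload would need a different reader than a "json-collection" file.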

Maybe I will try to run through this process on my local machine using the json files you sent earlier.