Open justinlaughlin opened 8 months ago
I am currently composing `DataStore` into my own class to add nested-data functionality, but if there is a more natural way to do this, that would be preferred.
Pinging @HaluskaR since this is related to your L2 code.
Hey there @justinlaughlin! I can/should add some validation for the data adders; I'll get a ticket up on my end.
On the topic of nested data: the limitations are due to the nature of the field. `data` holds values that are queryable, so they need to follow a structure the datastore knows how to query. Hierarchical is fine; arbitrarily hierarchical is where we run into issues (what does each layer of the hierarchy represent, how does the user "intend" to query across children and children's children, etc.), so what you'll want to do depends on your needs:
1) We can make it non-arbitrary by formalizing the structure so the datastore can be smart about it. That's things like `curve_sets` and `library_data`, where there was a recurring form of data between codes that people would want to query in a "hierarchical" way. Of course, that one usually takes the longest since it's an addition to the underlying schema, and we'd want to make sure it's workable for a lot of codes/uses (we're currently going through this process for materials). That's only if you want it on Sina's end, though; if you're expanding on Sina for a specific use, I'm happy to run you through how to set something like that up.
2) Make it "non-hierarchical" (ex: coerce `my { cool { data: 12` to be `my/cool/data: 12`). This loses out on some implied flexibility (and makes it more annoying to type) but makes it immediately available. In my experience, this one is handy to invoke when the hierarchy is more a side effect of the structure/code output than representative of how you'd want to query things, where you're always going to the leaf of the tree.
3) Make it non-queryable. Records have a `user_defined` section for storing any legal JSON (or anything you're willing to stringblob into JSON) that we won't try to index/validate/etc., simply storing and returning it exactly as given. This is probably the one you heard of; it's the most commonly useful, since a lot of the nested data we've encountered is something users only want in the context of the run itself.
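To make options 2 and 3 concrete, here is a minimal sketch using plain Python dictionaries rather than Sina's own classes. The record layout (`id`, `type`, `data` entries holding a `value`, and a `user_defined` section) is an assumption based on the description above, and every name and value in it is made up for illustration.

```python
import json

# Nested output as a code might produce it (illustrative values only).
nested = {"my": {"cool": {"data": 12}}}

# Option 2: flatten to a single delimited key so the value stays queryable.
# Option 3: keep the nesting under user_defined, where it is stored and
# returned as given rather than indexed.
record = {
    "id": "run_1",    # hypothetical record id
    "type": "run",    # hypothetical record type
    "data": {"my/cool/data": {"value": 12}},
    "user_defined": {"nested_output": nested},
}

# Whatever goes into user_defined has to survive a JSON round trip.
serialized = json.dumps(record)
restored = json.loads(serialized)
```

The two options are not exclusive: the same record can carry flattened, queryable leaves in `data` and the original nested form in `user_defined`.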
Thanks for the catch on adding data!
Hi, thanks for the super detailed reply! Given the way I've structured the data, your second suggestion makes the most sense, as it is only the "leaf nodes" that are being accessed. The hierarchies serve no purpose other than to organize data. It seems like `user_defined` or `library_data` would also have been able to perform the same role, but in this case the data is not actually user-defined/library data, so I decided to use a "delimiter convention" and just store it in `data`. E.g.
registration.user = ...
registration.date = ...
registration.time = ...
I have a few helper functions that lift/flatten between the nested dictionary and the "delimited key" dictionary. Of course, this option requires a common agreement on what is being used as a delimiter, but since this is at the implementation level and not really exposed to the user, it's not a huge worry. I think Alex had mentioned that a `/`-delimited convention may already be in use for `data`, but I couldn't get it to work. It would be nice to be "officially" conforming, but this seems like a good enough solution.
If this is a common scenario, then maybe something like these `lift` and `flatten` functions could be added to `sina.util`? Or, if data is entered as nested dictionaries, then maybe Sina could keep descending until it reaches the "leaf"? I don't think you could mix different types of hierarchies with this solution (e.g. `list[dict, dict]`), but you could at least have a nested dictionary (e.g. `dict[dict[list], int]`) where the lowest level holds non-`dict` values.
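A minimal sketch of what such a `flatten`/`lift` pair could look like is below. It is not the poster's actual code and not part of Sina; the function signatures, the `DELIMITER` constant, and the sample values are assumptions for illustration. Any non-`dict` value, including a list, is treated as a leaf.

```python
from typing import Any, Dict

DELIMITER = "."  # assumed convention; "/" would work the same way


def flatten(nested: Dict[str, Any], delimiter: str = DELIMITER,
            prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into one level of delimited string keys."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        if delimiter in key:
            # A delimiter inside a key would make lift() ambiguous.
            raise ValueError(f"key {key!r} contains delimiter {delimiter!r}")
        full_key = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            # Keep descending until a non-dict leaf is reached.
            flat.update(flatten(value, delimiter, full_key))
        else:
            flat[full_key] = value
    return flat


def lift(flat: Dict[str, Any], delimiter: str = DELIMITER) -> Dict[str, Any]:
    """Rebuild the nested dict from delimited keys."""
    nested: Dict[str, Any] = {}
    for full_key, value in flat.items():
        *parents, leaf = full_key.split(delimiter)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Illustrative data only.
example = {"registration": {"user": "someone", "date": "2024-01-01"},
           "steps": [1, 2, 3]}
flat_example = flatten(example)
round_trip = lift(flat_example)
```

One limitation of this sketch: an empty nested dictionary produces no keys when flattened, so it does not survive the round trip.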
After browsing the source code a bit, I think this flattening already exists and is applied automatically to library data: `sina/model/flatten_library_content`
Hello,
It seems like it is possible to create a `Record` with a dictionary as its data value, but not possible to insert that `Record` into a `DataStore`. A minimal example returns
ValueError: ['At least one data entry belonging to Record a has a dictionary for a value.Value: field']
I figured this would not be allowed, as each data entry can also have a `units` field, but I have heard that there is some sort of option to use nested data. If there is a way to do so, I would love to know more. Thanks.
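Until the adders validate their input, a check like the one below could catch the problem before insertion. This is a hypothetical helper, not part of Sina: it assumes a plain mapping from data names to raw values and only reports which entries hold a dictionary.

```python
from typing import Any, Dict, List


def find_dict_valued_entries(data: Dict[str, Any]) -> List[str]:
    """Return the names of data entries whose value is a dictionary."""
    return [name for name, value in data.items() if isinstance(value, dict)]


# Illustrative data only: "field" holds a dictionary and would be flagged.
offenders = find_dict_valued_entries(
    {"temperature": 300, "field": {"x": 1}, "steps": [1, 2, 3]}
)
```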