SETI / rms-vicar

vicar Python module
Apache License 2.0
0 stars 1 forks source link

VicarError on inserting new parameter with occurrence #11

Closed AndrewAnnex closed 8 months ago

AndrewAnnex commented 9 months ago

For some use cases, I have a large number of parameters with repeated occurrences I want to convert to a VICAR label that is essentially empty as the data will be coming to me as some large nested json object. However, I get a VicarError in VicarLabel.__setitem__ when I attempt to use tuples as the key like label[('TEST',1)] = 2.

I am able to update the 2nd occurrence one I've created the first and then create the 2nd

lbl['TEST']  = 1
lbl['TEST+'] = 2
print('before', lbl[('TEST',1)])
lbl[('TEST',1)] =5
print('after', lbl[('TEST',1)])

but I get the error if I do

lbl[('TEST',0)]  = 1
# or
lbl[('TEST+',1)] = 2
# or
lbl['TEST',0] = 1,
# but the prior line does work if I called lbl['TEST'] = 1 earlier,

maybe this is just expected behavior, but it complicates situations where order of parameters is not known ahead of time or cannot be guaranteed for whatever reason. I am not sure exactly what the correct behavior should be though. If a user wants to update the 3rd occurrence of a parameter that has not yet had the first occurrence set, it could be reasonable to check for prior occurrences, setting those that are missing to a null value so that users don't have to track the order of things. If the keys all exist already then it shouldn't modify those prior values, and then just do the normal thing for the user provided occurrence (either set or append).

markshowalter commented 8 months ago

I don't quite see a use case is here. There is no "missing or null value" to assign to parameters that don't exist, so setting/appending a third value when there isn't already a second makes no sense. The existing use case for repeated parameter names is things like recording the processing history of an image, where TASK = name is appended each time a different process is applied.

However, I can agree to a change that allows __setitem__ to index one past the end of the current list (if any), so these should work:

lbl['NEW1', 0] = 1
# or
lbl['NEW2+', 0] = 1
# or
lbl['NEW3', -1] = 1
# or
lbl['NEW4+', -1] = 1
# or
lbl['NEW5'] = 1
lbl['NEW5', 1 ] = 2

This change is fairly easy so I will take care of it. If you can show me a realistic situation where a user wants to re-use VICAR parameter names and really needs to define them out of order, let me know. That would be a much bigger change.

AndrewAnnex commented 8 months ago

The use case here is for complicated labels like for Mars2020 or MSL camera data vicar labels where many coordinate systems are defined as properties with repeated keys.

With python ordered dicts being the default now, it isn't too hard to work around this, but it just kind of breaks the niceties of treating the label as a Mappable object more directly like PVL does (or MultiDict) so make it a little harder to use dict.update or other methods when as a user I have to keep track of the order and the number of re-occurences of nested parameters in properties.

Maybe what's needed instead is the ability to add support of getting/setting properties with dicts like

lbl['CS1'] = dict(name='foo')
lbl['CS2'] = dict(name='bar')
#print bar using current index based getter
print(lbl['name', -1])
# print bar using something that would be nicer
print(lbl['CS2']['name']) or print(lbl['CS2', 'name'])
markshowalter commented 8 months ago

I think I see the issue now. What we're dealing with is the fact that MSL/Mars2020 folks really want the label to support nested information, whereas the VICAR standard uses a flat namespace.

First of all, when it comes to adding parameters to a label, the __setitem__() syntax is convenient but it isn't really designed for adding large groups of parameters. There was always an append() method, which I modified ever so slightly so it now also supports dictionaries. I would think this is the sensible way to add information about a coordinate system all at once, ensuring that the parameters are in the correct order. For example, if you have a dictionaries cs1_dict and cs2_dict, each containing coordinate system parameters related to systems "CS1" and "CS2", you could write

lbl['SYSTEM+'] = 'CS1'
lbl.append(cs1_dict)
lbl['SYSTEM+'] = 'CS2'
lbl.append(cs2_dict)

This will yield a label containing "SYSTEM='CS1'", followed by all the associated parameters of the first coordinate system, followed by "SYSTEM='CS2" and its parameters.

There's also the issue about how to read parameters from the label. Suppose you have N coordinate systems in the label, each following an entry SYSTEM='name', followed by the associated parameters. If only a subset of these coordinate systems include a value for, say, MIN_LAT, then using __getitem__(), you would have a tough time figuring out which occurrence of MIN_LAT is the one associated with a particular system.

To handle this better, both __getitem__() and __setitem__() have a new indexing syntax. lbl['FOO', 'TASK'] refers to the first parameter named "FOO" that falls after the first occurrence of the parameter "TASK". In addition, lbl['FOO', 'TASK', 'GEOMA'] refers to the first parameter named "FOO" that falls after the first place in the label where parameter "TASK" equals "GEOMA". This type of indexing should make it relatively easy to focus in on parameter values found in a specific section of the label.