Figure out why we need to __getitem__

goodboy commented 6 years ago

Issue by tgoodlet Thursday Sep 22, 2016 at 00:50 GMT Originally opened as https://github.com/sangoma/sandswitches/issues/1

@vodik made a note and I can't remember why I needed it to get things working but the __getitem__ call here seems superfluous.

goodboy commented 6 years ago

Comment by vodik Thursday Sep 22, 2016 at 16:08 GMT

Looking at the code quickly, I bet you its because you're letting __setitem__/__getitem__ perform side effects in such a way that I think you're ending up transforming type and references in the data you've pushed through those functions.

Calling __setitem__ on the ElemMap class can trigger a call to etree.SubElement (this will make a copy of your data - something arguably __setitem__ really shouldn't be doing). Further changes to the data won't affect the new sub element. However I get the sense that this can happen with the data retrieved with __getitem__.

I see this comment in your code under __getitem__ there:

this is subtle, the valtype will have it's elem reassigned depending on which subtree is accessed

Not sure what valtype.fromelem will actually do. But does this mean you can sometimes modify internals by reference while other times you can't?

goodboy commented 6 years ago

Comment by vodik Thursday Sep 22, 2016 at 16:12 GMT

In all honestly, you should probably not use SubElement or any etree stuff until the last minute, just before the XML needs to be rendered out.

As far as I understand, its also how virt-manager does it: https://github.com/virt-manager/virt-manager/blob/master/virtinst/xmlbuilder.py

goodboy commented 6 years ago

Comment by tgoodlet Thursday Sep 22, 2016 at 16:22 GMT

Thanks for the feedback @vodik! I have to get this all back in my head (which I'm sure writing unit tests will help with). If I can't see good reasons for the design then, yes, we can gladly change it to something a bit more sane.

goodboy commented 6 years ago

Comment by tgoodlet Friday Sep 23, 2016 at 20:59 GMT

Alright so I found the obvious answer to this question. It's so that

obj.appendfrom('newkey', 'oldkey') is obj['newkey'] == True

The reason that this it not true with newval is that it is a deepcopy of some original configuration node (i.e. mapping) or value and some implementations of __setitem__() do not assign the input directly into the config; instead a copy of the data contents is.

Couple things to note here:

KeyValues.appendfrom is a convenience method to make it simple to copy from an existing child element and append into the same config location but under a different key. This of course comes in handy when you want to make a new profile of some sort but realy only need to change it's socket addresses.
self[key] = newval assigns the new value into the mapping element and the implementation/semantics is different depending on type(self).
When newval is being assigned to a mapping type such as ElemMap or TagMap, it is first converted into a regular dict and re-wrapped in an instance of the appropriate child mapping type. There are a couple reasons for this:
- We normally expect dict inputs; the minority case is the use of appendfrom().
- dict(value) ensures the input is type compatible with what we can process (i.e. later call to self.update())
- We only care about the data contained in the input data structure

@vodik to counter your points:

I bet you its because you're letting setitem/getitem perform side effects in such a way that I think you're ending up transforming type and references in the data you've pushed through those functions.

No side effects are triggered other then of course assigning the new element in the config (whether this is a simple scalar value or a sub-mapping type depends on the context).

Calling setitem on the ElemMap class can trigger a call to etree.SubElement (this will make a copy of your data - something arguably setitem really shouldn't be doing).

No, etree.SubElement is only called when there exists no element currently for key. This makes sense because we need to extend the XML config to include our new data.

Further changes to the data won't affect the new sub element. However I get the sense that this can happen with the data retrieved with getitem.

Right and this is the main question: can we avoid the dict(value) call which makes a copy of the input. I hadn't really thought about allowing assignment of mutable data structures which can then be changed elsewhere. In other words I currently do not support:

settings = {'sip-ip': '10.10.10.9'}
profiles['doggy'] = {'settings': settings }
settings['sip-ip'] = '10.10.10.11'
assert profiles['doggy']['settings']['sip-ip'] == '10.10.10.11'

but you're right ^ is more consistent with traditional mutable mappings.

I see this comment in your code under getitem there:

this is subtle, the valtype will have it's elem reassigned depending on which subtree is accessed

Old comment. valtype is assigned once inside buildfromschema depending on the schema description.

Not sure what valtype.fromelem will actually do. But does this mean you can sometimes modify internals by reference while other times you can't?

It's a factory method for creating new instances of the same type but with the some of the current instance's state (explicit instance vars in this case) passed to the new obj. You can always modify internals by reference afaict.

In all honestly, you should probably not use SubElement or any etree stuff until the last minute, just before the XML needs to be rendered out.

I don't think that's a good idea. That code in virt-manager is needlessly complex and convoluted imho. Also I want to be able to modify the etree in place such that I can component wise render to XML easily without having to render the entire tree or section copies.

goodboy commented 6 years ago

Comment by vodik Friday Sep 23, 2016 at 21:27 GMT

Calling setitem on the ElemMap class can trigger a call to etree.SubElement (this will make a copy of your data - something arguably setitem really shouldn't be doing).

No, etree.SubElement is only called when there exists no element currently for key. This makes sense because we need to extend the XML config to include our new data.

Well key word was can.

I don't think that's a good idea. That code in virt-manager is needlessly complex and convoluted imho. Also I want to be able to modify the etree in place such that I can component wise render to XML easily without having to render the entire tree or section copies.

I agree its overly complicated, but I wouldn't argue its due to the approach, its consequences to how they chose to expose/architect details.

I really do thing you want to do a full serialise/deserialise rather than this live editing. If you don't want to do that, just drop __getitem__/__setitem__. You're code violates reasonable expectations of how those methods should work.

goodboy commented 6 years ago

Comment by tgoodlet Friday Sep 23, 2016 at 21:59 GMT

I really do thing you want to do a full serialise/deserialise rather than this live editing. If you don't want to do that, just drop getitem/setitem. You're code violates reasonable expectations of how those methods should work.

Let me clarify, the code currently does serialise/deserialise before it hits the target file system (so it's not live edited from the perspective of the target FreeSWITCH's config) but the in-memory etree representation is modified in real time.

I don't see why this is a problem? Maybe you have a reason why this shouldn't be allowed?

Secondly, I'm pretty sure I can make __setitem__() behave the way you'd expect it just needs some fiddling.

goodboy commented 6 years ago

Comment by vodik Friday Sep 23, 2016 at 22:05 GMT

When I mean serialize/deserialize, I mean don't use etree as your internal representation. That should only be at the boundary if you can't make it work. But whatever, if you can make it work as expected.

goodboy commented 6 years ago

Comment by tgoodlet Friday Sep 23, 2016 at 22:37 GMT

@vodik if I don't modify the etree then the implementation becomes more complex as I have to store data in some other temporary representation. I also then have to map the temporary data to where it belongs in the etree as well as to what object relational mappers are necessary to apply on top of each etree elements/section in order to eventually render to XML.

Imo this complexity is unjustified when lxml.etree already does all this mapping/tracking work for you. I'd essentially being duplicating that functionality but one abstraction layer up.

I mean don't use etree as your internal representation. That should only be at the boundary if you can't make it work.

But why? I need to know why this makes the design simpler/better and/or my life easier.

friends-of-freeswitch / conffs

Figure out why we need to getitem #5

friends-of-freeswitch / conffs

Figure out why we need to __getitem__ #5

Figure out why we need to getitem #5