doctrine / phpcr-odm

Doctrine PHPCR ODM
http://www.doctrine-project.org/projects/phpcr-odm.html
MIT License
183 stars 97 forks source link

Support nested associative arrays #417

Open dantleech opened 10 years ago

dantleech commented 10 years ago

It should be possible to have an array type which automatically creates a structure, e.g

class Site {
    /**
     * @PHPCR\AssocArray()
     */
     protected $preferences;
}

Then automatically map that to a PHPCR structure:

/site
    preferences/
         - jcr:primaryType  = phpcr:associative_array
         - key1 = value 
         key2/
             - key3: = value
             - key4: = value
             key5/
dbu commented 10 years ago

can php array keys be anything else than strings? if not, that should actually be doable. i think we would need to find a better name, as we already have multivalue with assoc=true which is used for flat key-value hashmaps. something like NestedArray?

but anyways, lets not start this before the cmf 1.1 is released.

lsmith77 commented 10 years ago

php array keys can be integers and strings, nothing else

ElectricMaxxx commented 10 years ago

I would suggest an handling equal to the type setting in the JMSSerialzerBundle. It should be important to keep strings for the key to set them as localnames by default.

If the phpcr-odm can handle this type setting it should be possible to persist complete document-structures in an equal way We serialize objects to XML for example. The only difference should be: the structures have to contain Nodes as Wrapper for other properties.

I think this is a little bit more than the issue started, but goes in the same way. Cause if We start to add deeper array there can be a way of setting deeper documents instead of reference them as children. This can be a Second way.

ElectricMaxxx commented 10 years ago

Oh i wrote nonesense. But can not delete it at the momentfrom my mobile phone.

Thought he wants to set the deeper arrays as real child documents.

danrot commented 10 years ago

On which level would this feature be implemented? Only in the ODM, or would this already be placed in the PHPCR-Specification?

dbu commented 10 years ago

Only odm, the phpcr spec is linked to jcr. And its not a basic thing, its just syntactic sugar to automatically store data without objects.

----- Reply message ----- Von: "Daniel Rotter" notifications@github.com An: "doctrine/phpcr-odm" phpcr-odm@noreply.github.com Cc: "David Buchmann" david@liip.ch Betreff: [phpcr-odm] Support nested associative arrays (#417) Datum: Di., Feb. 11, 2014 08:05 On which level would this feature be implemented? Only in the ODM, or would this already be placed in the PHPCR-Specification?

— Reply to this email directly or view it on GitHub.

dantleech commented 10 years ago

Could we not do this as a BC break? As it would otherwise overlap with the functionality of assoc=true

dbu commented 10 years ago

we could add another thing like assoc, calling it nested and then it would persist differently, but its still an attribute field and not a children mapping in phpcr-odm terms.

dantleech commented 10 years ago

Not sure mapping it as an attribute field would make sense .. i.e. @Integer(nested=true), @String(nested=true). Would we then cast all the elements as integer or string depending on the field type?

I think a dedicated field type would make more sense, (like Doctrines @Field(type=phparray))

We could then either unconditionally map all the property values as string or dynamically cast them.

dbu commented 10 years ago

yeah you are right, a field like @Array resp @Field(type=array) sounds good. afaik we call the other thing multivalue, and multivalue is just that, while array maps to phpcr nodes.

ElectricMaxxx commented 10 years ago

Is it possible to get an @Field(type=<integer,string>) to get valdation on keys and values for their types? Sounds cool for Annotation, but imposible for xml config. By this it should be easy to go one Level deeper and so on.

ElectricMaxxx commented 10 years ago

Oh my signs are gone. Ment @Field(type=<integer,string>)

dbu commented 10 years ago

if we don't specify it, the types will be autodetermined by phpcr and come back in the right type on loading (string / integer / float / boolean)

uwej711 commented 10 years ago

Recently I had some issues with that when there were null values in an array (using assoc=true) forcing me to remove all null velues before persisting.

dbu commented 10 years ago

uh, i think indeed in phpcr setting something to null means removing it. so your problem was that the keys get shifted because values disapear when reloading the property? maybe we should fix persisting assoc arrays to clean null values before storing the keys and values. otherwise we would need to store the indexes that where null in a 3rd field and on restoring the array put them back in the right places. that sounds tricky, but would probably the least wtf. do you want to open a separate issue for this? can we address this in like the next couple of days to get it into the 1.1 release? there is potential BC break here, if people somehow worked around the bogus behaviour.

dantleech commented 10 years ago

Does JCR store property type in the property (i.e. so something that is dropped in PHPCR) or is it the responsability of the nodeType definition?

Having metadata on the properties sounds awkward indeed.

dbu commented 10 years ago

@dantleech the type of a property is stored in the property itself (the type is just there to enforce what type a property may have). i think uwe was talking about something different - null values are simply not stored in phpcr. assuming the 'test' property exists, $node->setPropertyValue('test', null) is the same as $node->getProperty('test')->remove() . this seems to hold true for arrays as well (though only in jackalope-jackrabbit probably.)

dantleech commented 10 years ago

@dbu so the type is stored with the property? Looking at a PHPCR dump I don't see any data other than the value on the property, I thought it was dynamically inferred.

If the type is stored with the property, can we not add a custom type for null?

dbu commented 10 years ago

@dantleech look at an export in the system view, you will see the type. the types of properties are a limited set, there are no custom types and there is no type for null - the doc is quite explicit on the semantics of null: http://www.day.com/specs/jcr/2.0/10_Writing.html#10.4.2.4%20No%20Null%20Values

uwej711 commented 10 years ago

@dbu: I can't remember from heart but I think Jackalope or phpcr-utils throw an exception somewhere when you try to store an array with assoc=true and null values.

dbu commented 10 years ago

i can't store null values. what i get is

[PHPCR\ValueFormatException]
Can not determine type of property with value "NULL"

not an extremly helpful message, but at least it does not work to create messed up data.

dantleech commented 10 years ago

Shall we do this for 1.1? I might be able to look at it at the weekend if nobody else is interested.

dbu commented 10 years ago

you mean the validation? that would be nice. for the original topic of nested associative arrays, i am -1 as this is a totally new thing with quite some effort involved, and we really need to get 1.1 out and the cmf bundles 1.1 out. we are behind the roadmap by almost 2 months...

uwej711 commented 10 years ago

This would also need some special handling for null values in the array, as properties with null values are not stored by phpcr, so you will loose the keys!

dbu commented 10 years ago

see #457 for how uwe solved this for the hashmaps case - the solution here would probably look similar.

dantleech commented 10 years ago

Hmm. Just had a look at this and one issue is how or if we should handle translations here?

dbu commented 10 years ago

i would handle them for the whole property, as with non-nested arrays. if the property is translated, the whole thing is translated. if you use children translation strategy, there is no problem. for attribute strategy, if the idea is to create subnodes, we should use the same naming schema as for properties for the top level node of the structure.

dantleech commented 10 years ago

ok, i thought the same but it struck me as being maybe non-trivial - would this open a box of refactorings do you think?

dbu commented 10 years ago

maybe the assumptions of the parameter to a translation strategy needs to be altered a bit - maybe not even that, not sure. apart from that i expect it to be rather straightforward.

dantleech commented 9 years ago

I just wrote a an event listener class for SuluCMF which does this. It uses a dot-delimited property name to store multi-dimensional arrays.

Name Type Vaue
foo.bar.bar STRING Value 1
foo.boo.baz LONG 1234
foo.car STRING Value 2

Would give

array(
    'foo' => array(
        'bar' => => array(
            'bar' => 'Value 1',
            'baz' => 1234,
        ),
        'car' => 'Value 2',
    ),

Will try and port it here.

dbu commented 9 years ago

so you use a child node for the array, but only one instead of a nested node tree? i guess that works, but why not go with nested nodes while you are at it? this scheme would break if i do

array(
    'foo.bar' => 'x',
);
dantleech commented 9 years ago

No everything is in the same node in this strategy.

Regarding the breaking of the scheme, if we rewrote the entire array every time it shouldn't be a problem I think.

dbu commented 9 years ago

if everything is in the same node, does that mean there can only be one array? and how do you know what belongs into the array or not?

with rewriting, you mean encode . in hashmap keys with something else?

dantleech commented 9 years ago

No, say you have a property foo which is mapped as an array in PHPCR-ODM.

If you set its value to array('foo' => array('bar' => 'baz'), 'car' => array('boat', 'plane')) then the following properties would be created in the node:

foo.foo.bar STRING "baz"
foo.car.0 STRING "boat"
foo.car.1 STRING "plane"

We would know it is an array because there would be a mapping indicating that it is so.

dantleech commented 9 years ago

with rewriting, you mean encode . in hashmap keys with something else?

Ah, yeah I see what you mean. We would need to encode/escape the delimiter.

dbu commented 9 years ago

ah ok. i think your first example was missing the property name, that got me confused ;-)

and then to rebuild the array, you would scan the node for all properties starting with "foo.". but be aware that properties do not have guaranteed order, only nodes can have that. this means we can not guarantee to rebuild the exact same php array. but this is no argument against your architecture, as the same problem holds for nested nodes for child arrays.

the order would only be guaranteed if we would use a single multivalue field with a list of keys and a second one with a list of values. but then it would become unsearchable again. we might use a separate multivalue property just for the order.

dantleech commented 9 years ago

Hmm yeah. But is order really important with associative arrays?

In the case of non-associative arrays they would have indexes so would be orderable. I think unordered associative arrays is an acceptable compromise (although if pushed we could store the order as you say).

dbu commented 9 years ago

well, i could do a foreach on the assoc array... i think while we are at it we should make it clean, or we risk unexpected things. the separate multivalue field for the order should be ok. you could then restore the array by doing foreach on that multivalue field and get the values out of the property values array of the node.

dantleech commented 9 years ago

maybe we could make it an option?

/**
 * @PHPCR\Array(ordered=true)
 */
protected $options;

Storing the order would increase the node size, and this strategy is already pretty inefficient in terms of size -- although perhaps this is a moot point anyway as we need to store the keys in order to have the null values.

Also .. this could replace the existing array implementation(s) what do you think?

dantleech commented 9 years ago

(regarding the null values we could also represent them as a string e.g. __null__ and make sure that that value is escaped when the user uses it) /cc @uwej711

dbu commented 9 years ago

i would not make keeping the order optional - keep complexity manageable. if anything, we could support either expanding the way you do, or serializing an array (if you are concerned about efficiency).

and yes, this should probably replace what uwe did, no need to have two slightly different things for the same. this however is a BC break for existing data...

can you search the same way as with the uwe-solution? right now if i had a field $f, it was f = 'foo', now it would be f.* = 'foo' right?

dantleech commented 9 years ago

I guess for searching we should make it transparent (and therefore maintain BC). I guess we need to do some research about the best way to implement the search syntax/API. But I think the basic case f = 'foo' should be the same.

dbu commented 9 years ago

in the query builder, yes (though, you could also want to search for a specific key?)

i was thinking about SQL2 queries too, there must be a representation that works.

dantleech commented 9 years ago

well, I guess SQL2 is to PHPCR-ODM what SQL is to Doctrine ORM.. i.e. it breaks the abstraction so you shouldn't really use it to query for documents.

dantleech commented 9 years ago

Had a look at this. Basically there is a pain point with the translation strategies because the they would need to set a property value for each element in the array instead of a single value. So we would probably have to push more responsiblity into those (although I guess it could be handled in an abstract class and we would have a "serialization" helper).

dbu commented 9 years ago

for child strategy, we need not do anything at all, as each translation has its own node. for attribute, we should probably just translate each property, if you go with the approach of splitting each array entry into its own property.