adriank / ObjectPath

The agile query language for semi-structured data
http://objectpath.org
MIT License
380 stars 93 forks

[Feature request] Named selector and removal operator #54

Closed felixhao28 closed 6 years ago

felixhao28 commented 7 years ago

For a family like this (the actual data can go very deep):

name: Adam
gender: male
children:
  - name: Bran
    gender: male
  - name: Cindy
    gender: female
    children:
      - name: David
        gender: male
        children:
          - name: Helen
            gender: female
      - name: Eva
        gender: female
  - name: Frank
    gender: male
    children:
      - name: George
        gender: male

I'd like to find Cindy and all female offspring of Cindy. This takes three steps:

  1. Find Cindy: $..*[@.name is 'Cindy'].name => Cindy
  2. Find all female offspring of Cindy: $..*[@.name is 'Cindy'].children..*[@.gender is female].name => Helen, Eva
  3. Add them together: $..*[@.name is 'Cindy'].name + $..*[@.name is 'Cindy'].children..*[@.gender is female].name => Cindy, Helen, Eva

As you can see, the pattern is already unbearably long because the "find Cindy" part is repeated. In more complex cases, the repetition makes the pattern practically unmaintainable. In my application I currently use the following pattern (with a self-written preprocessor):

$cindy := $..*[@.name is 'Cindy']
$cindy.name + $cindy.children..*[@.gender is female].name

This expands to $..*[@.name is 'Cindy'].name + $..*[@.name is 'Cindy'].children..*[@.gender is female].name.
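A minimal sketch of such a preprocessor, assuming the `$name := expression` syntax shown above. The function names `expand` and `substitute` are hypothetical, and this is not the preprocessor actually used in my application. It does plain textual substitution and adds no parentheses, so a definition that contains operators may need extra care about precedence.

```python
import re

# Assumed syntax, taken from the example above: "$name := expression".
DEFINITION = re.compile(r"^\s*\$(\w+)\s*:=\s*(.+?)\s*$")


def substitute(text, definitions):
    """Replace every known "$name" in text with its definition."""
    def replace(match):
        # Unknown names are left untouched. A bare "$" (the root) is never
        # matched because it is not followed by a word character.
        return definitions.get(match.group(1), match.group(0))

    return re.sub(r"\$(\w+)", replace, text)


def expand(source):
    """Expand "$name := expr" lines into one plain ObjectPath expression."""
    definitions = {}
    query_lines = []
    for line in source.strip().splitlines():
        match = DEFINITION.match(line)
        if match:
            name, body = match.groups()
            # Expand earlier definitions used inside this one.
            definitions[name] = substitute(body, definitions)
        elif line.strip():
            query_lines.append(substitute(line.strip(), definitions))
    return " ".join(query_lines)


source = """
$cindy := $..*[@.name is 'Cindy']
$cindy.name + $cindy.children..*[@.gender is female].name
"""
print(expand(source))
```

The printed expression is the expanded pattern, ready to be passed to ObjectPath as an ordinary query string.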

Another thing I am looking for is the ability to remove certain objects from a collection. Suppose I need to find Cindy and all female offspring of Cindy except David's children. I don't know whether that is even possible at the moment. Since we have '+' for union, I propose using '-' for subtraction.

(The assignment operators are aligned for aesthetic reasons only.)

$cindy           := $..*[@.name is 'Cindy']
$cindy_bloodline := $cindy.name + $cindy.children..*[@.gender is female].name
$david_bloodline := $..*[@.name is 'David'].children..*.name
$cindy_bloodline - $david_bloodline
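
Until such an operator exists, the subtraction can be done outside the query. A minimal sketch in plain Python, assuming the two result collections have already been fetched as lists of names; the helper name `subtract` is hypothetical and the sample values are written by hand from the family above, not produced by ObjectPath:

```python
def subtract(left, right):
    """Return the items of left that do not appear in right, keeping order."""
    excluded = set(right)  # assumes the items are hashable, e.g. strings
    return [item for item in left if item not in excluded]


# Hand-written stand-ins for the two query results.
cindy_bloodline = ["Cindy", "Helen", "Eva"]
david_bloodline = ["Helen"]

print(subtract(cindy_bloodline, david_bloodline))  # ['Cindy', 'Eva']
```

Matching by name treats two different people with the same name as equal, so real data would probably need a unique key instead.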

adriank commented 7 years ago

Hi Felix,

Re verbosity of the language: in your case, the solution would be to use ObjectPath (OP) from Python. I didn't add variables etc. to the language because they are outside the scope of OP, which is to query data. I would recommend writing a simple Python script like:

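from objectpath import Tree

tree = Tree(obj)
# Find Cindy once; list() materializes the result so it can be queried more than once.
cindy = Tree(list(tree.execute("$..*[@.name is 'Cindy']")))
# These queries only walk Cindy's subtree, not the whole family.
cindy_bloodline = cindy.execute("$.children..*[@.gender is 'female'].name")
david_bloodline = cindy.execute("$.children..*[@.name is 'David'].children..*.name")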

If the family tree is large, the above code is much more efficient, because the recursive descent operator (..) traverses the whole tree only once and the later queries work on a smaller subtree. Remember that when there is more than one Cindy in the tree, you'll merge the children of all Cindys, which may not be your intention. Python can also solve this issue, while pure OP can't.
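
To illustrate the last point, here is a sketch that keeps the results for each Cindy separate. It uses a plain-Python walk as a stand-in for the `..` operator rather than ObjectPath itself, so that it runs without the library; `descend` and `female_descendants_per_match` are hypothetical helper names.

```python
FAMILY = {
    "name": "Adam", "gender": "male",
    "children": [
        {"name": "Bran", "gender": "male"},
        {"name": "Cindy", "gender": "female", "children": [
            {"name": "David", "gender": "male", "children": [
                {"name": "Helen", "gender": "female"},
            ]},
            {"name": "Eva", "gender": "female"},
        ]},
        {"name": "Frank", "gender": "male", "children": [
            {"name": "George", "gender": "male"},
        ]},
    ],
}


def descend(node):
    """Yield every dict in the tree, depth first (stand-in for `..`)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from descend(value)
    elif isinstance(node, list):
        for item in node:
            yield from descend(item)


def female_descendants_per_match(tree, name):
    """Return one list of female descendants per person called name."""
    results = []
    for person in descend(tree):
        if person.get("name") != name:
            continue
        # Only this person's subtree is searched, so results from two
        # people with the same name are never merged.
        females = [
            p["name"]
            for child in person.get("children", [])
            for p in descend(child)
            if p.get("gender") == "female"
        ]
        results.append(females)
    return results


print(female_descendants_per_match(FAMILY, "Cindy"))  # [['Helen', 'Eva']]
```

With a second Cindy in the data, the outer list would simply get a second entry.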

Re subtraction operator: that's certainly a good idea. Could you please extract this feature request to another issue? I'll add this functionality when I find some spare time.

Greetings, Adrian Kalbarczyk

http://kalbarczyk.co

On Sun, Jul 23, 2017 at 1:02 PM, Felix Hao notifications@github.com wrote:


felixhao28 commented 7 years ago

> recursive descent operator (..) traverses the whole tree only once

So there is no caching mechanism? Since ObjectPath is a strictly immutable language, I assumed that reusing the same pattern would return the cached result instead of re-evaluating it.

There's another issue when using Python to store the results: how do you manage multiple result collections, like A + B?
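
For reference, one way A + B could look on the Python side is plain concatenation. This is only a sketch under the assumption that each result is a single value, a list, or a generator; `as_list` and `union` are hypothetical helper names, not part of ObjectPath.

```python
from itertools import chain


def as_list(result):
    """Normalize one query result to a list."""
    # Strings, bytes and dicts are iterable, but here they count as one value.
    if isinstance(result, (str, bytes, dict)) or not hasattr(result, "__iter__"):
        return [result]
    return list(result)  # also materializes generators


def union(*results):
    """Concatenate several result collections, like A + B in a query."""
    return list(chain.from_iterable(as_list(r) for r in results))


# Hand-written stand-ins for two query results: one scalar, one generator.
a = "Cindy"
b = (name for name in ["Helen", "Eva"])

print(union(a, b))  # ['Cindy', 'Helen', 'Eva']
```

The bookkeeping of which collection is which still has to live in Python variables, which is the part a named selector would remove.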

> Could you please extract this feature request to another issue?

Sure