Add `component` and `complex` to Node

thomasarsouze commented 2 years ago

This would align with MTG implementation OpenAlea and is a first step toward visit only some scales without the need to visit all nodes in-between.

This means also updating names: children become nodes of the same scale, components of a scale with higher number and complex are nodes of a scale with lower number.

VEZY commented 1 year ago

For clarity I'll say higher scale for lower-value scales (the coarser scale), and lower scale for nodes with a high value scale (the finer scale)

The benefit of storing component and complex is not clear to me. I think it depends more on the structure of the MTG itself, i.e. how many nodes there are in total, and how many there are in each scale.

component

Consider this little mtg as a first example:

/ 1: P
└─ / 2: A
   └─ / 3: GU
      ├─ / 4: N
      │  └─ < 5: N
      │     └─ < 6: N
      │        └─ < 7: N
      │           └─ < 8: N
      │              └─ < 9: N
      │                 ├─ + 134: Leaf
      │                 └─ < 10: N
      │                    ├─ + 133: Leaf
      │                    └─ < 11: N
      │                       ├─ + 132: Leaf
      │                       └─ < 12: N
      │                          ├─ + 131: Leaf
      │                          └─ < 13: N
      │                             ├─ + 130: Leaf
      │                             └─ < 14: N
      │                                └─ + 129: Leaf
      └─ < 15: GU
         └─ / 16: N
            └─ < 17: N
               ├─ + 203: Leaf
               └─ < 18: N
                  └─ + 202: Leaf

This mtg has one P, one axis, two GU, and a lot of nodes and leaves. If you want to visit a GU, of course you don't want to visit the whole thing. In this case having component could be nice. You also have a lot of leaves but they are close to the end of GU, so you don't want to visit all Ns. And if you want to visit the N, you want to avoid visiting the leaves.

Now if we remove the leaves, we get this mtg instead:

/ 1: P
└─ / 2: A
   └─ / 3: GU
      ├─ / 4: N
      │  └─ < 5: N
      │     └─ < 6: N
      │        └─ < 7: N
      │           └─ < 8: N
      │              └─ < 9: N
      │                 └─ < 10: N
      │                    └─ < 11: N
      │                       └─ < 12: N
      │                          └─ < 13: N
      │                             └─ < 14: N
      └─ < 15: GU
         └─ / 16: N
            └─ < 17: N
               └─ < 18: N

Visiting the lower scale N is always expensive, regardless of the number of nodes with a higher scale (P, A and GU). So there's probably a very little gain in not visiting the higher scales, or in other words, there's a marginal cost of visiting higher scales when visiting lower scales. But, there is a large cost of storing component and complex for all nodes. However, there is a very high cost of visiting all nodes when we only want higher scales. So it would be interesting to be able to visit P, A or GU only.

complex

Lastly, about the complex, I feel that visiting a higher scale is always kind of cheap, but storing the complex for every node is very expensive (unless you do it optionally, but it would make the code very complicated). In any case visiting the higher scale should take a handful of nodes only. And if it isn't the case, it is probably that the MTG is ill defined, and there is a missing intermediate scale. We can take the example of the MTG from plantscan3d, which is made of one P and many N. We should add a scale for the axis there.

What I propose

So I have this idea of an alternative to component and complex in the MTG. What if we store in the root node an array that references (i.e. a pointer) all nodes in a scale, but it would be optional. We could call it a cache, as we do with other variables when using descendants!. And tell users that if they plan to visit a lot of times a scale, they should probably cache the scale. To do so we would need:

[ ] a function called e.g. cache_scale that would do it automatically on an existing MTG
[ ] a function to clear the cache also, in case we want a smaller mtg. This function should be different than clear_cache, and shouldn't be impacted by it. Maybe rename into clear_variable_cache ? And the other would be clear_scale_cache.
[ ] an argument in the read_mtg function to build the cache on parsing (much more efficient that re-visiting the whole thing afterwards)
[ ] Include an automatic detection of the presence of the cache for a set of scale(s) or symbol(s)
[ ] Use the cache when visiting the MTG (only on the first call obviously, not inside a recursion). Keep in mind that if the user requests more than one scale and/or symbols, we have to use them all.
[ ] Use the cache in ancestors too. In this one it is more complex, because we don't have access to the root node in a fast way. So we could probably ask the user to give the root node as an argument. Then, we have to identify which ancestor we are searching for in the array of its scale, so we have to visit all ancestors until we find a node with the set of symbol(s) or scales, and then go and find it in the cache, and visit all previous others. If there are multiple symbols and scales, we have to continue visiting all ancestors for the others too until finding all.
[ ] Use the cache in descendants. In this one too, we should ask the user to give the root node as an argument to make things fast. And we'll get the arrays for each scale from here.

VEZY commented 1 year ago

Instead of a cache, we could also make a new mtg with only the scales required, and each node is a view into the node of the full MTG so it would be pretty cheap to store. We could even add a method to view for MTGs. And example usage would be:

mtg_2_3 = view(mtg, scale = [2,3])

Edit: This is a bad idea, how can we traverse this new mtg view then ? Because if each node is a node from another mtg, its parents and children would still be the ones from the original mtg. Better use a vector of nodes as proposed earlier.

VEZY / MultiScaleTreeGraph.jl