ZettaAI / zetta_utils

MIT License
handling of multi-chunk lines in AnnotationLayer #789

Open JoeStrout opened 3 weeks ago

JoeStrout commented 3 weeks ago

The precomputed file format specification says that line annotations which span multiple chunks should be written to every chunk they intersect.

While the current AnnotationLayer class (in this PR) supports this, the caller is responsible for doing it correctly; there is no built-in support for handling it automatically. Moreover, doing it correctly would be very difficult in a subchunkable flow, where each chunk may be processed independently and in parallel: a node processing chunk A has no good way to write to chunk B, even if it has found a line that spans both. It would work only when the same line is found by both chunks' processors (e.g., due to padding and a reliable, deterministic process for finding lines). Even then, we would end up with two line annotations with different IDs but identical spatial coordinates.

It would be better if AnnotationLayer could handle this for us. This would mean:

  1. When lines are written to a chunk, any lines which extend outside that chunk are collected somewhere.
  2. A later "blend" process (probably in AnnotationLayer.post_process) finds all the overflow lines which intersect each chunk, reconciles the overlaps by some strategy (e.g. keeping the lowest or highest ID), and adds them to the chunk.
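
The two steps above could be sketched roughly as follows. This is only an illustration, not the AnnotationLayer API: `Line`, `chunk_of`, `split_overflow`, and `blend` are hypothetical names, the chunk grid is assumed to start at the origin, duplicates are assumed to have exactly matching endpoints, and the `intersects` predicate is supplied by the caller.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Vec3 = Tuple[float, float, float]
ChunkIndex = Tuple[int, int, int]


@dataclass(frozen=True)
class Line:  # hypothetical stand-in for a line annotation
    id: int
    start: Vec3
    end: Vec3


def chunk_of(point: Vec3, chunk_size: Vec3) -> ChunkIndex:
    """Index of the chunk containing `point` (grid assumed to start at 0)."""
    return tuple(int(p // s) for p, s in zip(point, chunk_size))


def split_overflow(
    lines: Iterable[Line], chunk: ChunkIndex, chunk_size: Vec3
) -> Tuple[List[Line], List[Line]]:
    """Step 1: separate lines fully inside `chunk` from those extending outside."""
    inside, overflow = [], []
    for line in lines:
        contained = (
            chunk_of(line.start, chunk_size) == chunk
            and chunk_of(line.end, chunk_size) == chunk
        )
        (inside if contained else overflow).append(line)
    return inside, overflow


def blend(
    overflow: Iterable[Line],
    chunk_size: Vec3,
    intersects: Callable[[Vec3, Vec3, Vec3, Vec3], bool],
) -> Dict[ChunkIndex, List[Line]]:
    """Step 2: reconcile duplicate overflow lines, then assign them to chunks."""
    # Lines with the same endpoints (in either order) are treated as duplicates;
    # the strategy used here is to keep the lowest ID.
    best: Dict[Tuple[Vec3, Vec3], Line] = {}
    for line in overflow:
        key = tuple(sorted((line.start, line.end)))
        if key not in best or line.id < best[key].id:
            best[key] = line

    per_chunk: Dict[ChunkIndex, List[Line]] = defaultdict(list)
    for line in best.values():
        # Only chunks overlapping the line's bounding box can be intersected.
        lo = chunk_of(tuple(map(min, line.start, line.end)), chunk_size)
        hi = chunk_of(tuple(map(max, line.start, line.end)), chunk_size)
        for idx in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            box_min = tuple(i * s for i, s in zip(idx, chunk_size))
            box_max = tuple((i + 1) * s for i, s in zip(idx, chunk_size))
            if intersects(line.start, line.end, box_min, box_max):
                per_chunk[idx].append(line)
    return dict(per_chunk)
```

In a subchunkable flow, each chunk's processor would call something like `split_overflow` and persist the overflow list; the blend pass would then run once over all collected overflow lines.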

Note that "intersecting" a chunk should mean that the line passes through it, even if neither endpoint of the line lies within the chunk. (We might want to add a line-box intersection test to BBox3D to make this easier.)
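
For reference, a line-box test of this kind can be written with the standard slab method. The sketch below is a standalone function, not an existing BBox3D method; the name `segment_intersects_box` and its signature are assumptions, and it treats the box as closed on all sides, which may need adjusting if chunk bounds are half-open.

```python
from typing import Sequence


def segment_intersects_box(
    start: Sequence[float],
    end: Sequence[float],
    box_min: Sequence[float],
    box_max: Sequence[float],
) -> bool:
    """Return True if the segment start->end touches the closed box."""
    # [t_enter, t_exit] is the part of the segment (parameterized over 0..1)
    # that lies inside every slab examined so far.
    t_enter, t_exit = 0.0, 1.0
    for a, b, lo, hi in zip(start, end, box_min, box_max):
        delta = b - a
        if delta == 0:
            # Segment is parallel to this slab: it must already lie within it.
            if a < lo or a > hi:
                return False
            continue
        t0, t1 = (lo - a) / delta, (hi - a) / delta
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
        if t_enter > t_exit:
            return False  # the slabs' parameter ranges do not overlap
    return True
```

For example, a segment from `(-5, 5, 5)` to `(15, 5, 5)` passes straight through the box spanning `(0, 0, 0)` to `(10, 10, 10)` even though both endpoints are outside it, so the function returns `True`.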