Multi-input pipelines don't construct properly

hobu commented 2 years ago

@gsakkis Complex pipelines with branches in them don't seem to construct properly with the bindings. It could be my usage is incorrect or not being provided as expected, or the bindings don't support branched pipelines as currently written. Can you take a look?

Expected output


Before => reproject_before \ 
                             => merge
After => reproject_after   /

Produced output

But I end up with

Before => before_reprojection \
                                => after_reprojection => merge
After =>                      /

Example code


before = pdal.Reader.las("before.las")
after = pdal.Reader.las("after.las")

reproject_before = pdal.Filter.reprojection(out_srs="EPSG:26915")
reproject_after = pdal.Filter.reprojection(out_srs="EPSG:26915")

merge = ((before |reproject_before) | (after | reproject_after)) | pdal.Filter.merge()

print (merge.pipeline)

Example output

{
  "pipeline":
  [
    {
      "filename": "before.las",
      "tag": "readers_las1",
      "type": "readers.las"
    },
    {
      "inputs":
      [
        "readers_las1"
      ],
      "out_srs": "EPSG:26915",
      "tag": "filters_reprojection1",
      "type": "filters.reprojection"
    },
    {
      "filename": "after.las",
      "tag": "readers_las2",
      "type": "readers.las"
    },
    {
      "inputs":
      [
        "filters_reprojection1",
        "readers_las2"
      ],
      "out_srs": "EPSG:26915",
      "tag": "filters_reprojection2",
      "type": "filters.reprojection"
    },
    {
      "inputs":
      [
        "filters_reprojection2"
      ],
      "tag": "filters_merge1",
      "type": "filters.merge"
    }
  ]
}

gsakkis commented 2 years ago

@hobu I took a look; some notes:

Currently the pdal-python pipe operator all it does is concatenate the stage(s) of the piped stages/pipelines. That is, (before | reproject_before) | (after | reproject_after) is equivalent to before | reproject_before | after | reproject_after. Do you expect these to have different semantics in general and if yes how?
A side-effect of doing a simple stage concatenation and nothing else is that pdal-python does not set implicitly the inputs of any stage, it relies on pdal-core to determine the inputs according to the algorithm described here. This is why the inputs of reproject_after is determined as ["filters_reprojection1", "readers_las2"].
Regardless of the previous points, a pipeline such as p1 | p2 | pdal.Filter.merge() (where p1 = before | reproject_before and p2 = after | reproject_after) does not (and cannot) infer that the inputs of merge are p1 and p2. The only reasonable semantics of this pipeline is that the input of p2 is p1 and the input of merge is p2.

I can think of two ways to support multiple inputs:
- Specify them explicitly: p1 | p2 | pdal.Filter.merge(inputs=["reproject_before", "reproject_after"]) This should have been working today but it doesn't (due to (1) and/or (2)).
- Introduce a new syntax to support multiple inputs, for example allow the left hand side to be a list/tuple: (p1, p2) | pdal.Filter.merge() This needs some more thought and could be addressed as a separate feature.

hobu commented 2 years ago

The second syntax looks easier to read, but can it actually be made to work as we expect?

I guess I would be happy with the first proposed syntax working even if it is verbose to write.

gsakkis commented 2 years ago

Regardless of the syntax for multiple inputs, the first thing to be decided is the (1) above: should the pipe operator be (left) associative or not? In other words is

[a]    (reader1 | filter1) | (reader2 | filter2)

equivalent to

[b]    (((reader1 | filter1) | reader2) | filter2)

(where [b] can be simplified to reader1 | filter1 | reader2 | filter2)?

If yes, what's the implied inputs of filter2? According to the pdal core algorithm (which pdal-python inherits), it is [filter1, reader2].
If not, do we want the implied inputs of filter2 to be [reader2] in [a] and [filter1, reader2] in [b]?

PDAL / python