PDAL / python

PDAL's Python Support

Pipeline terminates with pdal::pdal_error for filters.covariancefeatures #176

Open G-Anjanappa opened 1 month ago

G-Anjanappa commented 1 month ago

Hello,

I am using parallel processing (Dask) to compute covariance features for a batch of point clouds. This is part of a larger pipeline that includes other steps such as DBSCAN and CSF.

Occasionally, the pipeline fails with the following error:

```
terminate called after throwing an instance of 'pdal::pdal_error'
  what():  filters.covariancefeatures: Cannot perform eigen decomposition.
```

The pipelines run successfully for the failed files when tested individually.

How can I catch a pdal::pdal_error exception in Python? Catching RuntimeError doesn't seem to work.

Thank you for your assistance.
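A note on the error itself: `terminate called after throwing an instance of ...` is printed by the C++ runtime when an exception escapes without a matching handler. `std::terminate` is then called, which by default aborts the whole process, so no Python `except` clause ever runs. That would explain why catching `RuntimeError` has no effect. One way to contain such a crash is to run each file in its own child interpreter and inspect the exit status. The sketch below is not part of the PDAL bindings; `CHILD_SCRIPT` and `run_in_subprocess` are hypothetical names, and the child pipeline is an abbreviated version of the one discussed in this issue.

```python
import subprocess
import sys

# Hypothetical child script: runs the PDAL pipeline on the file passed as argv[1].
# If PDAL aborts via std::terminate, only this child process dies.
CHILD_SCRIPT = """
import sys
import pdal

pipeline = (
    pdal.Reader(sys.argv[1])
    | pdal.Filter("filters.optimalneighborhood")
    | pdal.Filter("filters.covariancefeatures", knn=10, optimized=True,
                  feature_set="Verticality, Linearity, SurfaceVariation, Scattering")
)
pipeline.execute()
"""


def run_in_subprocess(path, script=CHILD_SCRIPT, timeout=None):
    """Run `script` in a fresh Python interpreter with `path` as sys.argv[1].

    Returns (returncode, stderr). 0 means success, 1 means an uncaught Python
    exception; on POSIX a negative value -N means the child was killed by
    signal N (for example -6 for SIGABRT after std::terminate).
    """
    result = subprocess.run(
        [sys.executable, "-c", script, path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr
```

Because the PDAL work happens in separate interpreters, these calls can still be issued in parallel (from a thread pool or from Dask tasks) without any PDAL state being shared between them; the cost is one interpreter start-up per file.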

abellgithub commented 1 month ago

Are you saying that you're getting this error for the SAME dataset that succeeds when you don't run it under Dask?

G-Anjanappa commented 1 month ago

Yes, that's correct! The error occurs only when using Dask; the pipeline runs successfully on the same dataset when tested individually, using either the inline pipeline or the JSON pipeline.

For your reference, here is part of my Dask pipeline:

```python
import pdal

# CSF ground filtering, then per-point optimal neighborhoods, then covariance
# features computed over those optimal neighborhoods (optimized=True).
pipeline = (
    pdal.Reader("input.laz")
    | pdal.Filter("filters.csf", resolution=0.5, threshold=1, iterations=200)
    | pdal.Filter("filters.optimalneighborhood")
    | pdal.Filter("filters.covariancefeatures", knn=10, optimized=True,
                  feature_set="Verticality, Linearity, SurfaceVariation, Scattering")
)
```

I am running this pipeline on over 1500 files, and it works fine for the majority of them. The issue occurs only with a very small subset of the files.

hobu commented 1 month ago

The PDAL Python bindings currently release the GIL. We are also using Dask + PDAL Python for SilviMetric, but we are not using these filters. I suspect these filters are not thread-safe because they manage internal state.

The solution is probably to add an option to the bindings that prevents them from releasing the GIL. It is not realistic to audit every filter and make it thread-safe.
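Until such an option exists, a user-side approximation, assuming the thread-safety hypothesis above is correct, is to ensure that only one PDAL pipeline executes at a time in each process, and to get parallelism from processes rather than threads (for example Dask's `scheduler="processes"`, or a `dask.distributed` `LocalCluster` created with `threads_per_worker=1`). In the sketch below, `_PDAL_LOCK` and `execute_serialized` are hypothetical names; any object with an `execute()` method, such as a `pdal.Pipeline`, can be passed in.

```python
import threading

# Hypothetical process-wide lock: while one thread is inside execute(),
# other threads calling execute_serialized() wait, so PDAL pipelines in
# this process never run concurrently.
_PDAL_LOCK = threading.Lock()


def execute_serialized(pipeline):
    """Call pipeline.execute() while holding _PDAL_LOCK and return its result.

    For a pdal.Pipeline, execute() returns the number of points processed.
    """
    with _PDAL_LOCK:
        return pipeline.execute()
```

This gives up thread-level parallelism for the PDAL step entirely, so it only pays off when combined with process-level parallelism.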

G-Anjanappa commented 1 month ago

Thank you for the explanation.

Would you recommend any other approaches or workarounds to avoid these errors with Dask? Or is reprocessing the failed files sequentially the only quick, practical solution in this case?

abellgithub commented 1 month ago

If this is indeed a thread-safety issue, there's going to be nothing special about the files that didn't work -- just because a run with the files failed once doesn't mean it will fail again. If you're seeing consistent failure behavior with specific datasets then something else is going on and perhaps sharing a couple of datasets would be in order.
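That distinction can be checked mechanically: re-run each failed file a few times, one at a time, and separate files that sometimes succeed (consistent with a race) from files that never do (pointing at something in the data). A minimal sketch, where `classify_failures` and `run_one` are hypothetical names and `run_one` is any callable that returns True on success, for example a wrapper that runs the pipeline in a child process and checks its exit code, so that an abort does not kill the loop:

```python
from collections.abc import Callable, Iterable


def classify_failures(
    paths: Iterable[str],
    run_one: Callable[[str], bool],
    attempts: int = 3,
) -> dict[str, list[str]]:
    """Re-run each path sequentially, up to `attempts` times.

    Returns {"flaky": [...], "consistent": [...]}: a path is "flaky" if any
    attempt succeeded and "consistent" if every attempt failed.
    """
    report: dict[str, list[str]] = {"flaky": [], "consistent": []}
    for path in paths:
        # any() stops calling run_one at the first successful attempt
        succeeded = any(run_one(path) for _ in range(attempts))
        report["flaky" if succeeded else "consistent"].append(path)
    return report
```

Files that land in `"consistent"` are the ones worth sharing, since retrying alone will not fix them.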

hobu commented 1 month ago

You aren't by chance running NumPy 2, are you? I'm seeing some tests fail with NumPy 2 that appear to be related to threading or GIL release. I haven't figured it out yet, though.

hobu commented 1 month ago

NumPy 2.1, rather. I don't have any issues with 2.0.
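Since the NumPy version matters here, it helps to report exact package versions. A small sketch using the standard library's `importlib.metadata`; `package_versions` is a hypothetical helper, the distribution names `numpy`, `pdal`, and `dask` are assumed to be the installed package names, and the version of the `pdal` Python package is not necessarily the version of the underlying PDAL C++ library.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("numpy", "pdal", "dask")):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python", platform.python_version())
    for name, ver in package_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```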

G-Anjanappa commented 1 month ago

I retried processing the files multiple times. Some of the files that initially failed succeeded on subsequent retries, which is consistent with what you said about a possible thread-safety issue. However, a few files still fail consistently, no matter how many times I retry them.

G-Anjanappa commented 1 month ago

No, I am using NumPy 1.26.4 and PDAL 2.7.1.