PDAL / PDAL

PDAL is Point Data Abstraction Library. GDAL for point cloud data.
https://pdal.io
Other
1.1k stars 438 forks source link

`PlyWriter::prepared` may cause `m_dims` to have more members than `m_dimNames` and to throw exception later. #2791

Closed yaobinwen-mvs closed 4 years ago

yaobinwen-mvs commented 4 years ago

Describe the bug

When PlyWriter::prepared is called more than once, the m_dims will have more members than m_dimNames which later causes an exception to be thrown when dereferencing a m_dimNames iterator ni here.

Reproduce the bug

Step 1: Check out the PDAL and its Python extension code. See the section below for the versions I'm using.

Step 2: Build and install PDAL according to the documentation.

Step 3: Build and install PDAL's Python extension according to the documentation. Note that python3 should be used.

Step 4: Run the following Python code, saved in a file called bug.py, to see the exception. I'm not sure if there is a way to reproduce it with the C++ API.

#!/usr/bin/env python3

# PDAL 2.0+ doesn't support Python 2 anymore.

from __future__ import (
    absolute_import, division, print_function, unicode_literals,
)

import pdal

pipeline_def = '''
{
    "pipeline": [
        {
            "type": "readers.ply",
            "filename": "simple_text.ply"
        },
        {
            "type": "writers.ply",
            "filename": "demo.ply"
        }
    ]
}
'''

def main():
    # Create and validate the pipeline.
    pipeline = pdal.Pipeline(pipeline_def)
    pipeline.validate()
    pipeline.execute()

if __name__ == '__main__':
    main()

Actual result

The following exception and call stack was printed:

Traceback (most recent call last):
  File "./bug.py", line 36, in <module>
    main()
  File "./bug.py", line 32, in main
    pipeline.execute()
  File "/usr/local/lib/python3.6/dist-packages/PDAL-2.2.2-py3.6-linux-x86_64.egg/pdal/pipeline.py", line 41, in execute
    return self.p.execute()
  File "pdal/libpdalpython.pyx", line 167, in pdal.libpdalpython.PyPipeline.execute
RuntimeError: basic_string::_M_create

Note: I'm not sure why, but some other times the exception was a bad_alloc:

Traceback (most recent call last):
  File "./main.py", line 41, in <module>
    main(**vars(_syntax().parse_args()))
  File "./main.py", line 37, in main
    count = pipeline.execute()
  File "/usr/local/lib/python3.6/dist-packages/PDAL-2.2.2-py3.6-linux-x86_64.egg/pdal/pipeline.py", line 41, in execute
    return self.p.execute()
  File "pdal/libpdalpython.pyx", line 167, in pdal.libpdalpython.PyPipeline.execute
MemoryError: std::bad_alloc

My analysis

In the Python code, pipeline.validate() calls PipelineExecutor::validate() which further calls m_manager.prepare();. This finally calls Stage::prepared(). This is the first time PlyWriter::prepared is called, so m_dimNames.size() is zero and the else branch is called. After the call, both m_dims and m_dimNames have 3 elements (X, Y, and Z).

Later, when pipeline.execute() is called, Stage::prepared is finally called again. This time, m_dimNames.size() is 3 so the true branch of the if is called, resulting m_dims to be pushed back three dimensions again. After the call, m_dims has 6 elements but m_dimNames still 3 elements.

Later, PlyWriter::writeHeader is called from within PlyWriter::ready. m_dimNames's iterator ni is assgined to its beginning, but it's m_dims that gets iterated through, and ni is dereferenced inside the for loop:

void PlyWriter::writeHeader(PointLayoutPtr layout) const
{
    auto ni = m_dimNames.begin();
    for (auto dim : m_dims)
    {
        std::string name = *ni++;
        // ...
    }
    // ...
}

Because m_dims has 6 elements but m_dimNames has only 3, as I have analyzed above, ni is incorrectly dereferenced when it already points at the end of m_dimNames, which results in the exception.

Expected behavior

The Python demo code calls both validate() and execute(), so I assume calling both should be allowed, and PlyWriter::prepared should make sure m_dims and m_dimNames have the same length no matter how many times this method is called.

I've read through the PDAL documentation, especially the Making a Stage (Reader, Filter or Writer). The document there doesn't require Stage::prepared have no side effects when called multiple times, nor did I read anywhere else that requires this.

System/installation information:

PDAL: I'm using the latest master on Oct. 7th, 2019, at this commit: 2033fdd8dcbd54551a8e1646939155590d277888

PDAL Python: I'm using the latest master on Sep 19, 2019, at this commit: PDAL/python@0da1fb7dcba0e006cda4700048dac8dfba556548

I'm using Ubuntu 18.04.

$ uname -a
Linux ywen-Precision-M4800 5.0.0-32-generic #34~18.04.2-Ubuntu SMP Thu Oct 10 10:36:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ pdal --version
--------------------------------
pdal 2.0.0 (git-version: 2033fd)
--------------------------------

I didn't install it via Conda. I built it by myself with cmake and make.

yaobinwen-mvs commented 4 years ago

I tried several other writers:

writers.bpf                  "Binary Point Format" (BPF) writer support. BPF is a simple DoD and research format that is used by some sensor and processing chains.
writers.gdal                 Write a point cloud as a GDAL raster.
writers.gltf                 Gltf Writer
writers.las                  ASPRS LAS 1.0 - 1.4 writer. LASzip support is also available if enabled at compile-time. Note that LAZ does not provide LAS 1.4 support at this time.

They don't have this problem. I haven't tried all the other available writers.