OpenNavigationSurface / BAG

The Bathymetric Attributed Grid library
BSD 3-Clause "New" or "Revised" License

How do you open older .BAG files? #90

Open heathhenley opened 8 months ago

heathhenley commented 8 months ago

I'm looking to read in some surveys and I'm exploring using your Python bindings to do it. I came across this XML parse error (Extra content at the end of the document) while trying to read .BAG files from H12025 and W00426.

I'm assuming there's some error with my set up or understanding of how to use it, as I see the same behavior with the sample-1.5.0.bag in the repo's examples:


>>> import bagPy
>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\bag_repo_samples\sample.bag", bagPy.BAG_OPEN_READ_WRITE)
>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\bag_repo_samples\sample-2.0.1.bag", bagPy.BAG_OPEN_READ_WRITE)
>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\bag_repo_samples\sample-1.5.0.bag", bagPy.BAG_OPEN_READ_WRITE)
Entity: line 17: parser error : Extra content at the end of the document
erNote></gmd:MD_SecurityConstraints></gmd:metadataConstraints></gmi:MI_Metadata>
                                                                               ^
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\heath\miniconda3\envs\bag_test_env\Lib\site-packages\bagPy\__init__.py", line 9401, in openDataset
    return _bagPy.Dataset_openDataset(fileName, openMode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bagPy.ErrorLoadingMetadata

If anyone can help me understand what I've missed I would appreciate it.

I'm seeing the same error with the real data, but not failing on the same xml tag, for example:

>>> dataset = bagPy.Dataset.openDataset(r"C:\Users\heath\Desktop\BAGS\W00426_MB_4m_MLLW_1of3.bag", bagPy.BAG_OPEN_READ_WRITE)
Entity: line 29: parser error : Extra content at the end of the document

^
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\heath\miniconda3\envs\bag_test_env\Lib\site-packages\bagPy\__init__.py", line 9401, in openDataset
    return _bagPy.Dataset_openDataset(fileName, openMode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bagPy.ErrorLoadingMetadata

I am able to parse and manipulate both of those BAGs with the tools in https://github.com/hydroffice/hyo2_bag - so that's an option too, but this package just seemed to be more actively maintained and documented.

selimnairb commented 7 months ago

Hi @heathhenley thanks for posting this. This looks like a bug. I'll have a look over the next few working days and get back to you.

selimnairb commented 5 months ago

@heathhenley I finally had a chance to look at this, sorry for the delay. Can you tell me what version of Python you are using? When I open examples/sample-data/sample-1.5.0.bag using Python 3.11 on my Mac, it opens fine. I'm away from the office right now, so I don't have my Windows machine to test on, but wanted to see if this was a Python version issue in the meantime. Thanks!

heathhenley commented 5 months ago

Thanks for looking into this! I'm still not convinced that I didn't just miss something; I don't usually use conda, so I'm winging it there. In my experience, at least, Windows is always a bit weird with open source GIS tools (GDAL etc.) too.

I must have torched my setup from last time, so I set it up again today on Windows 11. Here's what I did:

I can run the tests, but I get 4 failures and an error. I haven't dug into them at all:

================================================================= short test summary info =================================================================
FAILED python/test_compat_gdal.py::TestCompatGDAL::test_gdal_create_simple - SystemError: _PyErr_SetObject: exception <class 'bagPy.MetadataNotFound'> is not a BaseException subclass
FAILED python/test_dataset.py::TestDataset::testGetLayerTypes - bagPy.ErrorLoadingMetadata
FAILED python/test_interleavedlegacylayer.py::TestInterleavedLegacyLayer::testGetLayerAndRead - bagPy.ErrorLoadingMetadata
FAILED python/test_simplelayer.py::TestSimpleLayer::testRead - bagPy.ErrorLoadingMetadata
ERROR python/test_compat_gdal.py::TestCompatGDAL::test_gdal_create_simple - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\heath\\AppData\\Local\\Temp\...
=================================================== 4 failed, 91 passed, 20 warnings, 1 error in 1.54s ====================================================

Three of them may be related to the original problem I had. There was a problem in the GeoDjango/GDAL tests on Windows related to files not being allowed to be opened more than once without being closed first (macOS and Linux don't care). Maybe that's what's going on with the permission error, but I didn't dig in.

To actually answer your question, I'm using Python 3.12.2:

(base) c:\dev\BAG>python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:42:31) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import bagPy
>>> dataset = bagPy.Dataset.openDataset(r"C:\dev\BAG\examples\sample-data\sample-2.0.1.bag", bagPy.BAG_OPEN_READ_WRITE)
>>> dataset = bagPy.Dataset.openDataset(r"C:\dev\BAG\examples\sample-data\sample-1.5.0.bag", bagPy.BAG_OPEN_READ_WRITE)
Entity: line 17: parser error : Extra content at the end of the document
erNote></gmd:MD_SecurityConstraints></gmd:metadataConstraints></gmi:MI_Metadata>
                                                                               ^
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\heath\miniconda3\Lib\site-packages\bagPy\__init__.py", line 9401, in openDataset
    return _bagPy.Dataset_openDataset(fileName, openMode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bagPy.ErrorLoadingMetadata

stephen-patterson-noaa commented 3 months ago

Greetings, I am seeing the same issue as this open bug. I am using a Windows laptop and just trying to get familiar with the bagPy library by opening a file, and I receive the following error: "Entity: line 33: parser error : Extra content at the end of the document". Some Stack Overflow posts mention that XML needs a single root (parent) tag and that the parser throws this error if one is missing.

Here is the code I was running: dataset = bagPy.Dataset.openDataset(bag_file_path, bagPy.BAG_OPEN_READ_WRITE)

I tried it with Python versions 3.8, 3.9, and 3.12. Seeing the same error with all of those conda environments.

I was able to open a 2022 BAG file, but nothing older than that as far as I know.

giumas commented 3 months ago

@selimnairb, I had a similar issue in Python using lxml to parse the xml in the BAG files. Given that lxml is wrapping libxml2, it is likely the same change in behavior that was introduced with a recent version of the library. The simple solution was to strip the retrieved string in order to remove the trailing characters.
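
The stripping step described above can be illustrated without a BAG file. This is only a sketch under stated assumptions: the metadata bytes are synthetic, `strip_nul_padding` is a hypothetical helper name, and it uses the standard library's `xml.etree.ElementTree` rather than lxml, so it shows the cleanup itself and not the libxml2 behavior change.

```python
import xml.etree.ElementTree as ET


def strip_nul_padding(raw: bytes) -> bytes:
    """Drop NUL bytes padding the end of a metadata buffer (hypothetical helper)."""
    return raw.rstrip(b"\x00")


# Synthetic stand-in for bytes read from a BAG metadata dataset:
# a complete XML document followed by NUL padding.
padded = b"<root><child>value</child></root>" + b"\x00" * 16

cleaned = strip_nul_padding(padded)
root = ET.fromstring(cleaned)  # parses cleanly once the padding is gone

print(root.tag, len(padded) - len(cleaned))  # prints: root 16
```

Using `rstrip` only touches the end of the buffer, which is where the "extra content" is reported.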

selimnairb commented 3 months ago

@giumas @stephen-patterson-noaa Can you confirm whether you are seeing this error with any of the sample BAGs in examples/sample-data?

stephen-patterson-noaa commented 3 months ago

I tried to open each of the sample BAG files in examples/sample-data and view their layers.
Only 2 files (bag_georefmetadata_layer.bag and sample-2.0.1.bag) opened and printed their layer names:

bag_georefmetadata_layer.bag
     Elevation
     Uncertainty
     Elevation
example_w_qc_layers.bag
    - bagPy.ErrorLoadingMetadata
metadata_layer_example.bag
    - OSError: [Errno 0] Error
nominal_only.bag
    - bagPy.ErrorLoadingMetadata
sample-1.5.0.bag
    - bagPy.ErrorLoadingMetadata
sample-2.0.1.bag
     Elevation
     Uncertainty
     Nominal_Elevation
     Surface_Correction
true_n_nominal.bag
    - bagPy.ErrorLoadingMetadata

giumas commented 3 months ago

@selimnairb, this minimal script replicates the issue without using bagPy:

from h5py import File
from lxml import etree, __version__ as lxml_version

bag_path = r"C:\code\cpp\BAG\examples\sample-data\sample-1.5.0.bag"

strip_x00 = False

print("lxml version: %s" % lxml_version)
print("libxml version: %s" % (etree.LIBXML_COMPILED_VERSION, ))

bag = File(bag_path, 'r')
xml = bag["BAG_root/metadata"][:].tobytes()
if strip_x00:
    xml = xml.strip(b'\x00')
xml_tree = etree.fromstring(xml)

If I run it using old versions of libxml, it works as is (strip_x00 = False). Example output:

lxml version: 4.7.1
libxml version: (2, 9, 12)

However, the same script fails when executed in a more recent version of libxml:

lxml version: 5.1.0
libxml version: (2, 12, 3)
Traceback (most recent call last):
  File "C:\code\hyo2\hyo2_bag\examples\workground\open_bag_metadata_xml.py", line 15, in <module>
    xml_tree = etree.fromstring(xml)
               ^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 3264, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1989, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1164, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 633, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 743, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 672, in lxml.etree._raiseParseError
  File "<string>", line 17
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 17, column 6165

However, if you switch the flag (strip_x00 = True), it works again:

lxml version: 5.1.0
libxml version: (2, 12, 3)

So my suggestion is to always strip(b'\x00') the metadata in the bagPy library before parsing it.
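
To check which files actually carry such padding, a small diagnostic along the following lines might help. It is a sketch, not part of bagPy: `read_bag_metadata`, `count_trailing_nuls`, and `report_padding` are hypothetical names, it reuses the `BAG_root/metadata` dataset path from the script above, and it assumes h5py is installed. Whether the failing files are the padded ones is something it would show, not something assumed here.

```python
from pathlib import Path


def read_bag_metadata(path) -> bytes:
    """Return the raw metadata bytes of a BAG file (needs h5py)."""
    import h5py  # imported lazily so the helpers below work without it

    with h5py.File(str(path), "r") as bag:
        return bag["BAG_root/metadata"][:].tobytes()


def count_trailing_nuls(raw: bytes) -> int:
    """Number of NUL bytes at the very end of the buffer."""
    return len(raw) - len(raw.rstrip(b"\x00"))


def report_padding(sample_dir) -> None:
    """Print the trailing-NUL count for every .bag file in a directory."""
    for bag_path in sorted(Path(sample_dir).glob("*.bag")):
        try:
            raw = read_bag_metadata(bag_path)
        except (OSError, KeyError) as exc:
            # Unreadable file or missing metadata dataset: report and move on.
            print(f"{bag_path.name}: could not read metadata ({exc})")
            continue
        print(f"{bag_path.name}: {count_trailing_nuls(raw)} trailing NUL byte(s)")
```

Calling `report_padding` on a local copy of examples/sample-data would list one line per sample file.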

selimnairb commented 3 months ago

Thank you @giumas for the helpful test case. @stephen-patterson-noaa @heathhenley we're working on a bug-fix release and will address this issue. This will likely happen as part of this PR. We hope to have this ready in the next few weeks. Thanks for your interest and contributions!