Open heathhenley opened 8 months ago
Hi @heathhenley thanks for posting this. This looks like a bug. I'll have a look over the next few working days and get back to you.
@heathhenley I finally had a chance to look at this, sorry for the delay. Can you tell me what version of Python you are using? When I open examples/sample-data/sample-1.5.0.bag using Python 3.11 on my Mac, it opens fine. I'm away from the office right now, so I don't have my Windows machine to test on, but wanted to see if this was a Python version issue in the meantime. Thanks!
Thanks for looking into this! I'm still not convinced that I didn't just miss something; I don't usually use conda, so I'm winging it there. I know that, at least in my experience, Windows is always weird with open source GIS tools (GDAL etc.) too.
I must have torched my setup from last time, so I set it up again today on Windows 11. Here's what I did:
conda install conda-forge::bagpy
I can run the tests, but I get 4 failures and an error. I haven't dug into them at all:
================================================================= short test summary info =================================================================
FAILED python/test_compat_gdal.py::TestCompatGDAL::test_gdal_create_simple - SystemError: _PyErr_SetObject: exception <class 'bagPy.MetadataNotFound'> is not a BaseException subclass
FAILED python/test_dataset.py::TestDataset::testGetLayerTypes - bagPy.ErrorLoadingMetadata
FAILED python/test_interleavedlegacylayer.py::TestInterleavedLegacyLayer::testGetLayerAndRead - bagPy.ErrorLoadingMetadata
FAILED python/test_simplelayer.py::TestSimpleLayer::testRead - bagPy.ErrorLoadingMetadata
ERROR python/test_compat_gdal.py::TestCompatGDAL::test_gdal_create_simple - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\heath\\AppData\\Local\\Temp\...
=================================================== 4 failed, 91 passed, 20 warnings, 1 error in 1.54s ====================================================
But three of them may be related to the original problem I had. There was a problem in the GeoDjango/GDAL tests on Windows related to files not being allowed to be opened more than once (without being closed); Mac/Linux doesn't care. Maybe that's what's going on with the permission error, but I didn't dig in.
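The Windows behaviour speculated about above can be sketched as follows. This is only an illustration, assuming the PermissionError comes from a temporary file being deleted while a handle to it is still open. The file name scratch.bag and the helper write_then_remove are hypothetical, and nothing here is taken from the BAG test suite.

```python
import os
import tempfile


def write_then_remove(payload: bytes) -> bool:
    """Write payload to a scratch file, close it, then delete it.

    Returns True if the file is gone afterwards.
    """
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "scratch.bag")  # hypothetical file name

    # The `with` block guarantees the handle is closed before the delete.
    # On Windows, os.remove() on a file that still has an open handle
    # typically fails with PermissionError (WinError 32); on Linux and
    # macOS the same call usually succeeds.
    with open(path, "wb") as handle:
        handle.write(payload)

    os.remove(path)
    os.rmdir(tmp_dir)
    return not os.path.exists(path)


print(write_then_remove(b"example"))
```

If a test keeps a dataset object alive while its temporary directory is being cleaned up, releasing or closing that object first would be the analogous fix, though whether that applies to test_gdal_create_simple has not been checked here.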
To actually answer your question I'm using 3.12.2:
(base) c:\dev\BAG>python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:42:31) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import bagPy
>>> dataset = bagPy.Dataset.openDataset(r"C:\dev\BAG\examples\sample-data\sample-2.0.1.bag", bagPy.BAG_OPEN_READ_WRITE)
>>> dataset = bagPy.Dataset.openDataset(r"C:\dev\BAG\examples\sample-data\sample-1.5.0.bag", bagPy.BAG_OPEN_READ_WRITE)
Entity: line 17: parser error : Extra content at the end of the document
erNote></gmd:MD_SecurityConstraints></gmd:metadataConstraints></gmi:MI_Metadata>
^
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\heath\miniconda3\Lib\site-packages\bagPy\__init__.py", line 9401, in openDataset
return _bagPy.Dataset_openDataset(fileName, openMode)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bagPy.ErrorLoadingMetadata
Greetings, I am seeing the same issue as this open bug. I am using a Windows laptop and just trying to get familiar with the bagPy library by opening a file, and I receive the following error: "Entity: line 33: parser error : Extra content at the end of the document". Some Stack Overflow articles mentioned that XML needs a parent tag and that the parser will throw this error if one is missing.
Here is the code I was running:
dataset = bagPy.Dataset.openDataset(bag_file_path, bagPy.BAG_OPEN_READ_WRITE)
I tried it with Python versions 3.8, 3.9, and 3.12. Seeing the same error with all of those conda environments.
I was able to open a 2022 BAG file, but nothing older than that as far as I know.
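As a small illustration of the "needs a parent tag" point, the sketch below shows a parser rejecting anything that follows the single root element. It uses Python's standard-library xml.etree.ElementTree (expat) rather than the libxml2 parser discussed in this thread, so the error wording differs, and the helper name parses_cleanly is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(document: str) -> bool:
    """Return True if the text is a single well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# One root element wrapping everything: accepted.
print(parses_cleanly("<metadata><title>ok</title></metadata>"))

# A second element after the root has closed: rejected as extra content.
print(parses_cleanly("<metadata/><stray/>"))
```

Any bytes left over after the closing root tag, not only a second element, can trigger the same class of error.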
@selimnairb, I had a similar issue in Python using lxml to parse the xml in the BAG files. Given that lxml is wrapping libxml2, it is likely the same change in behavior that was introduced with a recent version of the library. The simple solution was to strip the retrieved string in order to remove the trailing characters.
@giumas @stephen-patterson-noaa Can you confirm whether you are seeing this error with any of the sample BAGs in examples/sample-data?
I tried to open each of the sample BAG files in examples/sample-data and view their layers.
Only 2 files (bag_georefmetadata_layer.bag, sample-2.0.1.bag) opened and printed the layer names:
bag_georefmetadata_layer.bag
Elevation
Uncertainty
Elevation
example_w_qc_layers.bag
- bagPy.ErrorLoadingMetadata
metadata_layer_example.bag
- OSError: [Errno 0] Error
nominal_only.bag
- bagPy.ErrorLoadingMetadata
sample-1.5.0.bag
- bagPy.ErrorLoadingMetadata
sample-2.0.1.bag
Elevation
Uncertainty
Nominal_Elevation
Surface_Correction
true_n_nominal.bag
- bagPy.ErrorLoadingMetadata
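A survey like the one above could be scripted along these lines. This is a rough sketch, not the script that produced the listing: the function survey_openers and the stand-in fake_opener are hypothetical, and the opener is passed in as a callable so the example runs without bagPy installed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def survey_openers(paths: Iterable[Path],
                   opener: Callable[[str], object]) -> Dict[str, str]:
    """Try to open every path and record "ok" or the exception type name."""
    results: Dict[str, str] = {}
    for path in paths:
        try:
            opener(str(path))
        except Exception as exc:  # broad on purpose: log every failure type
            results[path.name] = type(exc).__name__
        else:
            results[path.name] = "ok"
    return results


def fake_opener(path: str) -> None:
    # Stand-in for a real open call; fails for one made-up file name.
    if path.endswith("broken.bag"):
        raise ValueError("cannot load metadata")


print(survey_openers([Path("good.bag"), Path("broken.bag")], fake_opener))
```

With bagPy installed, the opener could wrap the call used earlier in this thread, for example lambda p: bagPy.Dataset.openDataset(p, bagPy.BAG_OPEN_READ_WRITE), with the paths taken from Path("examples/sample-data").glob("*.bag").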
@selimnairb, this minimal script replicates the issue without using BagPy:
from h5py import File
from lxml import etree, __version__ as lxml_version
bag_path = r"C:\code\cpp\BAG\examples\sample-data\sample-1.5.0.bag"
strip_x00 = False
print("lxml version: %s" % lxml_version)
print("libxml version: %s" % (etree.LIBXML_COMPILED_VERSION, ))
bag = File(bag_path, 'r')
xml = bag["BAG_root/metadata"][:].tobytes()
if strip_x00:
    xml = xml.strip(b'\x00')
xml_tree = etree.fromstring(xml)
If I run it using old versions of libxml, it works as is (strip_x00 = False). Example output:
lxml version: 4.7.1
libxml version: (2, 9, 12)
However, the same script fails when executed in a more recent version of libxml:
lxml version: 5.1.0
libxml version: (2, 12, 3)
Traceback (most recent call last):
File "C:\code\hyo2\hyo2_bag\examples\workground\open_bag_metadata_xml.py", line 15, in <module>
xml_tree = etree.fromstring(xml)
^^^^^^^^^^^^^^^^^^^^^
File "src/lxml/etree.pyx", line 3264, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1989, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1164, in lxml.etree._BaseParser._parseDoc
File "src/lxml/parser.pxi", line 633, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 743, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 672, in lxml.etree._raiseParseError
File "<string>", line 17
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 17, column 6165
However, if you switch the flag (strip_x00 = True), it works again:
lxml version: 5.1.0
libxml version: (2, 12, 3)
So my suggestion is to always strip(b'\x00') in the bagPy library.
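The suggested workaround can be isolated into a small helper. The sketch below is an assumption-laden illustration rather than the eventual bagPy fix: it uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without h5py or lxml, it works on a synthetic buffer instead of the real BAG_root/metadata dataset, and the helper name parse_padded_xml is hypothetical. It uses rstrip rather than the strip from the script above, since only trailing padding needs to go.

```python
import xml.etree.ElementTree as ET


def parse_padded_xml(raw: bytes) -> ET.Element:
    """Parse an XML buffer that may be padded with trailing NUL bytes."""
    # rstrip only touches the end of the buffer, so NUL padding is removed
    # while the document itself is left untouched.
    cleaned = raw.rstrip(b"\x00")
    return ET.fromstring(cleaned)


# Synthetic stand-in for the bytes read from BAG_root/metadata.
padded = b"<metadata><title>demo</title></metadata>" + b"\x00" * 8
root = parse_padded_xml(padded)
print(root.tag)
```

Doing the cleanup in one place, right after the bytes are read, means every later consumer sees a buffer that ends at the closing root tag regardless of the parser version in use.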
Thank you @giumas for the helpful test case. @stephen-patterson-noaa @heathhenley we're working on a bug-fix release and will address this issue. This will likely happen as part of this PR. We hope to have this ready in the next few weeks. Thanks for your interest and contributions!
I'm looking to read in some surveys and I'm exploring using your Python bindings here to do it. I came across this XML parse error (Extra content at the end of the document) trying to read .BAGs from H12025 and W00426. I'm assuming there's some error with my setup or my understanding of how to use it, as I see the same behavior with the sample-1.5.0.bag in the repo's examples:
If anyone can help me understand what I've missed I would appreciate it.
I'm seeing the same error with the real data, but not failing on the same xml tag, for example:
I am able to parse and manipulate both of those BAGs with the tools in https://github.com/hydroffice/hyo2_bag - so that's an option too, but this package just seemed to be more actively maintained and documented.