georchestra / sdi-consistence-check

A project to check links between data and metadata in a SDI
GNU General Public License v3.0

Type 'xml.etree.ElementTree.Element' cannot be serialized #82

Closed by bchartier 4 months ago

bchartier commented 5 months ago

When running SDI-CC against the DatagrandEst WMS server, many errors of the following kind are raised:

  Layer: geograndest:GGE_ORTHO_RVB_2018_2019
  Error: Metadata https://www.datagrandest.fr/geonetwork/srv/api/records/FR-200052264-A0127-0000/formatters/xml 
not found or invalid for layer 'geograndest:GGE_ORTHO_RVB_2018_2019': 
Unable to parse the text/xml metadata: Type 'xml.etree.ElementTree.Element' cannot be serialized.

To locate where these errors occur, I had to modify the SDI-CC code.

The traceback I get:

Traceback (most recent call last):
  File "[...]\temp\test-md.py", line 20, in <module>
    md = MD_Metadata(etree.fromstring(content))
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]\.venv\Lib\site-packages\owslib\iso.py", line 63, in __init__
    self.xml = etree.tostring(md)
               ^^^^^^^^^^^^^^^^^^
  File "src\\lxml\\etree.pyx", line 3514, in lxml.etree.tostring
TypeError: Type 'xml.etree.ElementTree.Element' cannot be serialized.
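
The traceback shows `lxml.etree.tostring` being handed an element whose type is `xml.etree.ElementTree.Element`. A minimal sketch of how to check which module produced an element is given below. The helper name `element_backend` is hypothetical and not part of SDI-CC or OWSLib, and the lxml part only runs if lxml happens to be installed.

```python
import xml.etree.ElementTree as stdlib_etree


def element_backend(element):
    """Return the name of the module that defines the element's type."""
    return type(element).__module__


# Parse a tiny made-up document with the standard library parser.
root = stdlib_etree.fromstring("<root><child/></root>")
backend = element_backend(root)
print(f"element comes from: {backend}")

try:
    from lxml import etree as lxml_etree  # optional dependency
except ImportError:
    lxml_etree = None

if lxml_etree is not None:
    try:
        # Mixing backends: lxml is asked to serialize a stdlib element.
        lxml_etree.tostring(root)
    except TypeError as exc:
        print(f"lxml refused the stdlib element: {exc}")
```

If the printed module is not the one that will later serialize the element, the parse and the serialization are using different etree implementations.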

This seems to occur here: https://github.com/georchestra/sdi-consistence-check/blob/master/sdi-consistence-check/geometadata.py#L21

A small piece of code to reproduce this error:

import traceback
import xml.etree.ElementTree as etree

from owslib.iso import MD_Metadata
from owslib.util import openURL

md_urls = (
    r"https://www.geo2france.fr/geonetwork/srv/api/records/94a69703-572f-463a-9cfc-6bca075384b8/formatters/xml",
    r"https://www.datagrandest.fr/geonetwork/srv/api/records/FR-200052264-A0128-0000/formatters/xml",
)

for md_url in md_urls:
    try:
        print(f"processing {md_url}...")
        raw_md = openURL(md_url)
        content = raw_md.read()

        # The TypeError from the traceback above is raised on this line.
        md = MD_Metadata(etree.fromstring(content))
        print(md)
    except Exception:
        print(traceback.format_exc())

I think the metadata files are correct. A small piece of code to demonstrate this:

import traceback
import xml.etree.ElementTree as etree

import requests

md_urls = (
    r"https://www.geo2france.fr/geonetwork/srv/api/records/94a69703-572f-463a-9cfc-6bca075384b8/formatters/xml",
    r"https://www.datagrandest.fr/geonetwork/srv/api/records/FR-200052264-A0128-0000/formatters/xml",
)

for md_url in md_urls:
    try:
        print(f"processing {md_url}...")

        resp = requests.get(md_url)
        content = resp.content

        with open("md.xml", "wb") as f:
            f.write(content)
        md_tree = etree.parse('md.xml')
        print(etree.tostring(md_tree.getroot()))

    except Exception:
        print(traceback.format_exc())

I'm wondering if the error comes from OWSlib.

gryckelynck commented 4 months ago

@landryb: it seems to me that the problem you point out at the beginning of this message https://github.com/georchestra/sdi-consistence-check/discussions/80#discussioncomment-9985900 is similar to this issue. To be confirmed, and the possible or planned ways of resolving it should be discussed with @bchartier.

landryb commented 4 months ago

yes, it seems to be the same error, thanks! it's probably a python version issue with the str types..

landryb commented 4 months ago

using python 3.11.2 & owslib 0.27.2 from debian 12, the following diff fixes it for me, although i have no idea if it's correct:

diff --git a/sdi-consistence-check/geometadata.py b/sdi-consistence-check/geometadata.py
index b2bcd8c..e4907d7 100644
--- a/sdi-consistence-check/geometadata.py
+++ b/sdi-consistence-check/geometadata.py
@@ -1,5 +1,4 @@
-import xml.etree.ElementTree as etree
-
+from owslib.etree import etree
 from owslib.iso import MD_Metadata
 from owslib.util import openURL
 from requests import HTTPError

it forces the use of the owslib version of etree, and python3 sdi-consistence-check/checker.py --mode WMS --server https://ids.dev.craig.fr/wxs/wms finds a lot of OK metadata links.

according to https://owslib.readthedocs.io/en/latest/usage.html#iso that's how it should be imported..
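
To illustrate the idea behind the diff, here is a minimal sketch that parses and serializes with one and the same etree module. The sample XML and the `round_trip` helper are made up for illustration, and the fallback to the standard library is only there so the sketch still runs where OWSLib is not installed.

```python
try:
    from owslib.etree import etree  # same import as in the diff above
except ImportError:
    # Assumption for this sketch only: fall back to the standard library.
    import xml.etree.ElementTree as etree

# Tiny made-up ISO 19139-like fragment, not a real metadata record.
SAMPLE = (
    b'<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    b'xmlns:gco="http://www.isotc211.org/2005/gco">'
    b"<gmd:fileIdentifier><gco:CharacterString>demo-id</gco:CharacterString>"
    b"</gmd:fileIdentifier></gmd:MD_Metadata>"
)


def round_trip(xml_bytes):
    """Parse and serialize with the same etree module."""
    element = etree.fromstring(xml_bytes)
    return etree.tostring(element)


serialized = round_trip(SAMPLE)
print(serialized)
```

Because both calls come from the same module, the serializer always receives an element type it knows how to handle.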

bchartier commented 4 months ago

> using python 3.11.2 & owslib 0.27.2 from debian 12, the following diff fixes it for me, although i have no idea if it's correct:

I'll try to test this as soon as possible. Thanks a lot @landryb

bchartier commented 4 months ago

@landryb: the change you suggested seems to fix this issue. Thanks a lot.