Closed lmmx closed 1 year ago
@lmmx Hi.
Your XML document defines the default namespace, so all sub-elements (including `loc` and `lastmod`) belong to that namespace. When you define a sub-model, the `nsmap` parameter is not inherited from the parent one.
So your model
class Url(BaseXmlModel, nsmap={}):
    loc: Loc
    lastmod: LastMod
doesn't inherit the default namespace from `UrlSet`, which means it tries to find `loc` and `lastmod` without a namespace.
Try redefining the `Url` model like this:
class Url(BaseXmlModel, nsmap=NSMAP):
    loc: Loc
    lastmod: LastMod
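To illustrate why the lookup fails without the namespace, here is a minimal sketch using only the standard library's `xml.etree.ElementTree` rather than pydantic-xml itself. The sample document and the `example.com` URL are made up for illustration, and the empty-prefix namespace mapping assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# A tiny sitemap whose root declares the default namespace (illustrative data).
doc = (
    f'<urlset xmlns="{NS}">'
    "<url><loc>https://example.com/</loc>"
    "<lastmod>2023-08-04T16:32:28Z</lastmod></url>"
    "</urlset>"
)

root = ET.fromstring(doc)
url = root[0]

# The child inherits the default namespace even though it has no xmlns attribute.
print(url.tag)

# Searching without a namespace finds nothing ...
unqualified = url.find("loc")

# ... while mapping the empty prefix to the namespace finds the element.
qualified = url.find("loc", {"": NS})

print(unqualified, qualified.text)
```

The `{"": NS}` mapping plays the same role here as `nsmap=NSMAP` on the sub-model: it tells the lookup which namespace unprefixed tag names belong to.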
Ahh thank you v much. That edit fixed all but 1 of my demo test cases.
from typing import Optional

from pydantic import ValidationError
from pydantic_xml import BaseXmlModel, RootXmlModel, attr, element

NSMAP = {"": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class Loc(RootXmlModel, tag="loc"):
    root: str


class LastMod(RootXmlModel, tag="lastmod"):
    root: str


class Url(BaseXmlModel, tag="url", nsmap=NSMAP):
    loc: Loc
    lastmod: LastMod


class UrlSet(BaseXmlModel, tag="urlset", nsmap=NSMAP):
    urls: list[Url] = element(default=[])


urlfree = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
assert urlfree == UrlSet(urls=[]).to_xml()
assert urlfree == UrlSet.from_xml(urlfree).to_xml()

ns = b"http://www.sitemaps.org/schemas/sitemap/0.9"
ux = b"<url><loc>https://pyfound.blogspot.com/2023/08/announcing-our-new-pypi-safety-security.html</loc><lastmod>2023-08-04T16:32:28Z</lastmod></url>"
usx = b'<urlset xmlns="' + ns + b'">' + ux + b"</urlset>"
stub = b"""<?xml version="1.0" encoding="UTF-8"?>""" + usx

u_base = "https://pyfound.blogspot.com"
u_loc = f"{u_base}/2023/08/announcing-our-new-pypi-safety-security.html"
u_mod = "2023-08-04T16:32:28Z"

gen_u = Url(loc=u_loc, lastmod=u_mod)
gen_ux = gen_u.to_xml()
# assert gen_ux == ux, "Url generation is not accurate"
regen_ux = Url.from_xml(gen_ux)
assert regen_ux == gen_u, "Url generation is not symmetric"

gen_us = UrlSet(urls=[gen_u])
gen_usx = gen_us.to_xml()
assert gen_usx == usx, "UrlSet generation is not accurate"
regen_usx = UrlSet.from_xml(gen_usx)
assert regen_usx == gen_us, "UrlSet generation is not symmetric"

# Skip the 38-byte XML declaration at the start of the stub
assert stub[38:] == gen_usx
The one it didn't solve feels a bit counterintuitive: I guess you can't generate the `<url>` tag on its own without the namespace attribute (the one I left commented out above):
# assert gen_ux == ux, "Url generation is not accurate"
I guess it makes sense in the context. Thanks again for the tip
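The behaviour is not specific to pydantic-xml. As a rough sketch, the standard library's `xml.etree.ElementTree` does the same thing: a namespaced element serialized on its own has to carry the `xmlns` declaration, whereas inside its parent the declaration appears only once, on the root. The `example.com` URL is illustrative, and the `default_namespace` argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Build a namespaced <url> element with one child (illustrative data).
url = ET.Element(f"{{{NS}}}url")
loc = ET.SubElement(url, f"{{{NS}}}loc")
loc.text = "https://example.com/"

# Serialized alone, the element is the document root, so it must declare the namespace.
standalone = ET.tostring(url, encoding="unicode", default_namespace=NS)

# Serialized inside <urlset>, the declaration is written once on the root only.
urlset = ET.Element(f"{{{NS}}}urlset")
urlset.append(url)
nested = ET.tostring(urlset, encoding="unicode", default_namespace=NS)

print(standalone)
print(nested)
```

This is why a byte-for-byte comparison of a standalone `<url>` against the `<url>` fragment cut out of a `<urlset>` document does not match, even though both describe the same element.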
I tried to parse an XML document with a schema URL and got a failure, which I think only occurs when there's a namespace set on the XML model.
I read the tests and followed the exact layout used there for nested submodels in case it was my mistake (since this is my first time using this library), but it seems to only occur when a namespace schema URL is in the top model.
It doesn't seem to be due to this namespace map being inherited, because setting an empty `nsmap` on the first submodel (to 'undo' the inherited namespace) doesn't change the error either.

The following test case demonstrates the bug, adapted from this test
Result
(Abbreviated for clarity)
Click to show entire pytest output
Result:

```sh
(magicscrape) louis 🚶 ~/dev/testing/pydantic_xml $ pytest submodels_ns.py
```

```py
========================= test session starts =========================
platform linux -- Python 3.10.12, pytest-7.4.0, pluggy-1.2.0
rootdir: /home/louis/dev/testing/pydantic_xml
plugins: anyio-3.7.1
collected 2 items

submodels_ns.py .F                                               [100%]

============================== FAILURES ===============================
_ test_nested_root_submodel_element_extraction[True-http://www.sitemaps.org/schemas/sitemap/0.9] _

use_ns = True, schema_url = 'http://www.sitemaps.org/schemas/sitemap/0.9'

    @mark.parametrize("schema_url", ["http://www.sitemaps.org/schemas/sitemap/0.9"])
    @mark.parametrize("use_ns", [False, True])
    def test_nested_root_submodel_element_extraction(use_ns, schema_url):
        if use_ns:
            NSMAP = {"": schema_url}
        else:
            NSMAP = {}

        class Loc(RootXmlModel, tag="loc"):
            root: int

        class LastMod(RootXmlModel, tag="lastmod"):
            root: int

        class Url(BaseXmlModel, nsmap={}):
            loc: Loc
            lastmod: LastMod

        class UrlSet(
            BaseXmlModel,
            tag="urlset",
            nsmap=NSMAP,
        ):
            url: list[Url] = element()

        ns = f' xmlns="{schema_url}"' if use_ns else ""
        xml = f"""
```

I also verified that it isn't a regression due to the move to Pydantic 2; the same result occurs with v1 syntax:
Click to show Pydantic v1 version
```py
from pytest import mark
from pydantic_xml import BaseXmlModel, element


@mark.parametrize("schema_url", ["http://www.sitemaps.org/schemas/sitemap/0.9"])
@mark.parametrize("use_ns", [False, True])
def test_nested_root_submodel_element_extraction(use_ns, schema_url):
    if use_ns:
        NSMAP = {"": schema_url}
    else:
        NSMAP = {}

    class Loc(BaseXmlModel, tag="loc"):
        __root__: int

    class LastMod(BaseXmlModel, tag="lastmod"):
        __root__: int

    class Url(BaseXmlModel, nsmap={}):
        loc: Loc
        lastmod: LastMod

    class UrlSet(
        BaseXmlModel,
        tag="urlset",
        nsmap=NSMAP,
    ):
        url: list[Url] = element()

    ns = f' xmlns="{schema_url}"' if use_ns else ""
    xml = f"""
```

From the traceback I breakpointed the
`model.py` and `serializers/factories/model.py` modules and inspected with `breakpoint()` and `traceback.print_stack()`, which showed that the validation error occurs because the `Url` validator (i.e. the submodel) is first passed a correct `source` of the `<url>...</url>` substring, which becomes a valid `result` dict and passes the `model_validate` call, but then on a 2nd pass receives an incorrect `source` equal to the entire XML string, which produces an empty `result` dict that fails the 2nd `model_validate` call.

I.e. the `ValidationError` arises from trying to interpret the entire XML string as the submodel, which is obviously going to fail.

I don't know if this internal view really helps debug it or not, but it might point to the root cause: it rules out a failure to parse the submodel itself and suggests instead a failure to parse only that submodel.
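For anyone wanting to reproduce this kind of inspection, the following is a minimal sketch of the technique only. It does not use pydantic-xml internals: `validate_submodel`, `parse_child` and `parse_parent` are hypothetical stand-ins, and the sketch simply records which caller passed which `source` on each invocation.

```python
import traceback

# One entry per invocation: (source that was passed, name of the calling function).
calls = []


def validate_submodel(source):
    """Hypothetical stand-in for a submodel validator; records who called it."""
    # Drop the last frame (this function itself) so the final entry is the caller.
    stack = traceback.extract_stack()[:-1]
    calls.append((source, stack[-1].name))


def parse_child():
    # First pass: the validator receives only the submodel's own fragment.
    validate_submodel("<url>...</url>")


def parse_parent():
    parse_child()
    # Second pass: the validator is (wrongly) handed the whole document.
    validate_submodel("<urlset>...</urlset>")


parse_parent()
for source, caller in calls:
    print(f"{caller} -> {source}")
```

Calling `traceback.print_stack()` at the same spot prints the full call path instead of recording it, which is handy when stepping through with `breakpoint()`.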
Click to show the real XML that was simplified for this example
Initial demo script:

```py
from typing import Optional

from pydantic import ValidationError
from pydantic_xml import BaseXmlModel, RootXmlModel, attr, element


class Loc(RootXmlModel, tag="loc"):
    root: str


class LastMod(RootXmlModel, tag="lastmod"):
    root: str


class Url(BaseXmlModel):
    loc: Loc
    lastmod: LastMod


class UrlSet(
    BaseXmlModel,
    tag="urlset",
    nsmap={"": "http://www.sitemaps.org/schemas/sitemap/0.9"},
):
    url: list[Url] = element()


urlfree = b'
```