RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

FAILED test/test_misc/test_parse_file_guess_format.py::TestFileParserGuessFormat::test_warning #2748

Closed: ncopa closed this issue 2 months ago

ncopa commented 6 months ago

The test/test_misc/test_parse_file_guess_format.py::TestFileParserGuessFormat::test_warning fails on Alpine Linux edge x86_64:

Python 3.11.8

=================================== FAILURES ===================================
____________________ TestFileParserGuessFormat.test_warning ____________________

self = <rdflib.plugins.parsers.notation3.SinkParser object at 0x7fbecb88c910>
argstr = '<?xml version="1.0"?>\n\n<!--\n  Copyright World Wide Web Consortium, (Massachusetts Institute of\n  Technology, Inst...rdf:datatype="http://www.w3.org/2001/XMLSchema#integer" xml:lang="fr">10</eg:baz>\n </rdf:Description>\n\n</rdf:RDF>\n'
i = 330, res = []

    def uri_ref2(self, argstr: str, i: int, res: MutableSequence[Any]) -> int:
        """Generate uri from n3 representation.

        Note that the RDF convention of directly concatenating
        NS and local name is now used though I prefer inserting a '#'
        to make the namesapces look more like what XML folks expect.
        """
        qn: typing.List[Any] = []
        j = self.qname(argstr, i, qn)
        if j >= 0:
            pfx, ln = qn[0]
            if pfx is None:
                assert 0, "not used?"
                ns = self._baseURI + ADDED_HASH  # type: ignore[unreachable]
            else:
                try:
>                   ns = self._bindings[pfx]
E                   KeyError: 'Description'

rdflib/plugins/parsers/notation3.py:1232: KeyError

During handling of the above exception, another exception occurred:

self = <Graph identifier=Nb4b72901e98b4f9f86eddbb8ac3005d9 (<class 'rdflib.graph.Graph'>)>
source = <_io.BufferedReader name='/tmp/tmpmcthgvqs/no_file_ext'>
publicID = None, format = 'turtle', location = None, file = None, data = None
args = {}, could_not_guess_format = True
parser = <rdflib.plugins.parsers.notation3.TurtleParser object at 0x7fbec6ac2510>

    def parse(
        self,
        source: Optional[
            Union[IO[bytes], TextIO, InputSource, str, bytes, pathlib.PurePath]
        ] = None,
        publicID: Optional[str] = None,  # noqa: N803
        format: Optional[str] = None,
        location: Optional[str] = None,
        file: Optional[Union[BinaryIO, TextIO]] = None,
        data: Optional[Union[str, bytes]] = None,
        **args: Any,
    ) -> "Graph":
        """
        Parse an RDF source adding the resulting triples to the Graph.

        The source is specified using one of source, location, file or data.

        .. caution::

           This method can access directly or indirectly requested network or
           file resources, for example, when parsing JSON-LD documents with
           ``@context`` directives that point to a network location.

           When processing untrusted or potentially malicious documents,
           measures should be taken to restrict network and file access.

           For information on available security measures, see the RDFLib
           :doc:`Security Considerations </security_considerations>`
           documentation.

        :param source: An `InputSource`, file-like object, `Path` like object,
            or string. In the case of a string the string is the location of the
            source.
        :param location: A string indicating the relative or absolute URL of the
            source. `Graph`'s absolutize method is used if a relative location
            is specified.
        :param file: A file-like object.
        :param data: A string containing the data to be parsed.
        :param format: Used if format can not be determined from source, e.g.
            file extension or Media Type. Defaults to text/turtle. Format
            support can be extended with plugins, but "xml", "n3" (use for
            turtle), "nt" & "trix" are built in.
        :param publicID: the logical URI to use as the document base. If None
            specified the document location is used (at least in the case where
            there is a document location). This is used as the base URI when
            resolving relative URIs in the source document, as defined in `IETF
            RFC 3986
            <https://datatracker.ietf.org/doc/html/rfc3986#section-5.1.4>`_,
            given the source document does not define a base URI.
        :return: ``self``, i.e. the :class:`~rdflib.graph.Graph` instance.

        Examples:

        >>> my_data = '''
        ... <rdf:RDF
        ...   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        ...   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        ... >
        ...   <rdf:Description>
        ...     <rdfs:label>Example</rdfs:label>
        ...     <rdfs:comment>This is really just an example.</rdfs:comment>
        ...   </rdf:Description>
        ... </rdf:RDF>
        ... '''
        >>> import os, tempfile
        >>> fd, file_name = tempfile.mkstemp()
        >>> f = os.fdopen(fd, "w")
        >>> dummy = f.write(my_data)  # Returns num bytes written
        >>> f.close()

        >>> g = Graph()
        >>> result = g.parse(data=my_data, format="application/rdf+xml")
        >>> len(g)
        2

        >>> g = Graph()
        >>> result = g.parse(location=file_name, format="application/rdf+xml")
        >>> len(g)
        2

        >>> g = Graph()
        >>> with open(file_name, "r") as f:
        ...     result = g.parse(f, format="application/rdf+xml")
        >>> len(g)
        2

        >>> os.remove(file_name)

        >>> # default turtle parsing
        >>> result = g.parse(data="<http://example.com/a> <http://example.com/a> <http://example.com/a> .")
        >>> len(g)
        3

        """

        source = create_input_source(
            source=source,
            publicID=publicID,
            location=location,
            file=file,
            data=data,
            format=format,
        )
        if format is None:
            format = source.content_type
        could_not_guess_format = False
        if format is None:
            if (
                hasattr(source, "file")
                and getattr(source.file, "name", None)
                and isinstance(source.file.name, str)
            ):
                format = rdflib.util.guess_format(source.file.name)
            if format is None:
                format = "turtle"
                could_not_guess_format = True
        parser = plugin.get(format, Parser)()
        try:
            # TODO FIXME: Parser.parse should have **kwargs argument.
>           parser.parse(source, self, **args)

rdflib/graph.py:1492: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
rdflib/plugins/parsers/notation3.py:2021: in parse
    p.loadStream(stream)
rdflib/plugins/parsers/notation3.py:479: in loadStream
    return self.loadBuf(stream.read())  # Not ideal
rdflib/plugins/parsers/notation3.py:485: in loadBuf
    self.feed(buf)
rdflib/plugins/parsers/notation3.py:511: in feed
    i = self.directiveOrStatement(s, j)
rdflib/plugins/parsers/notation3.py:530: in directiveOrStatement
    j = self.statement(argstr, i)
rdflib/plugins/parsers/notation3.py:778: in statement
    j = self.property_list(argstr, i, r[0])
rdflib/plugins/parsers/notation3.py:1140: in property_list
    i = self.objectList(argstr, j, objs)
rdflib/plugins/parsers/notation3.py:1190: in objectList
    i = self.object(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1487: in object
    j = self.subject(argstr, i, res)
rdflib/plugins/parsers/notation3.py:785: in subject
    return self.item(argstr, i, res)
rdflib/plugins/parsers/notation3.py:877: in item
    return self.path(argstr, i, res)
rdflib/plugins/parsers/notation3.py:884: in path
    j = self.nodeOrLiteral(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1515: in nodeOrLiteral
    j = self.node(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1102: in node
    j = self.uri_ref2(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1240: in uri_ref2
    self.BadSyntax(argstr, i, 'Prefix "%s:" not bound' % (pfx))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <rdflib.plugins.parsers.notation3.SinkParser object at 0x7fbecb88c910>
argstr = '<?xml version="1.0"?>\n\n<!--\n  Copyright World Wide Web Consortium, (Massachusetts Institute of\n  Technology, Inst...rdf:datatype="http://www.w3.org/2001/XMLSchema#integer" xml:lang="fr">10</eg:baz>\n </rdf:Description>\n\n</rdf:RDF>\n'
i = 330, msg = 'Prefix "Description:" not bound'

    def BadSyntax(self, argstr: str, i: int, msg: str) -> NoReturn:
>       raise BadSyntax(self._thisDoc, self.lines, argstr, i, msg)
E       rdflib.plugins.parsers.notation3.BadSyntax: <no detail available>

rdflib/plugins/parsers/notation3.py:1730: BadSyntax

During handling of the above exception, another exception occurred:

self = <test.test_misc.test_parse_file_guess_format.TestFileParserGuessFormat object at 0x7fbecac09790>

    def test_warning(self) -> None:
        g = Graph()
        graph_logger = logging.getLogger("rdflib")

        with TemporaryDirectory() as tmpdirname:
            newpath = Path(tmpdirname).joinpath("no_file_ext")
            copyfile(
                os.path.join(
                    TEST_DATA_DIR,
                    "suites",
                    "w3c",
                    "rdf-xml",
                    "datatypes",
                    "test001.rdf",
                ),
                str(newpath),
            )
            with pytest.raises(ParserError, match=r"Could not guess RDF format"):
                with pytest.warns(
                    UserWarning,
                    match="does not look like a valid URI, trying to serialize this will break.",
                ) as logwarning:
>                   g.parse(str(newpath))

test/test_misc/test_parse_file_guess_format.py:86: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Graph identifier=Nb4b72901e98b4f9f86eddbb8ac3005d9 (<class 'rdflib.graph.Graph'>)>
source = <_io.BufferedReader name='/tmp/tmpmcthgvqs/no_file_ext'>
publicID = None, format = 'turtle', location = None, file = None, data = None
args = {}, could_not_guess_format = True
parser = <rdflib.plugins.parsers.notation3.TurtleParser object at 0x7fbec6ac2510>

    def parse(
        self,
        source: Optional[
            Union[IO[bytes], TextIO, InputSource, str, bytes, pathlib.PurePath]
        ] = None,
        publicID: Optional[str] = None,  # noqa: N803
        format: Optional[str] = None,
        location: Optional[str] = None,
        file: Optional[Union[BinaryIO, TextIO]] = None,
        data: Optional[Union[str, bytes]] = None,
        **args: Any,
    ) -> "Graph":
        """
        Parse an RDF source adding the resulting triples to the Graph.

        The source is specified using one of source, location, file or data.

        .. caution::

           This method can access directly or indirectly requested network or
           file resources, for example, when parsing JSON-LD documents with
           ``@context`` directives that point to a network location.

           When processing untrusted or potentially malicious documents,
           measures should be taken to restrict network and file access.

           For information on available security measures, see the RDFLib
           :doc:`Security Considerations </security_considerations>`
           documentation.

        :param source: An `InputSource`, file-like object, `Path` like object,
            or string. In the case of a string the string is the location of the
            source.
        :param location: A string indicating the relative or absolute URL of the
            source. `Graph`'s absolutize method is used if a relative location
            is specified.
        :param file: A file-like object.
        :param data: A string containing the data to be parsed.
        :param format: Used if format can not be determined from source, e.g.
            file extension or Media Type. Defaults to text/turtle. Format
            support can be extended with plugins, but "xml", "n3" (use for
            turtle), "nt" & "trix" are built in.
        :param publicID: the logical URI to use as the document base. If None
            specified the document location is used (at least in the case where
            there is a document location). This is used as the base URI when
            resolving relative URIs in the source document, as defined in `IETF
            RFC 3986
            <https://datatracker.ietf.org/doc/html/rfc3986#section-5.1.4>`_,
            given the source document does not define a base URI.
        :return: ``self``, i.e. the :class:`~rdflib.graph.Graph` instance.

        Examples:

        >>> my_data = '''
        ... <rdf:RDF
        ...   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        ...   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        ... >
        ...   <rdf:Description>
        ...     <rdfs:label>Example</rdfs:label>
        ...     <rdfs:comment>This is really just an example.</rdfs:comment>
        ...   </rdf:Description>
        ... </rdf:RDF>
        ... '''
        >>> import os, tempfile
        >>> fd, file_name = tempfile.mkstemp()
        >>> f = os.fdopen(fd, "w")
        >>> dummy = f.write(my_data)  # Returns num bytes written
        >>> f.close()

        >>> g = Graph()
        >>> result = g.parse(data=my_data, format="application/rdf+xml")
        >>> len(g)
        2

        >>> g = Graph()
        >>> result = g.parse(location=file_name, format="application/rdf+xml")
        >>> len(g)
        2

        >>> g = Graph()
        >>> with open(file_name, "r") as f:
        ...     result = g.parse(f, format="application/rdf+xml")
        >>> len(g)
        2

        >>> os.remove(file_name)

        >>> # default turtle parsing
        >>> result = g.parse(data="<http://example.com/a> <http://example.com/a> <http://example.com/a> .")
        >>> len(g)
        3

        """

        source = create_input_source(
            source=source,
            publicID=publicID,
            location=location,
            file=file,
            data=data,
            format=format,
        )
        if format is None:
            format = source.content_type
        could_not_guess_format = False
        if format is None:
            if (
                hasattr(source, "file")
                and getattr(source.file, "name", None)
                and isinstance(source.file.name, str)
            ):
                format = rdflib.util.guess_format(source.file.name)
            if format is None:
                format = "turtle"
                could_not_guess_format = True
        parser = plugin.get(format, Parser)()
        try:
            # TODO FIXME: Parser.parse should have **kwargs argument.
            parser.parse(source, self, **args)
        except SyntaxError as se:
            if could_not_guess_format:
>               raise ParserError(
                    "Could not guess RDF format for %r from file extension so tried Turtle but failed."
                    "You can explicitly specify format using the format argument."
                    % source
                )
E               rdflib.exceptions.ParserError: Could not guess RDF format for <_io.BufferedReader name='/tmp/tmpmcthgvqs/no_file_ext'> from file extension so tried Turtle but failed.You can explicitly specify format using the format argument.

rdflib/graph.py:1495: ParserError

During handling of the above exception, another exception occurred:

self = <test.test_misc.test_parse_file_guess_format.TestFileParserGuessFormat object at 0x7fbecac09790>

    def test_warning(self) -> None:
        g = Graph()
        graph_logger = logging.getLogger("rdflib")

        with TemporaryDirectory() as tmpdirname:
            newpath = Path(tmpdirname).joinpath("no_file_ext")
            copyfile(
                os.path.join(
                    TEST_DATA_DIR,
                    "suites",
                    "w3c",
                    "rdf-xml",
                    "datatypes",
                    "test001.rdf",
                ),
                str(newpath),
            )
            with pytest.raises(ParserError, match=r"Could not guess RDF format"):
>               with pytest.warns(
                    UserWarning,
                    match="does not look like a valid URI, trying to serialize this will break.",
                ) as logwarning:
E               Failed: DID NOT WARN. No warnings of type (<class 'UserWarning'>,) were emitted.
E                Emitted warnings: [].

test/test_misc/test_parse_file_guess_format.py:82: Failed
------------------------------ Captured log call -------------------------------
2024-03-20T11:05:23.295 WARNING  rdflib.term  term.py:287:__new__ file:///tmp/tmpmcthgvqs/?xml version="1.0"? does not look like a valid URI, trying to serialize this will break.
2024-03-20T11:05:23.295 WARNING  rdflib.term  term.py:287:__new__ !--
  Copyright World Wide Web Consortium, (Massachusetts Institute of
  Technology, Institut National de Recherche en Informatique et en
  Automatique, Keio University).

  All Rights Reserved.

  Please see the full Copyright clause at
  <http://www.w3.org/Consortium/Legal/copyright-software.html does not look like a valid URI, trying to serialize this will break.
=============================== warnings summary ===============================
test/test_literal/test_literal.py::test_ill_typed_literals[yes-http://www.w3.org/2001/XMLSchema#boolean-True]
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1719: UserWarning: Parsing weird boolean, 'yes' does not map to True or False
    warnings.warn(

test/test_namespace/test_definednamespace.py::test_inspect[DFNSDefaults]
  /usr/lib/python3.11/inspect.py:2486: UserWarning: Code: _partialmethod is not defined in namespace DFNSDefaults
    partialmethod = obj._partialmethod

test/test_namespace/test_definednamespace.py::test_inspect[DFNSWarnNoFail]
  /usr/lib/python3.11/inspect.py:2486: UserWarning: Code: _partialmethod is not defined in namespace DFNSWarnNoFail
    partialmethod = obj._partialmethod

test/test_namespace/test_definednamespace.py::test_inspect[DFNSDefaultsEmpty]
  /usr/lib/python3.11/inspect.py:2486: UserWarning: Code: _partialmethod is not defined in namespace DFNSDefaultsEmpty
    partialmethod = obj._partialmethod

test/test_namespace/test_namespace.py::TestNamespacePrefix::test_closed_namespace
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/test/test_namespace/test_namespace.py:228: UserWarning: DefinedNamespace does not address deprecated properties
    warn("DefinedNamespace does not address deprecated properties")

test/test_parsers/test_n3parse_of_rdf_lists.py::TestOWLCollectionTest::test_collection_rdfxml
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/plugins/serializers/rdfxml.py:280: UserWarning: Assertions on rdflib.term.BNode('N9c925dd1ada149b2a379055264dc7ed7') other than RDF.first and RDF.rest are ignored ... including RDF.List
    self.predicate(predicate, object, depth + 1)

test/test_roundtrip.py: 12 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1585: UserWarning: Serializing weird numerical rdflib.term.Literal('xy.z', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double'))
    warnings.warn("Serializing weird numerical %r" % self)

test/test_roundtrip.py: 12 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1585: UserWarning: Serializing weird numerical rdflib.term.Literal('+1.0z', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double'))
    warnings.warn("Serializing weird numerical %r" % self)

test/test_roundtrip.py: 12 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1585: UserWarning: Serializing weird numerical rdflib.term.Literal('ab.c', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double'))
    warnings.warn("Serializing weird numerical %r" % self)

test/test_serializers/test_serializer.py: 10 warnings
test/test_tools/test_chunk_serializer.py: 4 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/plugins/serializers/nt.py:40: UserWarning: NTSerializer always uses UTF-8 encoding. Given encoding was: None
    warnings.warn(

test/test_util.py::TestUtilTermConvert::test_util_from_n3_expectliteralandlangdtype
  /usr/lib/python3.11/site-packages/_pytest/python.py:194: UserWarning: Code: fr is not defined in namespace XSD
    result = testfunction(**testargs)

test/test_util.py::TestUtilTermConvert::test_util_from_n3_not_escapes[\\I]
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/util.py:213: DeprecationWarning: invalid escape sequence '\I'
    value = value.encode("raw-unicode-escape").decode("unicode-escape")

test/test_w3c_spec/test_sparql10_w3c.py: 20 warnings
test/test_w3c_spec/test_sparql11_w3c.py: 50 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1161: DeprecationWarning: NotImplemented should not be used in a boolean context
    return not self.__gt__(other) and not self.eq(other)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_misc/test_parse_file_guess_format.py::TestFileParserGuessFormat::test_warning - Failed: DID NOT WARN. No warnings of type (<class 'UserWarning'>,) were emi...
= 1 failed, 7277 passed, 59 skipped, 370 xfailed, 128 warnings in 107.85s (0:01:47) =

edmondchuc commented 6 months ago

Can you check what pytest version you are running? I think this may be related to https://github.com/RDFLib/rdflib/pull/2727 if you are on pytest 8.
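The captured output above suggests why the check fails. The "does not look like a valid URI" message appears under "Captured log call" as a WARNING from the rdflib.term logger, while pytest reports "Emitted warnings: []". That pattern fits a message sent through the logging module rather than warnings.warn, and pytest.warns only sees the latter. The sketch below uses only the standard library. check_uri and its character test are hypothetical stand-ins, not rdflib's actual validation code.

```python
import logging
import warnings

# Hypothetical stand-in for the code that produced the "Captured log call"
# lines above; the real check lives in rdflib/term.py.
logger = logging.getLogger("rdflib.term")


def check_uri(value: str) -> str:
    """Log (not warn) when a value contains characters invalid in a URI."""
    if any(c in value for c in '<> "{}|\\^`'):
        logger.warning(
            "%s does not look like a valid URI, trying to serialize this will break.",
            value,
        )
    return value


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, none are filtered
    check_uri('file:///tmp/?xml version="1.0"?')
logger.removeHandler(handler)

print(len(caught))            # 0: nothing went through the warnings module
print(len(handler.messages))  # 1: the message went through logging
```

A warnings recorder (which is what pytest.warns builds on) stays empty, while a logging handler receives the message.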

ncopa commented 6 months ago

This is the latest rdflib release, 7.0.0.

Yes, I believe it is definitely related to #2727 and pytest 8.
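The captured log shows the "does not look like a valid URI" message being logged by rdflib.term rather than emitted as a UserWarning. One way to keep the test's intent is therefore to assert on the log record instead, and to let the ParserError be handled before that check runs. This sketch rests on an assumption: pytest 8 appears to evaluate pytest.warns even when an exception escapes the block, where older versions skipped the check. That would explain why the same test passed before. The code is hypothetical and is not the change made in #2727. It uses unittest.assertLogs so it runs with the standard library alone; pytest's caplog fixture would be the analogue in rdflib's pytest-based suite. fake_parse and the local ParserError class are stand-ins for Graph.parse and rdflib.exceptions.ParserError.

```python
import logging
import unittest

logger = logging.getLogger("rdflib.term")


class ParserError(Exception):
    """Local stand-in for rdflib.exceptions.ParserError."""


def fake_parse(path: str) -> None:
    """Hypothetical stand-in for Graph.parse on an RDF/XML file with no extension."""
    logger.warning(
        "%s does not look like a valid URI, trying to serialize this will break.",
        f'file://{path}/?xml version="1.0"?',
    )
    raise ParserError(f"Could not guess RDF format for {path!r}")


class GuessFormatLogTest(unittest.TestCase):
    def test_logs_instead_of_warning(self) -> None:
        # assertLogs is the outer manager: the inner assertRaisesRegex
        # consumes the ParserError, so the log check always runs afterwards.
        with self.assertLogs("rdflib", level="WARNING") as logs:
            with self.assertRaisesRegex(ParserError, "Could not guess RDF format"):
                fake_parse("/tmp/no_file_ext")
        # Records from the child logger "rdflib.term" propagate to "rdflib".
        self.assertTrue(
            any("does not look like a valid URI" in line for line in logs.output)
        )
```

The nesting is the reverse of the original test, where the exception check wraps the warning check. Here the exception is absorbed first, so the outer log assertion never has to decide what to do with an escaping exception.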