cc-archive / cc-link-checker

Automated link checker for legalcode and license URLs
MIT License

Output of get_local_index_rdf differs between Python 3.7 and Python 3.9 #121

Closed TimidRobot closed 3 years ago

TimidRobot commented 3 years ago

Description

Output of get_local_index_rdf differs between Python 3.7 and Python 3.9.

This makes local development more difficult as Python 3.9 is currently standard.

Reproduction

On my Mac, using Homebrew:

Python 3.7

  1. pipenv --rm
  2. pipenv --clear install --dev --python /usr/local/opt/python@3.7/libexec/bin/python
  3. pipenv run pytest -vv

Python 3.9

  1. pipenv --rm
  2. pipenv --clear install --dev
  3. pipenv run pytest -vv

With Python 3.9, the master branch fails:

================================================================================ test session starts ================================================================================
platform darwin -- Python 3.9.1, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 -- [...]/.local/share/virtualenvs/cc-link-checker-DX1uFXuP/bin/python
cachedir: .pytest_cache
rootdir: [...]/cc-link-checker
collected 29 items                                                                                                                                                                  

link_checker/tests/test_link_checker.py::test_parser_shared PASSED                                                                                                            [  3%]
link_checker/tests/test_link_checker.py::test_parser_shared_licenses PASSED                                                                                                   [  6%]
link_checker/tests/test_link_checker.py::test_parser_shared_rdf PASSED                                                                                                        [ 10%]
link_checker/tests/test_link_checker.py::test_parser_shared_reporting PASSED                                                                                                  [ 13%]
link_checker/tests/test_utils.py::test_get_github_legalcode PASSED                                                                                                            [ 17%]
link_checker/tests/test_utils.py::test_create_base_link[by-nc-nd_2.0] PASSED                                                                                                  [ 20%]
link_checker/tests/test_utils.py::test_create_base_link[by-nc-nd_4.0_cs] PASSED                                                                                               [ 24%]
link_checker/tests/test_utils.py::test_create_base_link[by-nc-nd_3.0_rs_sr-Latn] PASSED                                                                                       [ 27%]
link_checker/tests/test_utils.py::test_create_base_link[samplingplus_1.0] PASSED                                                                                              [ 31%]
link_checker/tests/test_utils.py::test_create_base_link[samplingplus_1.0_br] PASSED                                                                                           [ 34%]
link_checker/tests/test_utils.py::test_create_base_link[zero_1.0] PASSED                                                                                                      [ 37%]
link_checker/tests/test_utils.py::test_output_write PASSED                                                                                                                    [ 41%]
link_checker/tests/test_utils.py::test_output_issues_summary PASSED                                                                                                           [ 44%]
link_checker/tests/test_utils.py::test_create_absolute_link[./license-https://www.demourl.com/dir1/license] PASSED                                                            [ 48%]
link_checker/tests/test_utils.py::test_create_absolute_link[../-https://www.demourl.com/] PASSED                                                                              [ 51%]
link_checker/tests/test_utils.py::test_create_absolute_link[/index-https://www.demourl.com/index] PASSED                                                                      [ 55%]
link_checker/tests/test_utils.py::test_create_absolute_link[//demo.url-https://demo.url] PASSED                                                                               [ 58%]
link_checker/tests/test_utils.py::test_create_absolute_link[https://creativecommons.org-https://creativecommons.org] PASSED                                                   [ 62%]
link_checker/tests/test_utils.py::test_get_scrapable_links FAILED                                                                                                             [ 65%]
link_checker/tests/test_utils.py::test_exception_handler PASSED                                                                                                               [ 68%]
link_checker/tests/test_utils.py::test_map_links_file PASSED                                                                                                                  [ 72%]
link_checker/tests/test_utils.py::test_write_response PASSED                                                                                                                  [ 75%]
link_checker/tests/test_utils.py::test_get_memoized_result PASSED                                                                                                             [ 79%]
link_checker/tests/test_utils.py::test_memoize_result PASSED                                                                                                                  [ 82%]
link_checker/tests/test_utils.py::test_request_text[https://www.google.com:82-Timeout] PASSED                                                                                 [ 86%]
link_checker/tests/test_utils.py::test_request_text[http://doesnotexist.google.com-ConnectionError] PASSED                                                                    [ 89%]
link_checker/tests/test_utils.py::test_request_local_text PASSED                                                                                                              [ 93%]
link_checker/tests/test_utils.py::test_output_test_summary[3-map_links0] PASSED                                                                                               [ 96%]
link_checker/tests/test_utils.py::test_output_test_summary[0-map_links1] PASSED                                                                                               [100%]

===================================================================================== FAILURES ======================================================================================
_____________________________________________________________________________ test_get_scrapable_links ______________________________________________________________________________

    def test_get_scrapable_links():
        args = link_checker.parse_arguments(["deeds"])
        test_file = (
            "<a name='hello'>without href</a>,"
            " <a href='#hello'>internal link</a>,"
            " <a href='mailto:abc@gmail.com'>mailto protocol</a>,"
            " <a href='https://creativecommons.ca'>Absolute link</a>,"
            " <a href='/index'>Relative Link</a>"
        )
        soup = BeautifulSoup(test_file, "lxml")
        test_case = soup.find_all("a")
        base_url = "https://www.demourl.com/dir1/dir2"
        valid_anchors, valid_links, _ = get_scrapable_links(
            args, base_url, test_case, None, False
        )
        assert str(valid_anchors) == (
            '[<a href="https://creativecommons.ca">Absolute link</a>,'
            ' <a href="/index">Relative Link</a>]'
        )
        assert (
            str(valid_links)
            == "['https://creativecommons.ca', 'https://www.demourl.com/index']"
        )
        # Testing RDF
        args = link_checker.parse_arguments(["index", "--local-index"])
        rdf_obj_list = get_index_rdf(
            args, local_path=constants.TEST_RDF_LOCAL_PATH
        )
        rdf_obj = rdf_obj_list[0]
        base_url = rdf_obj["rdf:about"]
        links_found = get_links_from_rdf(rdf_obj)
        valid_anchors, valid_links, _ = get_scrapable_links(
            args, base_url, links_found, None, False, rdf=True,
        )
        expected_anchors = (
            "[<cc:permits "
            'rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>, '
            "<cc:permits "
            'rdf:resource="http://creativecommons.org/ns#Reproduction"/>, '
            "<cc:permits "
            'rdf:resource="http://creativecommons.org/ns#Distribution"/>, '
            "<cc:jurisdiction "
            'rdf:resource="http://creativecommons.org/international/ch/"/>, '
            "<foaf:logo "
            'rdf:resource="https://i.creativecommons.org/'
            'l/by-nc-sa/2.5/ch/88x31.png"/>, '
            "<foaf:logo "
            'rdf:resource="https://i.creativecommons.org/'
            'l/by-nc-sa/2.5/ch/80x15.png"/>, '
            "<cc:legalcode "
            'rdf:resource="http://creativecommons.org/'
            'licenses/by-nc-sa/2.5/ch/legalcode.de"/>, '
            "<dc:source "
            'rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/"/>, '
            "<dc:creator "
            'rdf:resource="http://creativecommons.org"/>, '
            "<cc:prohibits "
            'rdf:resource="http://creativecommons.org/ns#CommercialUse"/>, '
            "<cc:licenseClass "
            'rdf:resource="http://creativecommons.org/license/"/>, '
            "<cc:requires "
            'rdf:resource="http://creativecommons.org/ns#ShareAlike"/>, '
            "<cc:requires "
            'rdf:resource="http://creativecommons.org/ns#Attribution"/>, '
            "<cc:requires "
            'rdf:resource="http://creativecommons.org/ns#Notice"/>]'
        )
>       assert str(valid_anchors) == expected_anchors
E       assert ('[<cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>, '\n '<cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>, '\n '<cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>, '\n '<cc:jurisdiction '\n 'rdf:resource="http://creativecommons.org/international/ch/"/>, <foaf:logo '\n 'rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/88x31.png"/>, '\n '<foaf:logo '\n 'rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/80x15.png"/>]') == ('[<cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>, '\n '<cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>, '\n '<cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>, '\n '<cc:jurisdiction '\n 'rdf:resource="http://creativecommons.org/international/ch/"/>, <foaf:logo '\n 'rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/88x31.png"/>, '\n '<foaf:logo '\n 'rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/80x15.png"/>, '\n '<cc:legalcode '\n 'rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/ch/legalcode.de"/>, '\n '<dc:source '\n 'rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/"/>, '\n '<dc:creator rdf:resource="http://creativecommons.org"/>, <cc:prohibits '\n 'rdf:resource="http://creativecommons.org/ns#CommercialUse"/>, '\n '<cc:licenseClass rdf:resource="http://creativecommons.org/license/"/>, '\n '<cc:requires rdf:resource="http://creativecommons.org/ns#ShareAlike"/>, '\n '<cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>, '\n '<cc:requires rdf:resource="http://creativecommons.org/ns#Notice"/>]')
E         - [<cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>, <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>, <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>, <cc:jurisdiction rdf:resource="http://creativecommons.org/international/ch/"/>, <foaf:logo rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/88x31.png"/>, <foaf:logo rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/80x15.png"/>, <cc:legalcode rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/ch/legalcode.de"/>, <dc:source rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/"/>, <dc:creator rdf:resource="http://creativecommons.org"/>, <cc:prohibits rdf:resource="http://creativecommons.org/ns#CommercialUse"/>, <cc:licenseClass rdf:resource="http://creativecommons.org/license/"/>, <cc:requires rdf:resource="http://creativecommons.org/ns#ShareAlike"/>, <cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>, <cc:requires rdf:resource="http://creativecommons.org/ns#Notice"/>]
E         + [<cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>, <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>, <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>, <cc:jurisdiction rdf:resource="http://creativecommons.org/international/ch/"/>, <foaf:logo rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/88x31.png"/>, <foaf:logo rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/80x15.png"/>]

link_checker/tests/test_utils.py:272: AssertionError
------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------
None
Warnings:
  Anchor uses name        <a name="hello">without href</a>
============================================================================== short test summary info ==============================================================================
FAILED link_checker/tests/test_utils.py::test_get_scrapable_links - assert ('[<cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>, '\n '<cc:permits rdf:res...
=========================================================================== 1 failed, 28 passed in 10.00s ===========================================================================
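The stringified comparison in the log above is hard to read. A minimal sketch of a list-based comparison may make the gap easier to see: the anchor strings are copied from the test's expected value, the Python 3.9 result is modelled as the first six entries (which is what the `+` line in the log shows), and `missing_entries` is a hypothetical helper, not part of cc-link-checker.

```python
# Anchor strings copied from the expected value in test_get_scrapable_links.
EXPECTED = [
    '<cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>',
    '<cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>',
    '<cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>',
    '<cc:jurisdiction rdf:resource="http://creativecommons.org/international/ch/"/>',
    '<foaf:logo rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/88x31.png"/>',
    '<foaf:logo rdf:resource="https://i.creativecommons.org/l/by-nc-sa/2.5/ch/80x15.png"/>',
    '<cc:legalcode rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/ch/legalcode.de"/>',
    '<dc:source rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/"/>',
    '<dc:creator rdf:resource="http://creativecommons.org"/>',
    '<cc:prohibits rdf:resource="http://creativecommons.org/ns#CommercialUse"/>',
    '<cc:licenseClass rdf:resource="http://creativecommons.org/license/"/>',
    '<cc:requires rdf:resource="http://creativecommons.org/ns#ShareAlike"/>',
    '<cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>',
    '<cc:requires rdf:resource="http://creativecommons.org/ns#Notice"/>',
]

# Assumption: models the Python 3.9 result shown in the log, which stops
# after the second <foaf:logo> entry.
ACTUAL_PY39 = EXPECTED[:6]


def missing_entries(expected, actual):
    """Return the expected items that are absent from actual, in order."""
    seen = set(actual)
    return [item for item in expected if item not in seen]


if __name__ == "__main__":
    for item in missing_entries(EXPECTED, ACTUAL_PY39):
        print("missing:", item)
```

Read this way, the Python 3.9 result is a prefix of the expected list: everything from `<cc:legalcode>` onward is absent. The sketch only locates the gap; it does not explain why the later entries are dropped.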

Expectation

The test suite should pass, with identical output, under both Python 3.7 and Python 3.9.

Environment

The pip-installed packages are the same in both environments:

pipenv run pip list --format=columns
Package            Version    Location
------------------ ---------- ----------------------------------------------
appdirs            1.4.4
attrs              20.2.0
beautifulsoup4     4.9.3
black              19.10b0
certifi            2020.6.20
chardet            3.0.4
click              7.1.2
flake8             3.8.4
gevent             20.9.0
greenlet           0.4.17
grequests          0.6.0
idna               2.10
importlib-metadata 2.0.0
iniconfig          1.1.1
junit-xml          1.9
link-checker       0.1.0      /Users/tim/CreativeCommons/git/cc-link-checker
lxml               4.6.1
mccabe             0.6.1
packaging          20.4
pathspec           0.8.0
pip                20.2.4
pluggy             0.13.1
py                 1.9.0
pycodestyle        2.6.0
pyflakes           2.2.0
pyparsing          2.4.7
pytest             6.1.2
regex              2020.10.28
requests           2.24.0
setuptools         50.3.2
six                1.15.0
soupsieve          2.0.1
toml               0.10.1
typed-ast          1.4.1
urllib3            1.25.11
wheel              0.35.1
zipp               3.4.0
zope.event         4.5.0
zope.interface     5.1.2
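One way to confirm that two environments really do hold the same packages is to parse both `pip list --format=columns` outputs and diff them. The following is a rough sketch under that assumption; `parse_pip_list` and `diff_packages` are hypothetical helper names, and the sample text is a short excerpt of the listing above.

```python
def parse_pip_list(text):
    """Parse `pip list --format=columns` output into {name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_packages(first, second):
    """Return {name: (version_in_first, version_in_second)} for mismatches."""
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }


# Excerpt of the listing above, used here only as sample input.
SAMPLE = """\
Package            Version    Location
------------------ ---------- ----------------------------------------------
beautifulsoup4     4.9.3
link-checker       0.1.0      /Users/tim/CreativeCommons/git/cc-link-checker
lxml               4.6.1
pytest             6.1.2
"""

if __name__ == "__main__":
    py37 = parse_pip_list(SAMPLE)
    py39 = parse_pip_list(SAMPLE)
    # An empty dict means the two listings agree on every package.
    print(diff_packages(py37, py39))
```

In practice each interpreter's listing would be saved to its own file and passed through `parse_pip_list` separately. Matching versions do not rule out differences in the interpreter or in compiled extensions, so an empty diff only narrows the search.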

Resolution