aboutcode-org / commoncode

A library of common functions shared in many other AboutCode projects

Failure of tests/test_paths.py::TestPortablePath::test_safe_path_posix_style_chinese_char #56

Open eclipseo opened 1 year ago

eclipseo commented 1 year ago

Environment:

The following test fails:

___________ TestPortablePath.test_safe_path_posix_style_chinese_char ___________

self = <test_paths.TestPortablePath testMethod=test_safe_path_posix_style_chinese_char>

    def test_safe_path_posix_style_chinese_char(self):
        test = paths.safe_path(b'/includes/webform.compon\xd2\xaants.inc/')
        expected = 'includes/webform.componNSnts.inc'
>       assert test == expected
E       AssertionError: assert 'includes/web...mponS_nts.inc' == 'includes/web...mponNSnts.inc'
E         - includes/webform.componNSnts.inc
E         ?                        -
E         + includes/webform.componS_nts.inc
E         ?                         +

tests/test_paths.py:74: AssertionError

tests/test_paths.py::TestPortablePath::test_safe_path_posix_style_chinese_char
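
To see what the two non-ASCII bytes in the test input can turn into, here is a minimal sketch that decodes them under two encodings and lists the resulting code points. It does not call `paths.safe_path` and makes no claim about the cause of the failure; the helper name `describe` is hypothetical and only illustrates that the same bytes yield different characters depending on the assumed encoding.

```python
import unicodedata

# The two non-ASCII bytes embedded in the test input.
RAW = b"\xd2\xaa"


def describe(raw, encoding):
    """Decode raw bytes and return (code point, Unicode name) pairs."""
    text = raw.decode(encoding)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Compare a multi-byte decoding with a single-byte one.
for encoding in ("utf-8", "latin-1"):
    print(encoding, describe(RAW, encoding))
```

Under UTF-8 the two bytes form a single code point, while under Latin-1 they become two separate characters, so any later transliteration step starts from different text.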
pombredanne commented 1 year ago

@eclipseo Thanks! We designed these tests so that they would break when needed, and it looks like that time is now!

Can you tell me your processor architecture? And what are your locale and filesystem encoding?
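
One way to collect these details is a short script using only the standard library. This is a sketch, not a commoncode utility; the function name `describe_environment` and the choice of fields are assumptions.

```python
import locale
import os
import platform
import sys


def describe_environment():
    """Gather the architecture, locale and encoding details asked about above."""
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        # Locale-related environment variables; None when unset.
        "LC_ALL": os.environ.get("LC_ALL"),
        "LANG": os.environ.get("LANG"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running it in the same shell or build environment as the failing test should make the reports comparable across machines.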

Is there a way to get a Fedora Rawhide container image of sorts with Python 3.12.0~rc1 to reproduce the failure?

Side note: It seems from the trail of questions and issues you leave behind that you are porting ScanCode to Fedora! ... this is awesome!

eclipseo commented 1 year ago

My arch is x86_64 but will be testing on s390x, ppc64le and aarch64.

I think you can find images on https://registry.fedoraproject.org/, look for Fedora 40.

For now I have most of the dependencies prepared, but I still have issues with testing extractcode and one other similar package. I think I'll file a bug to ask for help.

For now, we use Debian's licensecheck in our review tool. One of the legal people from Red Hat suggested replacing it with askalono, but as the initial packager of askalono, which I use for license detection in Golang packaging, I can say it is no better. So I'm looking for an alternative, preferably in Python, to plug into my tools, and potentially into the official review tool if it gives good results.

So far it's been good. askalono has trouble when a license file contains multiple licenses, and also with linking exceptions.

Fedora has been switching to SPDX and we have more than 25000 packages to go through. We can't automatically convert from the previous notation to SPDX because we labeled things MIT/BSD/CC-BY without specifying the version, contrary to SPDX. And we have new rules for "effective analysis", so basically we need to reanalyze the code base.

pombredanne commented 1 year ago

@eclipseo this is awesome!

You wrote:

I think you can find images on https://registry.fedoraproject.org/, look for Fedora 40.

Ideally we should add a basic smoke test for a Fedora container in https://github.com/nexB/skeleton/blob/main/azure-pipelines.yml and use it across all the repos!

Some other comments:

We have a few things that should be of interest to you:

All in all, I would like to help!

eclipseo commented 1 year ago

@pombredanne If you can help with this: https://github.com/nexB/extractcode/issues/51 It is my remaining blocker.

eclipseo commented 1 year ago

So it works on Fedora 38 and 39 with

export LC_ALL=C.UTF-8

before running the test, but not on Fedora 40.
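
The same workaround can be scripted so the test can be rerun under different locale settings. This is a rough sketch, assuming pytest is installed and the script runs from the repository root; the helper names `build_env` and `run_test` are hypothetical.

```python
import os
import subprocess
import sys

TEST_ID = (
    "tests/test_paths.py::TestPortablePath::"
    "test_safe_path_posix_style_chinese_char"
)


def build_env(lc_all="C.UTF-8", base=None):
    """Return a copy of the environment with LC_ALL overridden."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = lc_all
    return env


def run_test(lc_all="C.UTF-8"):
    """Run the single failing test with the given LC_ALL; return the exit code."""
    cmd = [sys.executable, "-m", "pytest", TEST_ID]
    return subprocess.run(cmd, env=build_env(lc_all)).returncode
```

Calling `run_test("C.UTF-8")` and `run_test("C")` in turn would show whether the result depends on the locale on a given Fedora release.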

pombredanne commented 2 months ago

@eclipseo the latest release should pass all tests on Python versions up to 3.11, and I am adding 3.12 support next. On a closely related note, I have been hitting this bug: https://github.com/jawah/charset_normalizer/issues/520