aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/

Slow index creation on Python 3.13 under some circumstances #3921

Open stefan6419846 opened 1 week ago

stefan6419846 commented 1 week ago

Description

I am currently trying to understand a performance issue in my own wrapper around SCTK. It seems to occur only in the wrapper, but appears to originate from licensedcode.cache.

In this specific case, I use the SCTK API to retrieve the copyrights of a directory. To speed things up, I rely on joblib to spawn four parallel workers, each working on a separate file. This worked without issues up to Python 3.12, but on Python 3.13 I see much slower execution the first time SCTK is called after installation, i.e. most likely while the license index is being built.

During my testing, Python 3.12 took about 12-13 seconds for the whole execution, while on Python 3.13 the first call took nearly 4 minutes (and the second call 12 seconds). In some cases (involving a subprocess call) I even ran into lock file timeouts on Python 3.13 during index generation, i.e. the index took more than 6 minutes to build.
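For anyone trying to reproduce the first-call versus second-call difference, a small timing harness along these lines may help. This is only a sketch under stated assumptions: `scan_file`, `timed_run` and `compare_cold_and_warm` are hypothetical names, `scan_file` is a stand-in for the real per-file SCTK call, and a standard-library thread pool replaces joblib so the snippet runs without extra dependencies. joblib uses process-based workers by default, so this sketch does not reproduce the cross-process index locking itself; it only shows how the two runs can be timed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scan_file(path):
    """Hypothetical stand-in for the per-file copyright scan.

    In the real wrapper this would call into the SCTK API instead.
    """
    time.sleep(0.01)  # simulate some per-file work
    return {"path": path, "copyrights": []}


def timed_run(paths, workers=4):
    """Scan all paths with a small worker pool; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the input paths
        results = list(pool.map(scan_file, paths))
    return results, time.perf_counter() - start


def compare_cold_and_warm(paths, workers=4):
    """Run the same scan twice and return both wall-clock durations.

    The first ("cold") run includes any one-time setup cost, such as
    building an index; the second ("warm") run does not.
    """
    _, cold = timed_run(paths, workers)
    _, warm = timed_run(paths, workers)
    return cold, warm


if __name__ == "__main__":
    cold, warm = compare_cold_and_warm(["a.py", "b.py", "c.py", "d.py"])
    print(f"first run: {cold:.2f}s, second run: {warm:.2f}s")
```

With the real scan function plugged in on a fresh installation, a large gap between the two printed durations would point at one-time setup work rather than at the per-file scanning.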

How To Reproduce

Unfortunately, I cannot give a standalone example here, but only a reference to https://github.com/stefan6419846/license_tools/tree/parallel which contains the offending code.

System configuration

AyanSinhaMahapatra commented 1 week ago

@stefan6419846 thank you for reporting this issue.

According to https://www.python.org/downloads/, Python 3.13 is still in prerelease, which is why we have not yet started testing SCTK with Python 3.13. It is also not a supported Python version for now.

It's nice to see that you're at least able to use SCTK with Python 3.13 without failures, though. We will update this issue once there are stable releases of Python 3.13 and we start testing SCTK with it. This usually takes a while, as we also need all our dependencies (specifically pyahocorasick, lxml, intbitset etc., which are not pure Python) to start building wheels for Python 3.13 before we can release SCTK archives.

stefan6419846 commented 1 week ago

I am aware that Python 3.13 is still only available as a release candidate, but the release managers recommend that library/package maintainers start testing compatibility at least once the first RC is available. As I maintain a package/library built upon SCTK, and SCTK has a library interface as well, I have already tested basic compatibility (I actually started at earlier stages, but waited to report this here until I was able to verify it on the RC as well). In my experience, installing SCTK on a mostly basic Ubuntu 22.04 with Python 3.13 works without any real issues, including compiling the binary dependencies.