aboutcode-org / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and by nexB for https://www.aboutcode.org/. Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

Performance issues when using `api/resources/filter_by_checksums` #551

Open. JonoYang opened this issue 2 days ago.

JonoYang commented 2 days ago

On certain large purldb instances, the match_to_purldb_resource step of the scancode.io map_deploy_to_develop pipeline, which calls the api/resources/filter_by_checksums endpoint, is very slow and can take more than 30 hours to complete.
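A client like this pipeline step sends many checksums to the endpoint over HTTP. The sketch below is a minimal illustration, not the scancode.io implementation. It assumes the endpoint accepts a JSON POST body that maps a checksum algorithm name (here "sha1") to a list of values, and that the route ends with a trailing slash. The helper names, the batch size, and the URL layout are hypothetical. Splitting the lookups into fixed-size batches keeps each request body bounded.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def checksum_batches(sha1s: Iterable[str], batch_size: int = 1000) -> Iterator[dict]:
    """Yield request bodies that each map "sha1" to at most batch_size values."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list[str] = []
    for value in sha1s:
        batch.append(value)
        if len(batch) == batch_size:
            yield {"sha1": batch}
            batch = []
    # Emit the final partial batch, if any values remain.
    if batch:
        yield {"sha1": batch}


def post_batch(base_url: str, body: dict) -> dict:
    """POST one batch and return the decoded JSON response.

    Hypothetical: assumes the route is <base_url>/api/resources/filter_by_checksums/
    and that it accepts and returns JSON.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/resources/filter_by_checksums/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, 2,500 checksums with `batch_size=1000` produce three request bodies holding 1,000, 1,000, and 500 values, each sent by a separate `post_batch` call.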

After debugging, we found that the two biggest reasons for the slowness are:

Immediate solutions that come to mind:

JonoYang commented 23 hours ago

Restoring ordering by id and creating indices on id for Resource and Package did not improve the API's performance.
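Before and after adding an index, inspecting the query plan shows whether a lookup uses that index at all. The sketch below uses Python's built-in sqlite3 as a self-contained stand-in. purldb itself runs on PostgreSQL, where `EXPLAIN` plays the same role. The table, column, and index names are hypothetical and do not reflect purldb's actual schema. The sketch shows that an integer primary key is already indexed. It also shows that a lookup on a checksum column scans the whole table until that column has its own index.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail string is the last column of each plan row.
    return [row[-1] for row in rows]


def compare_plans() -> tuple[list[str], list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for a resource table.
    conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, path TEXT, sha1 TEXT)")
    conn.executemany(
        "INSERT INTO resource (path, sha1) VALUES (?, ?)",
        [(f"file{i}.txt", f"{i:040x}") for i in range(1000)],
    )
    by_checksum = "SELECT id, path FROM resource WHERE sha1 = ?"
    sha1 = f"{42:040x}"

    # Without an index on sha1, the planner has to scan every row.
    before = query_plan(conn, by_checksum, (sha1,))
    # The integer primary key is already the table's own index.
    by_id = query_plan(conn, "SELECT path FROM resource WHERE id = ?", (1,))
    # With an index on the filtered column, the planner can search instead.
    conn.execute("CREATE INDEX resource_sha1_idx ON resource (sha1)")
    after = query_plan(conn, by_checksum, (sha1,))
    conn.close()
    return before, by_id, after
```

On PostgreSQL, the comparable check is to run `EXPLAIN` (or `EXPLAIN ANALYZE`) on the query the endpoint issues. The plan then shows whether it uses a sequential scan or an index scan.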