Tools to create and expose a database of purls (Package URLs). This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and by nexB for https://www.aboutcode.org/ . Chat is at https://gitter.im/aboutcode-org/discuss
On certain large purldb instances, the match_to_purldb_resource step of the scancode.io map_deploy_to_develop pipeline, which calls the api/resources/filter_by_checksums endpoint, is very slow and can take more than 30 hours to complete.
After debugging, we found that the two biggest reasons for the slowness are:
Ordering of Resources: a lot of CPU time is spent ordering the resources returned by a query.
Decoding large JSON fields: a lot of time is spent parsing JSON fields when they are very large, such as the history field on Package.
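To give a rough feel for the second point, here is a minimal standard-library sketch that compares loading a row with and without decoding a large JSON value. It does not use purldb code: the `history` payload, its size, and the `load_row` and `time_rows` helpers are all hypothetical stand-ins, and the real numbers on a purldb instance will differ.

```python
import json
import time

# Hypothetical stand-in for a large Package.history value: many small entries.
history = [
    {"message": f"update {i}", "timestamp": "2024-01-01T00:00:00Z"}
    for i in range(50_000)
]
raw_history = json.dumps(history)


def load_row(decode_history: bool) -> dict:
    """Simulate materializing one row, optionally decoding the big JSON column."""
    row = {"sha1": "0" * 40, "path": "a/b.c"}
    if decode_history:
        # This is the step that dominates when the JSON value is very large.
        row["history"] = json.loads(raw_history)
    return row


def time_rows(n: int, decode_history: bool) -> float:
    """Return the seconds spent loading n rows."""
    start = time.perf_counter()
    for _ in range(n):
        load_row(decode_history)
    return time.perf_counter() - start


with_history = time_rows(20, decode_history=True)
without_history = time_rows(20, decode_history=False)
print(f"with history: {with_history:.3f}s, without: {without_history:.6f}s")
```

The decode cost is paid once per row that is materialized, so it grows with both the size of the JSON value and the number of rows returned.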
Immediate solutions that come to mind:
Remove ordering for Resources.
Create a proper History model for Package; the expedient alternative would be to empty the history JSON field.
Look into using .only() on queries.
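The first and last ideas above can be sketched at the SQL level with the standard-library sqlite3 module. This is only an illustration under assumptions: the `resource` table, its `big_json` column, and the sample data are hypothetical and are not the purldb schema. In Django terms, the lean query roughly corresponds to calling `.order_by()` with no arguments to clear the default ordering and `.only(...)` to load just the needed fields.

```python
import json
import sqlite3

# Hypothetical table standing in for Resource, with one large JSON column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource ("
    "id INTEGER PRIMARY KEY, path TEXT, sha1 TEXT, big_json TEXT)"
)
rows = [
    (f"src/file{i}.py", f"{i:040x}", json.dumps({"notes": ["x"] * 100}))
    for i in range(1000)
]
conn.executemany(
    "INSERT INTO resource (path, sha1, big_json) VALUES (?, ?, ?)", rows
)

# Checksums to look up, as a filter-by-checksums style query would receive.
wanted = [f"{i:040x}" for i in (3, 500, 999)]
placeholders = ",".join("?" for _ in wanted)

# Costly shape: fetch every column (including the big JSON) and sort the result.
slow_sql = f"SELECT * FROM resource WHERE sha1 IN ({placeholders}) ORDER BY path"

# Lean shape: no ORDER BY, and only the columns the caller actually needs.
lean_sql = f"SELECT path, sha1 FROM resource WHERE sha1 IN ({placeholders})"

slow = conn.execute(slow_sql, wanted).fetchall()
lean = conn.execute(lean_sql, wanted).fetchall()

print(len(slow[0]), "columns per row vs", len(lean[0]))
```

Both queries match the same rows; the lean one skips the sort and never transfers or decodes the large JSON column.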