deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Implement automated vulnerability checks and license compliance checks #859

Closed Timoeller closed 3 years ago

Timoeller commented 3 years ago

We can integrate https://snyk.io/. Unfortunately, license compliance is not free for OSS; only the vulnerability checks are.

lalitpagaria commented 3 years ago

I used this for my project and do not recommend it for vulnerability checks either. Automatically raising PRs to fix vulnerabilities does not seem to work: it says it is not able to find a fix, while GitHub's simple scan suggested a fix to me.

I am also getting a lot of emails about buying their commercial product.

Timoeller commented 3 years ago

Hey Lalit, thanks for the insights. I also get a lot of inbound mail for their product...

Can you recommend other tools for vulnerability checks or license compliance?

lalitpagaria commented 3 years ago

I am also searching for a license compliance tool. I am thinking of writing one that fetches license information from PyPI for each dependency and checks it against OSS-approved licenses. Obviously it is very hard to check whether someone is using non-compliant code in a PR. For that, it is better to add a code commit guideline along with a mechanism for people to report violations.
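A minimal sketch of that idea, assuming the PyPI JSON API (`https://pypi.org/pypi/<package>/json`) and its `info.classifiers` field as the source of license data. The allowlist and the function names are hypothetical, chosen only for illustration; a real policy would have to be decided by the project, and packages that publish no license classifier are simply treated as non-compliant here.

```python
import json
import urllib.request

# Hypothetical allowlist; a real policy would be decided by the project.
APPROVED_CLASSIFIERS = {
    "License :: OSI Approved :: Apache Software License",
    "License :: OSI Approved :: MIT License",
    "License :: OSI Approved :: BSD License",
}


def fetch_pypi_info(package: str) -> dict:
    """Fetch the 'info' section of a package's metadata from the PyPI JSON API."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]


def license_classifiers(info: dict) -> list:
    """Return only the trove classifiers that describe a license."""
    return [c for c in info.get("classifiers") or [] if c.startswith("License ::")]


def is_compliant(info: dict) -> bool:
    """True if the package declares at least one license and all are on the allowlist."""
    found = license_classifiers(info)
    return bool(found) and all(c in APPROVED_CLASSIFIERS for c in found)
```

Calling `is_compliant(fetch_pypi_info("some-package"))` for each dependency would give a first rough report. Classifiers are self-declared by package authors, so the result is a hint, not a legal check.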

For vulnerabilities, I have found these two tools so far (something is better than nothing):

  1. https://github.com/pyupio/safety -> checks dependencies (the free version's database is updated once a month)
  2. https://github.com/PyCQA/bandit -> finds common security issues in Python code

These tools can easily be integrated into CI with a simple pip install; there is no need to sign up for them. Check them out and share feedback.
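As a rough illustration of such a CI step, here is a sketch that runs both tools from Python after `pip install bandit safety`. It assumes the `bandit -r <dir>` and `safety check -r <file>` command-line forms; newer safety releases may prefer a different subcommand, so treat the exact invocation as an assumption. The function names are hypothetical.

```python
import subprocess


def build_commands(source_dir: str, requirements: str) -> list:
    """Build the command lines for both scanners (assumed CLI forms)."""
    return [
        ["bandit", "-r", source_dir],             # static scan of the Python sources
        ["safety", "check", "-r", requirements],  # known-vulnerability check of dependencies
    ]


def run_checks(source_dir: str = ".", requirements: str = "requirements.txt") -> int:
    """Run each scanner and return how many of them exited with a non-zero status."""
    failures = 0
    for cmd in build_commands(source_dir, requirements):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures += 1
    return failures
```

A CI job could call `run_checks()` and fail the build when the return value is greater than zero.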

Timoeller commented 3 years ago

Perfect, thanks for the insights. @PiffPaffM this might be useful for you as well. We will work on this in the coming sprint and give updates here.

lalitpagaria commented 3 years ago

@PiffPaffM Facebook also released a static code analyser, Pysa. It is worth trying out as well. I have not tried it yet, but I will check it out soon.

Timoeller commented 3 years ago

It might be worth looking into the GitLab documentation and what tooling they use for license compliance management.

https://docs.gitlab.com/ee/user/compliance/license_compliance/index.html

lalitpagaria commented 3 years ago

For Python they check requirements.txt and the Pipfile.lock file. I think we can use their tool via its Docker image (I have not tried it yet). Repo source code - https://gitlab.com/gitlab-org/security-products/analyzers/license-finder
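To give an idea of what checking requirements.txt involves, here is a small sketch that extracts package names from the file's text. It is not how the GitLab analyzer works internally; it is a hypothetical helper that only handles simple name-based lines and ignores option lines such as `-r other.txt`. URL or VCS requirements are not handled.

```python
import re


def requirement_names(text: str) -> list:
    """Extract lower-cased package names from simple requirements.txt content."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names
```

The resulting names could then be fed into whatever license lookup is chosen.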

PiffPaffM commented 3 years ago

@lalitpagaria: Thanks for the input. I started to try the various options. Here is a quick summary of my initial findings:

Vulnerability Checks:

License compliance:

Next steps:

PiffPaffM commented 3 years ago

After some more research I would suggest the following setup:

Vulnerability checks: We need to distinguish between vulnerabilities in the project/repository and in the Docker image:

  1. Repository: I activated "Dependency graph", "Dependabot alerts" and "Dependabot security updates" for all relevant repos. This way, GitHub checks whether all dependencies are up to date and whether there are any vulnerabilities. Dependabot will create a PR if action is required.
  2. Docker: A lot of vulnerabilities are caused by the chosen base image. Therefore, we need to check the generated Docker image.

License compliance: GitHub comes with "Dependency Insights" https://github.blog/changelog/2019-05-23-dependency-insights/ (to be checked)

lalitpagaria commented 3 years ago

Thanks @PiffPaffM

Docker scanning is a pain. Even at my company we tried a number of tools and still have not settled on one. We have now tested unikernels, which have a smaller footprint for security vulnerabilities, but they are not battle-tested yet.

Use docker scan if you build the image locally (https://docs.docker.com/engine/scan/)

This uses Snyk under the hood. I used Snyk, and it reported 20 vulnerabilities on a Debian-based Python base image. I don't want to use an Alpine-based image, as I have already faced many production issues due to misconfiguration of Alpine-based images.

PiffPaffM commented 3 years ago

@lalitpagaria: Thanks for the insights! I think we will stay with this setup for now and see how it will work for us. Happy to share my findings.

Regarding license compliance, I see a few different solutions which need to be tested.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.

tholor commented 3 years ago

Done