GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0

Snyk & Lighthouse vuln. detection improvements #9958

Closed · aviadhahami closed this 3 years ago

aviadhahami commented 4 years ago

Hi! Today I looked at the way is-website-vulnerable inspects websites and realized it invokes the lighthouse npm package :taco:.

When I looked at how Lighthouse inspects websites for vulnerabilities, I encountered two issues (IMO):

  1. Maintainability: Since Feb 6, 2018, @paulirish and @patrickhulce have needed to manually approve an automatic PR on a weekly basis (see the attached screenshot at the bottom). Not only can this procedure become a pain for the maintainers over time, but the "CR" is also redundant, since we trust Snyk's integrity (otherwise their vuln. DB wouldn't be in use, right?). Also, given that we're not living in Utopia and life happens, there's a good chance this procedure will be forgotten or left unmaintained over time. If that happens, the whole "vuln. testing" feature will be rendered useless.

  2. Quality of information: Assuming we're OK with @snyk-bot's weekly PR, the next link in the chain is the "quality" of our tests and the vuln. DB. Lighthouse tests the detected package versions against a Snyk DB snapshot (see the sketch after this list for roughly how that lookup works), but what does the snapshot contain? What are we actually testing against? Counting the entries, we arrive at 20 libraries and 86 possible vulnerabilities (as of #9929). A quick check of Snyk's vuln. DB reveals ~1836 npm-related vulnerabilities, of which ~615 are from 2019. This means we test for only ~4.6% (or ~14% w.r.t. 2019 vulns) of the known vuln. DB. In the past a fetching script was used, but AFAIK it no longer is [though it still resides in the repo (see #5162 and its follow-ups)].
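For concreteness, here's a minimal sketch (my own illustration, not Lighthouse's actual audit code) of what matching a detected library version against a Snyk-style snapshot looks like; the snapshot shape and field names below are assumptions:

```js
// Minimal sketch of a snapshot lookup; the snapshot shape here is an
// assumption for illustration, not the exact format Lighthouse ships.
const semver = require('semver');

// Hypothetical snapshot excerpt: library name -> advisories with
// vulnerable version ranges.
const snapshot = {
  jquery: [
    {id: 'npm:jquery:20150627', severity: 'medium', vulnerableRange: '<3.0.0'},
  ],
};

function findVulnerabilities(libName, version) {
  // js-library-detector sometimes reports a library but a null version;
  // with no concrete version there is nothing to match against the ranges.
  const coerced = semver.coerce(version);
  if (!coerced) return [];
  const advisories = snapshot[libName] || [];
  return advisories.filter(adv => semver.satisfies(coerced, adv.vulnerableRange));
}

console.log(findVulnerabilities('jquery', '2.1.4')); // one advisory matches
console.log(findVulnerabilities('jquery', null));    // -> [] (the "null version" gap)
```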

What is the motivation or use case for changing this, and how is this beneficial to Lighthouse?

  1. @paulirish and @patrickhulce won't need to "code review" version changes (which are already reviewed by Snyk themselves as part of the product), since the vuln. DB can always be "fresh", e.g. via an automated fetch step like the sketch below. This yields better maintainability and robustness.
  2. Allowing a bigger vuln. DB will increase the number of vulnerabilities tested for a given package by ~4x. Better web! :tada:
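A hedged sketch of what that automated freshness step could look like; the endpoint URL and output path here are hypothetical placeholders, not a confirmed Snyk API:

```js
// Hedged sketch of an automated snapshot refresh; SNAPSHOT_URL and OUT_PATH
// are hypothetical placeholders, not a confirmed Snyk API or repo path.
const https = require('https');
const fs = require('fs');

const SNAPSHOT_URL = 'https://snyk.example/vulndb/clientside.json'; // hypothetical
const OUT_PATH = 'third-party/snyk/snapshot.json'; // assumed location

https.get(SNAPSHOT_URL, res => {
  let body = '';
  res.on('data', chunk => (body += chunk));
  res.on('end', () => {
    // Parse before writing so a bad fetch can't silently break the audit.
    const db = JSON.parse(body);
    fs.writeFileSync(OUT_PATH, JSON.stringify(db, null, 2));
    console.log(`Wrote ${Object.keys(db).length} top-level entries to ${OUT_PATH}`);
  });
});
```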

I'll be happy to PR this; just curious what you all think and whether there's a need for this improvement. [also, sorry for the long post]

Tagging @aviadatsnyk, as he was the one who enabled the automatic PRs and suggested deleting the fetching script. Also: cool name :zap:

Example PR log for @snyk-bot DB update:

[screenshot: weekly @snyk-bot snapshot-update PRs]

patrickhulce commented 4 years ago

Thanks for filing, @aviadhahami! I'm not sure what your specific proposal is, though?

In the past a fetching script was used, but AFAIK it no longer is

AFAIK Snyk ran this script, with some modifications, to file the PR each week? But I could be wrong, and they may have implemented their own by now.

This means we test for only ~4.6% (or ~14% w.r.t. 2019 vulns) of the known vuln. DB.

True, but there are diminishing returns to the additional libraries included, so our coverage by usage is significantly higher. The current snapshot targets some of the most popular libraries, and there's no point including vulnerability data for a library whose usage we can't detect (it must be in https://www.npmjs.com/package/js-library-detector).

FWIW many of the npm vulnerabilities are also aimed at server-side or build-time libraries which don't really apply to the frontend use case here in Lighthouse.

aviadhahami commented 4 years ago

Hi @patrickhulce - thanks for replying.

AFAIK Snyk ran this script, with some modifications, to file the PR each week? But I could be wrong, and they may have implemented their own by now.

I am not sure whether Snyk is running this script to create the PR. It makes sense, but can we confirm it? @aviadatsnyk

(Must be in https://www.npmjs.com/package/js-library-detector).

This actually took most of my time. After thoroughly reading the repos and scripts, I can say that the whole vuln. detection feature is only as good as js-library-detector. Unfortunately, js-library-detector (denoted lib-detector below) is very limited. lib-detector tests for frameworks and versions using a set of framework-specific test functions (see the previous link for an easier understanding; there's also a sketch of the shape below). The unfortunate part is the version extraction process: lib-detector will often yield the relevant framework but null for its version. This renders the rest of the pipeline useless, since we cannot look up vulnerabilities for a null version.
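To illustrate the shape (paraphrased from memory, so treat the exact field names as approximate), a lib-detector-style test function looks roughly like this, and you can see exactly where the null version comes from:

```js
// Rough sketch of a js-library-detector-style detector; field names are
// approximate, but the null-version failure mode is the point.
const detectors = {
  jQuery: {
    npm: 'jquery',
    test(win) {
      const jq = win.jQuery || win.$;
      if (!jq || !jq.fn) return false; // library not present
      // jQuery exposes its version at jq.fn.jquery; when a detector can't
      // recover a version string, this ends up null and the downstream
      // vulnerability lookup has nothing to match on.
      return {version: jq.fn.jquery || null};
    },
  },
};
```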

I'm not sure what your specific proposal is, though?

As for this, I think I'll dive into lib-detector first to see if I can improve the robustness of the tests (I found a lot of uncovered cases, SSR for example; see the sketch below) and the version fetching. I'll update here with progress :)
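As one example of an uncovered SSR case (my own illustration, not existing lib-detector code): a server-rendered React page may expose no window.React global at all, yet the markup still carries hints:

```js
// Hedged sketch of an SSR-aware heuristic; not code from js-library-detector.
function looksLikeServerRenderedReact(win) {
  // React's server renderer has emitted data-reactroot (v15/16) and
  // data-reactid (older versions) attributes; their presence suggests React
  // even when there is no global object to read a version from.
  return !!win.document.querySelector('[data-reactroot], [data-reactid]');
}

// A detector's test(win) could fall back to this markup heuristic when the
// global is missing (the version stays null, but detection itself improves).
```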

FWIW many of the npm vulnerabilities are also aimed at server-side or build-time libraries which don't really apply to the frontend use case here in Lighthouse.

I don't know about "many", but I can say that plenty also target clients :smiley:

patrickhulce commented 4 years ago

I think I'll dive into lib-detector first to see if I can improve the robustness of the tests

Sounds like a great way to help a lot of folks out 👍 😃

brendankenny commented 3 years ago

Closing for now, as it seems like future progress has moved up the stack to js-library-detector. If there are more ideas for improving the Snyk integration and the vulnerable-library audit, feel free to open more issues!