cncf / clomonitor

CLOMonitor is a tool that periodically checks open source projects' repositories to verify they meet certain project health best practices.
https://clomonitor.io
Apache License 2.0

Fragility in the Trademark Disclaimer check; does not support dynamic web sites #1194

Open adamdmharvey opened 1 year ago

adamdmharvey commented 1 year ago

The check for the Trademark Disclaimer works quite well for static web sites, but we should identify possible ways to improve it for dynamic web sites (e.g., Docusaurus/React/Angular).

The current check identifies the web site in the GitHub repo config, downloads that site, and runs a pair of regular expressions against it to confirm that the Linux Foundation disclaimer is included in the text:

https://clomonitor.io/docs/topics/checks/#trademark-disclaimer
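A minimal Python sketch of this kind of static check, for illustration. The two patterns are approximations of the Linux Foundation disclaimer (a link to the LF trademark-usage page and the "has registered trademarks" sentence); they are assumptions, and the exact expressions CLOMonitor uses are the ones documented on the checks page linked above. This sketch also assumes a match on either pattern is enough. Note that it only inspects the HTML it is handed, so anything JavaScript adds later is invisible to it.

```python
import re
import urllib.request

# Illustrative patterns modeled on the Linux Foundation trademark disclaimer.
# Assumption: these approximate, but are not copied from, CLOMonitor's own expressions.
DISCLAIMER_PATTERNS = [
    re.compile(r"https://(?:www\.)?linuxfoundation\.org/(?:legal/)?trademark-usage"),
    re.compile(r"The Linux Foundation.* has registered trademarks and uses trademarks"),
]


def has_trademark_disclaimer(html: str) -> bool:
    """Return True if the raw HTML matches any of the disclaimer patterns."""
    return any(p.search(html) for p in DISCLAIMER_PATTERNS)


def check_site(url: str, timeout: float = 10.0) -> bool:
    """Download the page as-is (no JavaScript is executed) and check it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return has_trademark_disclaimer(html)


# A static page whose footer is present in the downloaded HTML.
static_page = """
<footer>
  The Linux Foundation&reg; (TLF) has registered trademarks and uses trademarks.
  See <a href="https://www.linuxfoundation.org/trademark-usage">Trademark Usage</a>.
</footer>
"""
print(has_trademark_disclaimer(static_page))  # True
```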

The Backstage project was failing this check, even though the site a) is listed in the GitHub repo and b) properly displays the disclaimer.

https://backstage.io

[Screenshots: Repo | Web Site]

The main page is delivered as a React-based app. As a result, downloading the static page returns only some basic HTML that loads JavaScript, and that HTML does not itself contain the Linux Foundation text the regular expressions look for. The project is therefore marked as not having the trademark disclaimer, even though it does.
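To illustrate the failure mode, the sketch below parses a typical single-page-app shell with Python's standard `html.parser` and measures the text a reader would see without JavaScript. The shell markup is a made-up example, not Backstage's actual page. The `looks_client_rendered` helper and its 200-character threshold are hypothetical: a heuristic along these lines could let a check tell users their site is probably rendered client-side, rather than reporting a plain failure.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text outside <script>/<style>/<noscript> and count <script> tags."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.scripts = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def looks_client_rendered(html: str, min_text: int = 200) -> bool:
    """Hypothetical heuristic: scripts are present but almost no static text is."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text_len = sum(len(c) for c in parser.chunks)
    return parser.scripts > 0 and text_len < min_text


# A made-up SPA shell: any footer text would only exist after main.js runs.
spa_shell = """<!doctype html>
<html><head><title>Example</title>
<script defer src="/static/main.js"></script></head>
<body><noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div></body></html>"""

print(looks_client_rendered(spa_shell))  # True
```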

As a result, I added an exemption to the project's .clomonitor.yml file via https://github.com/backstage/backstage/pull/18916, but perhaps we could brainstorm other ways to improve this check. This may also relate tangentially to the other issue in this repo about verifying that the relevant trademarks have been properly handed over to the foundation (https://github.com/cncf/clomonitor/issues/33).

adamdmharvey commented 1 year ago

I also wonder if it could be helpful to note in the text of the check that it only checks static web sites and, if you have a dynamic site, to point the user to the exemption information (to help others repeat what I did for other projects)? Happy to contribute that if it's a pattern the project supports.

tegioz commented 1 year ago

Hi @adamdmharvey 👋

Thanks for raising this issue!

We actually have a solution implemented for this problem 🙂 We have an alternative version of this check that, instead of checking the HTML document for the trademark pattern, renders the site using a headless browser. We (the maintainers of CLOMonitor) also maintain Artifact Hub, which was in a similar situation (a React-based web application).

The problem is that relying on the headless browser makes the check considerably slower, so we were holding off a bit before deploying it.

But we'll run some more tests and reconsider it 😉
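One possible way to contain that slowdown, sketched in Python as a hypothetical design (not the implementation described above): run the cheap static check first and pay for a headless-browser render only when it fails. The `fetch` and `render` callables are placeholders; in practice `render` might be built on a headless browser library such as Playwright (`page.goto()` followed by `page.content()`). The single disclaimer pattern is an illustrative assumption.

```python
import re
from typing import Callable

# Illustrative disclaimer pattern (assumption, not CLOMonitor's exact expression).
DISCLAIMER = re.compile(r"The Linux Foundation.* has registered trademarks")


def check_with_fallback(
    url: str,
    fetch: Callable[[str], str],
    render: Callable[[str], str],
) -> tuple[bool, str]:
    """Return (passed, method), where method is "static" or "rendered".

    fetch:  returns the raw HTML of `url` (fast, no JavaScript executed).
    render: returns the DOM of `url` after a headless browser ran its JavaScript
            (slow, so it is only called when the static check fails).
    """
    if DISCLAIMER.search(fetch(url)):
        return True, "static"
    return bool(DISCLAIMER.search(render(url))), "rendered"


# Usage with fake fetch/render functions standing in for real network calls.
shell = '<div id="root"></div>'
footer = "<footer>The Linux Foundation® has registered trademarks and uses trademarks.</footer>"

result = check_with_fallback(
    "https://example.org",
    fetch=lambda url: shell,
    render=lambda url: shell.replace("</div>", footer + "</div>"),
)
print(result)  # (True, 'rendered')
```

Because most sites would pass on the static path, the headless render would only run for the minority of client-rendered sites.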

tegioz commented 1 year ago

> I also wonder if it could be helpful to note in the text of the check that it only checks static web sites, and if you have a dynamic, pointing the user to the exception information? (to help repeat what I did for other projects) Happy to contribute that if it's a pattern the project supports?

This would be great until we have the other solution deployed, so if you'd like to contribute, we'd really appreciate it 😄 (we are also using an exemption in Artifact Hub, as it's a great fit in cases like this).

tegioz commented 1 year ago

BTW on a completely unrelated note, a few months ago we added support for Backstage plugins to Artifact Hub. It would be awesome to see the Backstage plugins catalog listed on artifacthub.io 😇

/cc @castrojo