endoflife-date / endoflife.date

Informative site with EoL dates of everything
https://endoflife.date
MIT License
2.52k stars 789 forks source link

Fetch CPEs from Repology #5352

Open captn3m0 opened 5 months ago

captn3m0 commented 5 months ago
  1. Many products at endoflife.date already link to repology (see this search for results)
  2. Repology does a lot of the hard work, by using CVEs to link a repology product to its CPE. Recent CPEs are published, and while they aren't available on the API, they should be present in the data dumps. I've filed https://github.com/repology/repology-rs/issues/78 to see if this is feasible upstream.

Here's a simpler TODO:

  1. Setup the repology dataset locally as per https://dumps.repology.org/
  2. Use the repology mapper CSV from below to find the repology identifier for each product.
  3. For each such product, write a query to fetch the corresponding CVE.
  4. Add the CPE back to the product as an identifier. From the CPE definition, we only really want cpe:<cpe_version>:<part>:<vendor>:<product>, since we are linking to the product, and not a specific version or variant.
flo7000 commented 4 months ago

So I've looked more into it and i have some questions:

Request URL: https://services.nvd.nist.gov/rest/json/cpes/2.0//?keywordSearch=python&keywordExactMatch

Count of CPEs found: 9121

Example of what i mean (this is the infoblox WAPI)

$ddi_call "${ddi_api_url}/record:host?name~=ds&_return_fields=ipv4addrs,ipv6addrs,name,aliases,view,zone,ddns_protected,ttl,use_ttl,disable,comment,device_description&$pfix"

flo7000 commented 4 months ago

Added macOS CPE's for a test and to discuss with my colleague tomorrow how the KQL Queries will look like and other things

captn3m0 commented 4 months ago

Using the NVD dataset is a dead-end - I wouldn't go via that route. There's some attempts at making a PURL/CPE mapping, but none of them are as accurate as the approach the repology already takes: Use NVD CVEs (which link to CPEs), and if the same CVE is noticed on a product from another source (such as a distro CVE), link the CPE.

repology identifiers are just like the product names? (example: bun's repology is bun)

for many products on endoflife.date, we have a repology identifier in the metadata. Here's a complete mapping that you can use. the first column is the filename, and the second column is the corresponding repology project.

So apache-http-server.md links to https://repology.org/project/apache/information as an example.

product,repology
alibaba-dragonwell,jdk11-dragonwell-extended
alibaba-dragonwell,jdk11-dragonwell-standard
alibaba-dragonwell,jdk17-dragonwell
alibaba-dragonwell,jdk17-dragonwell-standard
alibaba-dragonwell,jdk8-dragonwell-extended
alibaba-dragonwell,jdk8-dragonwell-standard
amazon-cdk,aws-cdk
angularjs,angular.js
ansible,ansible
apache-airflow,apache-airflow
apache-flink,flink
apache-http-server,apache
argocd,argocd
bellsoft-liberica,liberica-jdk-11
bellsoft-liberica,liberica-jdk-11-full
bellsoft-liberica,liberica-jdk-11-lite
bellsoft-liberica,liberica-jdk-17-full
bellsoft-liberica,liberica-jdk-21-full
bellsoft-liberica,liberica-jdk-8-bin
bellsoft-liberica,liberica-jdk-8-full
bellsoft-liberica,liberica-jdk-full
bellsoft-liberica,liberica-jre-11
bootstrap,bootstrap
bun,bun
cert-manager,cert-manager
cfengine,cfengine
chef-infra-server,chef-infra-server
composer,php:composer
consul,consul
couchbase-server,couchbase-server-community
django,python:django
docker-engine,docker
docker-engine,docker-ce
drupal,php:drupal
drush,drush
elasticsearch,elasticsearch
envoy,envoy-proxy
etcd,etcd
exim,exim
fluent-bit,fluent-bit
go,go
goaccess,goaccess
gorilla,go:github-gorilla-context
gorilla,go:github-gorilla-csrf
gorilla,go:github-gorilla-css
gorilla,go:github-gorilla-handlers
gorilla,go:github-gorilla-mux
gorilla,go:github-gorilla-pat
gorilla,go:github-gorilla-rpc
gorilla,go:github-gorilla-schema
gorilla,go:github-gorilla-securecookie
gorilla,go:github-gorilla-sessions
gorilla,go:github-gorilla-websocket
gorilla,go:gorilla-context
gorilla,go:gorilla-mux
haproxy,haproxy
icinga-web,icingaweb2
isc-dhcp,dhcp
istio,go:istio-api
istio,go:istio-pkg
istio,istio
istio,istioctl
kotlin,kotlin
laravel,php:laravel-framework
mariadb,mariadb
matomo,matomo
meilisearch,meilisearch
memcached,memcached
mongodb,mongodb
mysql,mysql
nginx,nginx
nodejs,nodejs
phpmyadmin,phpmyadmin
postgresql,postgresql
prometheus,prometheus
puppet,puppet
python,python
redis,redis
ruby,ruby
salt,salt
squid,squid
tomcat,tomcat
traefik,traefik
typo3,typo3
unrealircd,unrealircd
varnish,varnish
vue,vue.js
wagtail,python:wagtail
wordpress,wordpress
yarn,yarn
zabbix,zabbix
zookeeper,zookeeper

Same question as above, but for PURL; where do I find the official PURLs?

We have them in the same markdown files under the identifiers key. See endoflife.date/contribute for what goes. I would however say PURLs are out-of-scope for this issue.

That said, there is no such thing as an official PURL. PURL are just an identifier that anyone can generate. In our case, we add them to our pages as and when we find them. See https://github.com/endoflife-date/endoflife.date/pull/5302 for an example, where I went through various installation methods of the product, generated PURLs for them, and added them to the product.

If I have many results, how do I know which is the right one? Or should I just everthing that I think is correct?

Assuming you use the repology dataset from https://dumps.repology.org/ (which is the point of this issue), this should not matter.

I have one last question in this comment: Is it possible to request only one field of "data" from the EOL.date API?

The API is light enough that I don't see the use for this. If you want to get data for multiple products in a single call, we don't support that in the current API, but the next version of the API has an endpoint for this: https://deploy-preview-2080--endoflife-date.netlify.app/api/v1/products/full/. See docs at https://deploy-preview-2080--endoflife-date.netlify.app/docs/api/v1/#/default/get_products_full_ and PR at https://github.com/endoflife-date/endoflife.date/pull/2080

flo7000 commented 4 months ago

Using the NVD dataset is a dead-end - I wouldn't go via that route. There's some attempts at making a PURL/CPE mapping, but none of them are as accurate as the approach the repology already takes: Use NVD CVEs (which link to CPEs), and if the same CVE is noticed on a product from another source (such as a distro CVE), link the CPE.

This means I shouldn't use the API with a regex and instead use the API to look for CVE's and their linked CPE's? (see https://documenter.getpostman.com/view/16438573/UzXKWe99 )

for many products on endoflife.date, we have a repology identifier in the metadata. Here's a complete mapping that you can use. the first column is the filename, and the second column is the corresponding repology project.

So basically the repology identifier is just how the project is called on repology (if it exists)?

I would however say PURLs are out-of-scope for this issue.

So for now I should just focus on the repology and CPE identifiers?

Assuming you use the repology dataset from https://dumps.repology.org/ (which is the point of this issue), this should not matter.

This means that for the repology identifiers i can use this DB that I can import right? Do you know by any chance which DBMS it is?

The API is light enough that I don't see the use for this. If you want to get data for multiple products in a single call, we don't support that in the current API, but the next version of the API has an endpoint for this: https://deploy-preview-2080--endoflife-date.netlify.app/api/v1/products/full/. See docs at https://deploy-preview-2080--endoflife-date.netlify.app/docs/api/v1/#/default/get_products_full_ and PR at https://github.com/endoflife-date/endoflife.date/pull/2080

I meant like getting only one single field with an API call (like just get all identifiers), but thanks for the update about the API, this will be great !

captn3m0 commented 4 months ago

This means I shouldn't use the API with a regex and instead use the API to look for CVE's and their linked CPE's?

Yes, but repology has done this hard work, so we need not reproduce.

So basically the repology identifier is just how the project is called on repology (if it exists)?

Yes

So for now I should just focus on the repology and CPE identifiers?

Yes

Do you know by any chance which DBMS it is?

postgres, with a few extensions.

There's a README: https://dumps.repology.org/README.txt which links to https://github.com/repology/repology-updater#creating-the-database

I have a WIP dockerfile which might be of help:

FROM opensuse/leap:15.6

RUN zypper --non-interactive install postgresql14 postgresql14-libversion postgresql14-contrib zstd
RUN mkdir -p /var/run/postgresql && chown -R postgres /var/run/postgresql

USER postgres
ADD repology-database-dump-latest.sql.zst /tmp/repology-database-dump-latest.sql.zst
RUN /usr/lib/postgresql14/bin/initdb -D /var/lib/pgsql/data
RUN pg_ctl --wait --mode immediate -D /var/lib/pgsql/data start && \
    psql -c "CREATE DATABASE repology" && \
    psql -c "CREATE USER repology WITH PASSWORD 'repology'" && \
    psql -c "GRANT ALL ON DATABASE repology TO repology" && \
    psql --dbname repology -c "GRANT CREATE ON SCHEMA public TO PUBLIC" && \
    psql --dbname repology -c "CREATE EXTENSION pg_trgm" && \
    psql --dbname repology -c "CREATE EXTENSION libversion" && \
    echo "host    all             all             0.0.0.0/0            trust" >> /var/lib/pgsql/data/pg_hba.conf && \
    zstd -d /tmp/repology-database-dump-latest.sql.zst | psql --dbname repology -v ON_ERROR_STOP=1

CMD postgres -D /var/lib/pgsql/data
EXPOSE 5432
captn3m0 commented 4 months ago

The table we need is manual_cpes. I queried against all the known identifiers we had:

SELECT   effname as repology_identifier,  cpe_vendor ,  cpe_product  FROM manual_cpes wHERE effname IN ('kotlin', 'nginx', 'python:django', 'typo3', 'envoy-proxy', 'meilisearch', 'php:laravel-framework', 'prometheus', 'couchbase-server-community', 'php:drupal', 'php:composer', 'mongodb', 'postgresql', 'flink', 'chef-infra-server', 'puppet', 'python:wagtail', 'apache-airflow', 'jdk8-dragonwell-extended', 'jdk8-dragonwell-standard', 'jdk17-dragonwell', 'jdk17-dragonwell-standard', 'jdk11-dragonwell-extended', 'jdk11-dragonwell-standard', 'fluent-bit', 'liberica-jdk-11-full', 'liberica-jdk-11', 'liberica-jdk-11-lite', 'liberica-jdk-17-full', 'liberica-jdk-21-full', 'liberica-jdk-8-full', 'liberica-jdk-8-bin', 'liberica-jdk-full', 'liberica-jre-11', 'etcd', 'aws-cdk', 'dhcp', 'cfengine', 'varnish', 'zabbix', 'mysql', 'wordpress', 'elasticsearch', 'traefik', 'salt', 'matomo', 'ansible', 'exim', 'squid', 'goaccess', 'nodejs', 'istio', 'istioctl', 'go:istio-pkg', 'go:istio-api', 'redis', 'mariadb', 'docker', 'docker-ce', 'bun', 'vue.js', 'unrealircd', 'tomcat', 'cert-manager', 'python', 'go', 'ruby', 'drush', 'argocd', 'bootstrap', 'angular.js', 'consul', 'haproxy', 'yarn', 'icingaweb2', 'phpmyadmin', 'memcached', 'zookeeper', 'go:github-gorilla-context', 'go:github-gorilla-csrf', 'go:github-gorilla-css', 'go:github-gorilla-handlers', 'go:github-gorilla-mux', 'go:github-gorilla-pat', 'go:github-gorilla-rpc', 'go:github-gorilla-schema', 'go:github-gorilla-securecookie', 'go:github-gorilla-sessions', 'go:github-gorilla-websocket', 'go:gorilla-context', 'go:gorilla-mux', 'apache')

Which results in this table (with a filename column in case the repology_identifier is not the same as the filename:

repology_identifier cpe_vendor cpe_product filename
angular.js angularjs angular.js angular
ansible ansible ansible
ansible redhat ansible
ansible redhat ansible_engine
apache apache apache_http_server apache
apache apache http_server
apache-airflow apache airflow
argocd linuxfoundation argo-cd
bootstrap getbootstrap bootstrap
cfengine gnu cfengine
cfengine northern cfengine
cfengine northern.tech cfengine
consul hashicorp consul
couchbase-server-community couchbase couchbase_server couchbase-server
dhcp isc dhcp isc-dhcp
docker docker docker docker-engine
elasticsearch elastic elasticsearch
elasticsearch elasticsearch elasticsearch
envoy-proxy cncf envoy envoy-proxy
envoy-proxy envoyproxy envoy envoy-proxy
etcd etcd etcd
etcd redhat etcd
exim exim exim
exim university_of_cambridge exim
flink apache flink apache-flink
fluent-bit treasuredata fluent_bit
go golang go
go:github-gorilla-websocket gorillatoolkit websocket gorilla
haproxy haproxy haproxy
icingaweb2 icinga icinga_web_2 icinga-web
istio istio istio
kotlin jetbrains kotlin
mariadb mariadb mariadb
matomo matomo matomo
memcached memcached memcached
memcached memcachedb memcached
mongodb mongodb mongodb
mysql mysql mysql
mysql oracle mysql
mysql oracle mysql_server
nginx f5 nginx
nginx igor_sysoev nginx
nginx nginx nginx
nodejs nodejs node.js
nodejs nodejs nodejs
php:composer getcomposer composer composer
php:laravel-framework laravel laravel laravel
phpmyadmin phpmyadmin phpmyadmin
postgresql postgresql postgresql
prometheus prometheus prometheus
puppet puppet puppet
puppet puppetlabs puppet
python python python
python python_software_foundation python
python:django djangoproject django django
redis redis redis
redis redislabs redis
ruby ruby-lang ruby
ruby yukihiro_matsumoto ruby
salt saltstack salt
squid squid squid
squid squid-cache squid
tomcat apache tomcat
tomcat apache_software_foundation tomcat
tomcat apache_tomcat apache_tomcat
traefik containous traefik
traefik traefik traefik
typo3 typo3 typo3
varnish varnish-cache varnish
varnish varnish-cache varnish_cache
varnish varnish-software varnich_cache
varnish varnish-software varnich_cache
varnish varnish-software varnish_cache
varnish varnish_cache_project varnish_cache
wordpress wordpress wordpress
yarn yarnpkg yarn
zabbix zabbix zabbix
zookeeper apache zookeeper

Hopefully this should be good enough for you to file some more PRs. I've also updated the issue's description with a TODO.

flo7000 commented 4 months ago

thanks for the examples, the files and the TODO's! i'm currently setting up the VM for the docker instance and i'll create the PR's soon :)

flo7000 commented 4 months ago

Hello @captn3m0 I've started now with my edits locally and I have a question: should i just "trust" the data from the manual_cpe table and add it like that? because i've found cpe:/a:ansible:ansible in the DB and it doesn't exist on NIST (only cpe:/a:redhat:ansible / cpe:/a:redhat:ansible_engine )

and with the ansible example, there are the same CPEs but for different repology projects

repology | cpe ansible | cpe:/a:redhat:ansible ansible-core | cpe:/a:redhat:ansible

Add it to both files right?

captn3m0 commented 4 months ago

The ansible package includes the ansible-core package internally. We can add to just ansible-core for now, need to think a bit more on this.

flo7000 commented 2 months ago

hello @captn3m0 and @marcwrobel

so I've added more CPEs and I some questions:

what are the next steps here? some products apparently don't have a CPE and some products I'm not sure which one to use. and for example, nokia and google phones have a cpe for each version, what should I do with these? is there anything to do for the API, so that we could get the CPE via EOL-API ?