cve-search / vulnerability-lookup

Vulnerability Lookup facilitates quick correlation of vulnerabilities from various sources, independent of vulnerability IDs, and streamlines the management of Coordinated Vulnerability Disclosure (CVD).
https://cve-search.github.io/vulnerability-lookup/
GNU Affero General Public License v3.0

Adding JVN feed source #67

adulau closed this issue 3 weeks ago

adulau commented 1 month ago
Rafiot commented 1 month ago

Alright, so there is a lot of interesting stuff in this database, but it will not be super straightforward to implement: all of it is XML, and we probably (?) should convert it all to JSON to be consistent with the rest of the app.

A few notes on the different feeds to import:

The last two are pretty much duplicates of https://nvd.nist.gov/products/cpe, so we can ignore them.

Rafiot commented 4 weeks ago

Notes for implementing the importer:

feedparser parses the RDF files just fine:

```python
import feedparser

# jvndb.rdf lists the most recently updated entries, newest first
updated = feedparser.parse("https://jvndb.jvn.jp/en/rss/jvndb.rdf")

# The entries are available as a list
print(len(updated["entries"]))
```

The URL in the snippet above contains the last 500 updates (newest first). That is enough to go back to an update in 2022, so we can use this URL for updating the feed, but the initial import will need to iterate over the yearly files:

https://jvndb.jvn.jp/en/rss/years/jvndb_2002.rdf
...
https://jvndb.jvn.jp/en/rss/years/jvndb_2024.rdf
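For the initial import, the yearly URLs can be generated rather than listed by hand. A minimal sketch, assuming the naming pattern above holds for every year from 2002 onward and that a file exists for the current year; the constant and function names are hypothetical. The same helper covers the yearly detail files via the `template` argument.

```python
from datetime import date
from typing import List, Optional

# URL patterns taken from the file lists in this thread.
YEARLY_RSS_URL = "https://jvndb.jvn.jp/en/rss/years/jvndb_{year}.rdf"
YEARLY_DETAIL_URL = "https://jvndb.jvn.jp/en/feed/detail/jvndb_detail_{year}.rdf"


def yearly_feed_urls(template: str = YEARLY_RSS_URL,
                     first_year: int = 2002,
                     last_year: Optional[int] = None) -> List[str]:
    """Return one URL per year, oldest first, for the initial import."""
    if last_year is None:
        # Assumption: JVN publishes a file for the current year.
        last_year = date.today().year
    return [template.format(year=year) for year in range(first_year, last_year + 1)]


urls = yearly_feed_urls(last_year=2024)
print(len(urls))  # 23
print(urls[0])    # https://jvndb.jvn.jp/en/rss/years/jvndb_2002.rdf
print(urls[-1])   # https://jvndb.jvn.jp/en/rss/years/jvndb_2024.rdf
```

Each URL can then be handed to `feedparser.parse` in turn.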

Note that the URL for new vulnerabilities (https://jvndb.jvn.jp/en/rss/jvndb_new.rdf) only contains new entries, not updated ones, so it cannot be used in our case.
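Because jvndb.rdf is ordered from newest to oldest, a periodic update only needs the entries modified since the last run and can stop at the first older one. A minimal sketch, assuming each parsed entry exposes an `id` and an ISO 8601 `updated` timestamp with a UTC offset; the exact keys feedparser produces for JVN items should be checked, and the sample IDs are made up.

```python
from datetime import datetime
from typing import Iterable, Iterator, Mapping


def entries_since(entries: Iterable[Mapping[str, str]],
                  last_import: datetime) -> Iterator[Mapping[str, str]]:
    """Yield entries modified after last_import.

    Assumes entries are ordered newest first and that 'updated' is an
    ISO 8601 string with a UTC offset (hypothetical key name).
    """
    for entry in entries:
        updated = datetime.fromisoformat(entry["updated"])
        if updated <= last_import:
            # Everything after this point is older, so stop early.
            break
        yield entry


sample = [
    {"id": "JVNDB-2024-000003", "updated": "2024-08-26T12:00:00+09:00"},
    {"id": "JVNDB-2024-000002", "updated": "2024-08-20T09:30:00+09:00"},
    {"id": "JVNDB-2024-000001", "updated": "2024-08-01T10:00:00+09:00"},
]
cutoff = datetime.fromisoformat("2024-08-15T00:00:00+09:00")
print([e["id"] for e in entries_since(sample, cutoff)])
# ['JVNDB-2024-000003', 'JVNDB-2024-000002']
```

If the gap since the last run is larger than what the 500 entries cover, falling back to the yearly files would be needed.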

Rafiot commented 4 weeks ago

Notes for importing the details:

The files provided are not RSS, so we cannot use feedparser there.

https://jvndb.jvn.jp/en/feed/detail/jvndb_detail_2002.rdf
...
https://jvndb.jvn.jp/en/feed/detail/jvndb_detail_2024.rdf

In this case the XML is pretty simple, so we can use xmltodict to turn it into a data blob that is much simpler to handle:

```python
import xmltodict

# Parse a yearly detail file downloaded beforehand
with open("jvndb_detail_2024.rdf", "rb") as f:
    details = xmltodict.parse(f.read())

# This key holds the list of vulnerability details
vulinfo = details["VULDEF-Document"]["Vulinfo"]
```
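One pitfall worth planning for when converting to JSON: xmltodict returns a dict when an element occurs once and a list when it repeats, so `Vulinfo` is not guaranteed to be a list. A minimal sketch of normalizing it and serializing each record to JSON; the function names are hypothetical, and the sample document only mimics the structure (the `VulinfoID` field name is an assumption). Passing `force_list=("Vulinfo",)` to `xmltodict.parse` is an alternative.

```python
import json
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalize an xmltodict value to a list (None -> [], dict -> [dict])."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def vulinfo_to_json(details: dict) -> List[str]:
    """Serialize each Vulinfo record of a parsed VULDEF document to a JSON string."""
    document = details.get("VULDEF-Document") or {}
    # ensure_ascii=False keeps Japanese text readable in the output.
    return [json.dumps(record, ensure_ascii=False, sort_keys=True)
            for record in as_list(document.get("Vulinfo"))]


# A file with a single record: xmltodict would give a dict, not a list.
single = {"VULDEF-Document": {"Vulinfo": {"VulinfoID": "JVNDB-2024-000001"}}}
print(vulinfo_to_json(single))  # ['{"VulinfoID": "JVNDB-2024-000001"}']
```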

This dataset doesn't offer a way to get only the updated entries, and it seems all the files are updated in batches once in a while, so we probably need to fetch them all, see which ones changed, and re-import those files entirely.
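Detecting which files changed can be as simple as comparing a content hash with the one stored at the previous import. A minimal sketch, assuming the raw bytes of each downloaded yearly file are available keyed by file name and that the previous hashes are persisted somewhere; the storage and the function names are hypothetical. If the JVN server returns `ETag` or `Last-Modified` headers, conditional requests could avoid downloading unchanged files at all, but that has not been checked here.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's raw content."""
    return hashlib.sha256(data).hexdigest()


def changed_files(downloaded: Dict[str, bytes], known_hashes: Dict[str, str]) -> List[str]:
    """Return the names of files that are new or whose content changed."""
    return [name for name, content in sorted(downloaded.items())
            if known_hashes.get(name) != sha256_hex(content)]


known = {
    "jvndb_detail_2023.rdf": sha256_hex(b"old 2023"),
    "jvndb_detail_2024.rdf": sha256_hex(b"old 2024"),
}
downloaded = {
    "jvndb_detail_2023.rdf": b"old 2023",
    "jvndb_detail_2024.rdf": b"new 2024",
}
print(changed_files(downloaded, known))  # ['jvndb_detail_2024.rdf']
```

Only the returned files would then be re-imported, and their new hashes stored for the next run.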

Rafiot commented 3 weeks ago

The more I look, the more confusing the dataset is. As of now (2024-08-26 14:00):

jbmaillet commented 3 weeks ago

Hi there. I am a former user of cve-search; in 2019 I switched to go-cve-dictionary from the vuls.io stack (see also https://vuls.biz/en for a commercially supported version with added features). All of this is developed in Japan by Japanese developers, and hence supports JVN natively. I am not a Japanese user, so I am only interested in JVN as a "nice to have". FWIW, here are a few things I learned over time that may help you.

If a JVN ID does not refer to a CVE ID, the Japanese go-cve-dictionary developers chose to simply and completely ignore it when requesting the CVEs that match a CPE. This says a lot.
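Reproducing that behaviour, or just flagging such entries, comes down to extracting CVE IDs from each JVN record. A minimal sketch, assuming a record's references are available as one text field; the `references` key, the function names and the sample data are hypothetical. Whether to drop or merely flag entries without a CVE is a design choice.

```python
import re
from typing import Dict, Iterable, List

# CVE IDs: "CVE-", a 4-digit year, then 4 or more digits.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b")


def cve_ids(text: str) -> List[str]:
    """Return the unique CVE IDs found in text, in order of first appearance."""
    return list(dict.fromkeys(CVE_RE.findall(text)))


def map_to_cves(entries: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map JVN IDs to their CVE IDs, skipping entries that reference none."""
    result: Dict[str, List[str]] = {}
    for entry in entries:
        found = cve_ids(entry.get("references", ""))
        if found:
            result[entry["id"]] = found
    return result


sample = [
    {"id": "JVNDB-2024-000001", "references": "CVE-2024-1234 CVE-2024-1234 CVE-2024-56789"},
    {"id": "JVNDB-2024-000002", "references": "JVN#00000000"},
]
print(map_to_cves(sample))
# {'JVNDB-2024-000001': ['CVE-2024-1234', 'CVE-2024-56789']}
```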

JVN primary identifiers are not CPEs, though I am not sure exactly what they are instead, and I would probably not be able to read a spec (AI translation?). What we can see, for a Linux kernel CVE for instance, is the use of "Linux kernel" together with から ("kara", "from") and まで ("made", "until") for versions. This is human-friendly, but not machine-parsable. Note that CPEs are available in go-cve-dictionary's JVN data, but...
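If those version ranges ever need handling, a best-effort heuristic could catch the simplest phrasing. A minimal sketch, assuming the text looks like the Linux kernel example described here (`<product> <version> から <version> まで`) and that both bounds are inclusive; the pattern and names are hypothetical, real JVN wording surely varies, and anything that does not match is returned as `None` for manual review rather than guessed.

```python
import re
from typing import NamedTuple, Optional

# Heuristic for "<product> <version> から <version> まで" ("from ... until ...").
# Version tokens are restricted to ASCII so the match cannot swallow kana.
RANGE_RE = re.compile(
    r"^(?P<product>.+?)\s+"
    r"(?P<start>[0-9][0-9A-Za-z._-]*)\s*から\s*"
    r"(?P<end>[0-9][0-9A-Za-z._-]*)\s*まで$"
)


class VersionRange(NamedTuple):
    product: str
    start: str  # lower bound (から), assumed inclusive
    end: str    # upper bound (まで), assumed inclusive


def parse_range(text: str) -> Optional[VersionRange]:
    """Best-effort parse; returns None when the text does not fit the pattern."""
    match = RANGE_RE.match(text.strip())
    if match is None:
        return None
    return VersionRange(match["product"], match["start"], match["end"])


print(parse_range("Linux kernel 5.10 から 5.15 まで"))
# VersionRange(product='Linux kernel', start='5.10', end='5.15')
print(parse_range("Linux kernel (all versions)"))  # None
```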

...but another thing regarding CPEs: if JVN clones these from the NVD in its data (see some JSON replies when querying go-cve-dictionary), they seem to be the NVD CPEs at the time of CVE creation. One of the many terrible issues with CPE is that new ones are constantly created through a human, and hence error-prone, process, while others are constantly deprecated, so the JVN CPEs may not match the current NVD CPEs.
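One way to cope with stale CPEs is to follow deprecation links until reaching a name that is still in the current dictionary. A minimal sketch, assuming a set of current CPE names and an old-to-new deprecation mapping have already been extracted (for example from the NVD CPE dictionary, whose exact format is not covered here); a one-to-one mapping is a simplification, and the function name and the `example` vendor CPEs are hypothetical.

```python
from typing import Dict, Optional, Set


def resolve_cpe(cpe: str, current: Set[str], deprecated_by: Dict[str, str],
                max_hops: int = 10) -> Optional[str]:
    """Follow deprecation links until reaching a CPE that is still current.

    Returns None if the chain ends on an unknown CPE, loops, or is too long.
    """
    seen: Set[str] = set()
    while cpe not in current:
        if cpe in seen or len(seen) >= max_hops or cpe not in deprecated_by:
            return None
        seen.add(cpe)
        cpe = deprecated_by[cpe]
    return cpe


OLD = "cpe:2.3:a:example:old_product:1.0:*:*:*:*:*:*:*"
NEW = "cpe:2.3:a:example:new_product:1.0:*:*:*:*:*:*:*"
print(resolve_cpe(OLD, current={NEW}, deprecated_by={OLD: NEW}))
# cpe:2.3:a:example:new_product:1.0:*:*:*:*:*:*:*
```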

By my pure guesstimation, JVN consists of 95% Japanese translations of NVD entries. But it also contains issues for Japan-specific industrial actors such as Hitachi, NEC, Mitsubishi and the like, probably NTT Docomo, and maybe other Japanese providers we have never heard of. That may be important for some users. As is often the case, Japan can be the hidden face of the moon, and it has its Galápagos syndrome (see Wikipedia). But that's why we like it, don't we?

A nice thing, though I don't know whether or how it can be used automatically, is that JVN maintains an up-to-date list of contacts (an email address, maybe a website, or a git repo for OSS) for every "CPE-like" identifier, with a defined process to check regularly that the contact is in fact, well, contactable, and to update it or declare it deprecated, maybe search for another one, etc. I do not know the details, and it is probably very resource-consuming, but Japanese people follow processes with the most extreme diligence, so I'd trust it. In any case, this looks to me like a fantastic idea in these times of supply chain and accountability concerns, for example with regard to regulations, like it or not (e.g. the European CRA).

Being usable by native Japanese speakers is very, very important for Japanese users. I studied the language for 3 years in evening classes and intensive workshops, and made 3 long stays in the archipelago (a long time ago; I have forgotten almost everything). For various sociological reasons that I won't delve into, the level of English proficiency is, well, not very good, except among some rare and curious individuals, even in multinational companies with a presence in Europe, the US, etc. For example, I was amazed by the number of printed software books translated into Japanese that I could find there. And you may have noticed the near-absence of Japanese users on Stack Overflow and other major websites, or the rarity of Japanese software considering the country's economic importance, with the notable exception of Jenkins. Japan has its own local ecosystems.

Apart from that, as of today, go-cve-dictionary has excellent CPE-to-CVE matching. This was my main motivation for switching from cve-search (plus I do not need a web interface), at the cost of performance, but better safe than sorry. It also supports replication of the "MITRE" (cve.org) database with its rich CVE Record version 5.1 data. It could be a good idea for cve-search to play with it and look at its code; it's 100% Go. Some other parts of vuls.io are interesting too, such as go-cpe-dictionary, with fuzzy CPE search.
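The basic idea behind CPE matching can be shown on CPE 2.3 formatted strings: compare the 11 attributes one by one, with `*` (ANY) in the pattern matching any value. A deliberately simplified sketch, not how go-cve-dictionary does it: it ignores escaped characters, case normalization, wildcards inside values, the special `-` (NA) rules of the NISTIR 7696 CPE Name Matching specification, and the version ranges real matchers must handle. The function names are hypothetical.

```python
from typing import List

# The 11 attributes that follow "cpe:2.3:" in a CPE 2.3 formatted string.
FIELDS = ["part", "vendor", "product", "version", "update", "edition",
          "language", "sw_edition", "target_sw", "target_hw", "other"]


def split_cpe(cpe: str) -> List[str]:
    """Split a CPE 2.3 formatted string into its 11 attribute values.

    Simplification: backslash-escaped colons are not supported.
    """
    parts = cpe.split(":")
    if len(parts) != 2 + len(FIELDS) or parts[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return parts[2:]


def cpe_matches(pattern: str, candidate: str) -> bool:
    """True if every pattern attribute is '*' (ANY) or equals the candidate's value."""
    return all(p == "*" or p == c
               for p, c in zip(split_cpe(pattern), split_cpe(candidate)))


pattern = "cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*"
print(cpe_matches(pattern, "cpe:2.3:o:linux:linux_kernel:5.15:*:*:*:*:*:*:*"))  # True
print(cpe_matches(pattern, "cpe:2.3:a:example:linux_kernel:5.15:*:*:*:*:*:*:*"))  # False
```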

Hope this helps.

adulau commented 3 weeks ago

Thanks a lot for the insightful information, @jbmaillet. vulnerability-lookup is a completely new beast with actual multi-source support, including CVE.org, NVD and many others. We might indeed extend the CPE part of vulnerability-lookup to support the actual "diversity" of interpretation. The contact point for CPEs sounds like a great idea.