Community reconstruction of the legacy JSON NVD Data Feeds. This project uses and redistributes data from the NVD API but is neither endorsed nor certified by the NVD.
Project Update: Introducing Weekly Cache Rebuilds to Self-Heal from Inconsistent API Behavior #16
Hello everybody,
This month, this repository will celebrate its first birthday. Over the past year, we have collected current and historical NVD vulnerability data using git. This gave both the community and us the opportunity to transparently observe changes in API response behavior and gain valuable insights into the dataset.
In the past month, we performed in-depth analyses to track down data inconsistencies between recent NVD API responses and our dataset. Hash-based comparisons and JSON diffs between each CVE record in this repository and its counterpart in the NVD revealed that, for unknown reasons, our cache had not received all record changes from the NVD.
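A hash-based comparison like the one described above could look roughly like the following minimal sketch. It assumes, hypothetically, that both the cached records and the fresh API responses are available as dictionaries keyed by CVE ID; `record_digest` and `find_mismatches` are illustrative names, not part of this repository's bot. Each record is serialized to canonical JSON (sorted keys, fixed separators) before hashing, so key order and whitespace do not produce false positives.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Hash a CVE record in canonical JSON form (hypothetical helper).

    Sorting keys and fixing separators means two records with the same
    content yield the same digest regardless of key order or whitespace.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_mismatches(cached: dict, fresh: dict) -> dict:
    """Compare cached records with fresh API responses, both keyed by CVE ID."""
    changed = sorted(
        cve_id
        for cve_id in cached.keys() & fresh.keys()
        if record_digest(cached[cve_id]) != record_digest(fresh[cve_id])
    )
    return {
        "changed": changed,                                        # content differs
        "missing_in_cache": sorted(fresh.keys() - cached.keys()),  # only upstream
        "missing_upstream": sorted(cached.keys() - fresh.keys()),  # only in cache
    }
```

Because the digest is computed over parsed JSON, it flags differences in actual content (for example, a description whose characters changed) rather than differences in escaping alone; the flagged IDs are the ones worth inspecting with a full JSON diff.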
Together with the nice people at NIST, we were able to pinpoint issues in our cache, but also in the NVD. In some cases, the record differences were small and did not significantly affect how current the data was. We found that our synchronization bot rested on the assumption that every change in API responses results in a CVE record update. Such an update would, in turn, bump the record's modification timestamp, which is the value developers should use to keep their local copy up to date. However, this assumption does not hold: when the NVD changes output formats, as it did with its switch to CVE 5.0, the records themselves are not actually updated. This makes sense when you consider, for example, a change in the character encoding used by the API endpoints. Many of the inconsistencies we observed between the cache and recent NVD responses originated from exactly this switch in character encodings, which never propagated to the cache because the affected records were not marked as modified.
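The failure mode is easiest to see in a timestamp-driven incremental sync. The following is a minimal sketch, not the bot's actual code: it assumes, hypothetically, that each record carries an `id` and an ISO 8601 `lastModified` string in UTC without an offset (e.g. `2024-03-02T08:15:00.000`), and `select_updates` is an illustrative name. Any record whose content changes without a newer timestamp, such as one affected by an encoding switch, is never selected.

```python
from datetime import datetime, timezone


def select_updates(api_records: list, last_sync: datetime) -> list:
    """Return IDs of records modified after last_sync (hypothetical helper).

    Assumes 'lastModified' is a naive ISO 8601 string that denotes UTC.
    """
    updated = []
    for record in api_records:
        modified = datetime.fromisoformat(record["lastModified"]).replace(tzinfo=timezone.utc)
        if modified > last_sync:
            updated.append(record["id"])
    return updated


last_sync = datetime(2024, 3, 1, tzinfo=timezone.utc)
records = [
    # Genuinely modified after the last sync: picked up.
    {"id": "CVE-2024-0001", "lastModified": "2024-03-02T08:15:00.000"},
    # Content re-encoded upstream, but the timestamp was not bumped: silently skipped.
    {"id": "CVE-2023-0002", "lastModified": "2023-12-01T10:00:00.000",
     "description": "caf\u00e9"},
]
print(select_updates(records, last_sync))  # ['CVE-2024-0001']
```

Only a full re-fetch, or a content comparison such as hashing, would catch the second record.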
Yet, we also found indications that, for some data fields in CVE records, NVD procedures may fail to update the modification timestamp, which led to data inconsistencies as well. We shared our insights with the NVD and leave the rest to their database team :-).
Either way, we decided to introduce a self-healing mechanism that re-establishes data consistency between the NVD API and this repository, mitigating any future effects of false assumptions or API hiccups: from now on, our bot will periodically perform complete cache rebuilds to pull in fresh API responses. The rebuilds run every week on Sunday at 02:30:00Z.

I just triggered a manual cache rebuild. Commit db95377 fixes all known inconsistencies in this repository, e.g., #15.
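For anyone who wants to align their own mirroring jobs with the weekly rebuild on Sunday at 02:30:00Z, the sketch below computes the next rebuild time. It is an illustration under stated assumptions: the post does not say how the bot is scheduled, and `next_rebuild` is a hypothetical helper. In standard cron syntax, the same schedule would be written `30 2 * * 0`.

```python
from datetime import datetime, timedelta, timezone

REBUILD_WEEKDAY = 6  # datetime.weekday(): Monday == 0, Sunday == 6
REBUILD_HOUR, REBUILD_MINUTE = 2, 30  # 02:30 UTC


def next_rebuild(now: datetime) -> datetime:
    """Return the first Sunday 02:30:00 UTC strictly after `now` (a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    days_ahead = (REBUILD_WEEKDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=REBUILD_HOUR, minute=REBUILD_MINUTE, second=0, microsecond=0
    )
    # If today is Sunday and 02:30 has already passed, move to next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

For example, calling it on Friday, 2024-03-01 at 12:00 UTC yields Sunday, 2024-03-03 at 02:30 UTC.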
Cheers :clinking_glasses: