fkie-cad / nvd-json-data-feeds

Community reconstruction of the legacy JSON NVD Data Feeds. This project uses and redistributes data from the NVD API but is neither endorsed nor certified by the NVD.

Commit date does not match date in commit message #17

Closed by maringuu 5 months ago

maringuu commented 5 months ago

For example:

commit b9f38cb9d3321a6c2131d7f533dde0f1b6d79a03
Author: cad-safe-bot <cad-safe-bot@protonmail.com>
Date:   2024-03-01 07:00:29 +0000

    Auto-Update: 2024-03-01T07:00:25.595145+00:00

The difference is minor (under 4 seconds), but it is still inconsistent. In addition, I'd argue that the microsecond precision in the commit message is excessive and not meaningful.

I stumbled across this while working on a script to create releases. The script is given a timestamp and uses it to determine which commit to use as the data source.
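
A minimal sketch of how such a release script might pick a commit, assuming the commits have already been read from `git log` as (hash, subject) pairs. The names `parse_auto_update` and `pick_commit` are hypothetical; only the `Auto-Update: <ISO 8601 timestamp>` subject format is taken from the example above.

```python
from datetime import datetime, timezone

PREFIX = "Auto-Update: "  # subject prefix as seen in the example commit


def parse_auto_update(subject):
    """Return the timestamp embedded in a commit subject, or None if absent."""
    if not subject.startswith(PREFIX):
        return None
    return datetime.fromisoformat(subject[len(PREFIX):])


def pick_commit(commits, cutoff):
    """Return the hash of the latest auto-update commit at or before cutoff.

    `commits` is an iterable of (hash, subject) pairs in any order.
    `cutoff` must be timezone-aware, because the embedded timestamps are.
    """
    best = None
    for sha, subject in commits:
        ts = parse_auto_update(subject)
        if ts is None or ts > cutoff:
            continue  # not an auto-update commit, or newer than requested
        if best is None or ts > best[0]:
            best = (ts, sha)
    return best[1] if best else None
```

Selecting by the timestamp in the subject and selecting by the git commit date can give different answers near a boundary, which is the inconsistency described above.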

rhelmke commented 5 months ago

This is because the auto-update timestamp is the bot's execution timestamp and not the commit timestamp. The bot uses it as the exact cutoff when polling the NVD for changes ("give me all changes until X").

maringuu commented 5 months ago

> This is because the auto-update timestamp is the bot's execution timestamp and not the commit timestamp.

Exactly, that was my poorly worded point. I think both should be the same. I do not think that the bot's execution timestamp should be part of the repository. From my perspective, it is an implementation detail that should not leak into the database (i.e. git repo). I am aware that this is nitpicking, but it bothered me and is trivial to change.

rhelmke commented 5 months ago

We and other users have seen that the NVD has race conditions when we poll paginated data. This means that data may change in the NVD during the update process. The delta between the execution timestamp and the actual commit tells us how long the syncing process took. What if API hiccups caused the process to slow down? Could this lead to inconsistencies? This data point can help us (and also other users) to reconstruct what happened and actually find anomalies.

I strongly disagree with you. The execution timestamp is a useful data point in this repo, because it gives users transparency about what is happening on our servers. It has already been useful for identifying synchronization problems and helped to reconstruct what happened when the NVD went south. Locking that data away might not limit our own options, but it would limit everyone else's.

Thus, for the sake of reproducibility and transparency, it should be part of this repo.
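
As a rough sketch of how that delta could be computed, assuming the commit date is available in the format shown in the `git log` example above and the subject follows the `Auto-Update: <timestamp>` pattern. The function name `sync_duration` is hypothetical and not part of this repository.

```python
from datetime import datetime

PREFIX = "Auto-Update: "  # subject prefix as seen in the example commit


def sync_duration(commit_date, subject):
    """Return commit time minus bot execution time as a timedelta.

    `commit_date` uses the form "2024-03-01 07:00:29 +0000".
    `subject` uses the form "Auto-Update: <ISO 8601 timestamp>".
    """
    if not subject.startswith(PREFIX):
        raise ValueError("not an auto-update commit subject")
    committed = datetime.strptime(commit_date, "%Y-%m-%d %H:%M:%S %z")
    executed = datetime.fromisoformat(subject[len(PREFIX):])
    return committed - executed


delta = sync_duration(
    "2024-03-01 07:00:29 +0000",
    "Auto-Update: 2024-03-01T07:00:25.595145+00:00",
)
print(delta.total_seconds())
```

For the commit quoted at the top of this issue, this gives a sync duration of about 3.4 seconds. An unusually large value would hint at the kind of API slowdown described above.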

maringuu commented 5 months ago

Thank you for the nice explanation! I agree with you now.

I was under the impression that one could simply query the NVD API with a maximum lastModified date and use the results to create a commit. I didn't think about race conditions. Also, I didn't think that the NVD could be non-deterministic and return a different result for the same query at two different times (with a fixed lastModified date, of course).

But since you are not the NVD, I now totally see that you cannot make such assumptions about the API and that the query time is a valuable data point for this mirror.

Again, thank you for taking time to explain what was wrong about my nitpicky issue!