CVEProject / cvelistV5

CVE cache of the official CVE List in CVE JSON 5 format

`recent_activities.json` not being updated #23

Closed ditatompel closed 9 months ago

ditatompel commented 1 year ago

Hi,

I'm working on a small tool to sync new and updated CVEs to my app. To save network bandwidth and disk I/O, I use git to synchronize updates from this repository.

The tool relies on ./cves/recent_activities.json, but I haven't seen any updates to that file for the past 3 weeks, even though there are updates under ./cves/[year]/.

Is the recent_activities.json file under the ./cves/ directory no longer being updated?

Thank you

hkong-mitre commented 1 year ago

It should be, but you are correct that it is not being updated. I will look into this. Thank you for bringing it to my attention.

Out of curiosity, how are you using the ./cves/recent_activities.json file? Are you using it to find which new and/or updated CVEs should be pushed to your app?

ditatompel commented 1 year ago

> Out of curiosity, how are you using the ./cves/recent_activities.json file? Are you using it to find which new and/or updated CVEs should be pushed to your app?

Yes, I use recent_activities.json to find new and/or updated CVEs (from the .delta.new, .delta.updated, and .delta.unknown arrays).

After the CVEs are imported, I store .startTime and .stopTime in my database as my "last checkpoint". Then, in my next "import cycle", I skip any entries whose .stopTime is older than that checkpoint, to save CPU and disk I/O.

Thank you!
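The checkpoint approach described in this comment could be sketched roughly as follows. This is only an illustration: it assumes the file is a JSON list of activity objects, each with a `stopTime` timestamp and a `delta` object holding the `new`, `updated`, and `unknown` arrays. The function names `parse_ts` and `changed_since` are hypothetical and do not come from the repository.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2023-07-24T17:11:00.394Z'."""
    # Replace the trailing 'Z' so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_since(activities: list, checkpoint: datetime) -> tuple:
    """Return (changed entries, new checkpoint) for activities after checkpoint.

    Assumption: each activity has 'stopTime' and a 'delta' object with
    'new', 'updated', and 'unknown' arrays (layout inferred from the thread).
    """
    changed = []
    newest = checkpoint
    for activity in activities:
        stop = parse_ts(activity["stopTime"])
        if stop <= checkpoint:
            continue  # already handled in an earlier import cycle
        delta = activity.get("delta", {})
        for key in ("new", "updated", "unknown"):
            changed.extend(delta.get(key, []))
        newest = max(newest, stop)
    return changed, newest
```

The returned `newest` value would be stored as the next "last checkpoint".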

chickpoint commented 11 months ago

I made the same implementation as @ditatompel; I am only waiting for ./cves/recent_activities.json to be up-to-date again before I actively start using it. The main reason is that RSS-based CVE feeds are disappearing or don't carry enough information to filter CVEs based on .containers.cna.affected.vendor or .containers.cna.affected.product.

So I second this.

hkong-mitre commented 11 months ago

This is becoming an important integration point for polling changes in CVEs. I am working on another task that is also making use of the delta portion of recent_activities.json.

As a result, I am updating this. The new design is to provide a ./cves/delta.json, since recent_activities.json was really intended to be a debugging tool during the original implementation and deployment. The new file will retain all of the current data, minus the steps property.

Also, I am looking to add additional data into delta. An example of a work-in-progress:

[
  {
    "fetchTime": "2023-07-24T17:11:00.394Z",
    "durationInMsecs": 2244,
    "delta": {
      "numberOfChanges": 1,
      "new": [
        {
          "cveId": "CVE-2023-3321",
          "githubLink": "https://raw.githubusercontent.com/hkong/cvelistV5/main/cves/2023/3xxx/CVE-2023-3321.json",
          "datePublished": "2023-07-24T17:06:31.093Z",
          "dateUpdated": "2023-07-24T17:06:31.093Z",
          "assignerShortName": "CNA",
          "description": "English description..."
        }
      ],
      "updated": []
    }
  }
  ...
]

The fetchTime is when the CVE REST Service is queried, and along with durationInMsecs provides the window in which the data could have been changed. This is sometimes useful. The githubLink provides a link to the full CVE record in CVE JSON 5.0 format. The other data are what is useful for me from the record.
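As a rough illustration of the window mentioned above, the sketch below computes a start and end time from one entry of the work-in-progress format. It assumes the window begins at `fetchTime` and lasts `durationInMsecs` milliseconds, which is one reading of the description rather than a documented rule. The function name `fetch_window` is hypothetical.

```python
from datetime import datetime, timedelta


def fetch_window(entry: dict) -> tuple:
    """Return (start, end) of the period in which data could have changed.

    Assumption: the window starts at 'fetchTime' and spans 'durationInMsecs'.
    """
    # Replace the trailing 'Z' so older Python versions can parse it.
    start = datetime.fromisoformat(entry["fetchTime"].replace("Z", "+00:00"))
    end = start + timedelta(milliseconds=entry["durationInMsecs"])
    return start, end


entry = {"fetchTime": "2023-07-24T17:11:00.394Z", "durationInMsecs": 2244}
start, end = fetch_window(entry)
print(start.isoformat(), end.isoformat())
```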

The idea is to

  1. put generally useful information into the delta.json file itself, so the majority of clients won't need to read in the full record
  2. make the latest delta.json file always available at https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/recent_activities.json
  3. provide a GitHub link to the record for clients that can only get the full record from the same host as the delta.json instead of another REST endpoint.

The delta.json will also include recent history going back 7 days, with possibly more history that has some of the data "pruned", so the file won't get so large that it is difficult for some clients to parse.

For your existing workflow, @ditatompel, @chickpoint, would this be more helpful than the recent_activities.json? Are there other "generally useful" data that you think should be there?

chickpoint commented 11 months ago

Maybe it helps to explain how I am using the current recent_activities.json, to give some background information. We are monitoring vulnerabilities for software that we use in our organization (to be ISO 27001 compliant). As we are not interested in all CVEs, we need to filter them.

I will try to paint the general picture in some steps, on how we do this.

  1. Clone the repo.
  2. Read the date the script last ran.
  3. Walk through the entries in recent_activities.json back to the last run time, then stop.
  4. Loop over all CVEs found and filter them by vendor and/or product.
  5. If a CVE matches the vendor and/or product filter, extract the data from the corresponding JSON and convert it to XML.

So as you can see, recent_activities.json is mainly used to quickly retrieve CVE IDs without having to read all the JSON files.
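The vendor/product filter in step 4 might look something like the sketch below. It assumes each CVE JSON 5 record keeps its affected entries under `containers.cna.affected` as a list of objects with optional `vendor` and `product` strings. The function name `matches` and the case-insensitive comparison are illustrative choices, not part of the workflow described.

```python
def matches(record: dict, vendors: set, products: set) -> bool:
    """Return True if any affected entry names a watched vendor or product.

    Assumption: 'vendors' and 'products' contain lowercase names to watch.
    """
    affected = record.get("containers", {}).get("cna", {}).get("affected", [])
    for item in affected:
        vendor = str(item.get("vendor", "")).lower()
        product = str(item.get("product", "")).lower()
        if vendor in vendors or product in products:
            return True
    return False


# Hypothetical record, trimmed to the fields the filter reads.
record = {
    "containers": {
        "cna": {"affected": [{"vendor": "ExampleCorp", "product": "Widget"}]}
    }
}
print(matches(record, {"examplecorp"}, set()))
```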

The information you proposed has a lot of the needed information, but for me it would be an added value to have the Vendor and Product in the delta.json. This way we can make that quick filter.

For me personally, the githubLink could be a relative path, as we clone the repo. But I can see that in other scenarios the link can be useful. Nothing that can't be filtered out.

ditatompel commented 11 months ago

> This is becoming an important integration point for polling changes in CVEs. I am working on another task that is also making use of the delta portion of recent_activities.json.
>
> As a result, I am updating this. The new design is to provide a ./cves/delta.json, since recent_activities.json was really intended to be a debugging tool during the original implementation and deployment. The new file will retain all of the current data, minus the steps property.

Sounds great! Since recent_activities.json will grow larger over time, using separate delta records will be much more efficient for me (and maybe for @chickpoint's use case too).

Thanks for your hard work @hkong-mitre!

hkong-mitre commented 10 months ago

There is a PR specifically for this now. recent_activities.json is going away, and delta.json and deltaLog.json will be supported features going forward.

After extended internal discussions, the content is a little different from what I noted above. It now looks like the following, which has a richer set of metadata than recent_activities.json without repeating too much of the body of the CVE (note that this is a dummy test CVE, so none of the URLs will work, but you get the idea):

{
  "cveId": "CVE-1970-0002",
  "cveOrgLink": "https://www.cve.org/CVERecord?id=CVE-1970-0002",
  "githubLink": "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/1970/0xxx/CVE-1970-0002.json",
  "dateUpdated": "1970-01-01T01:02:00.000Z"
}

The PR is https://github.com/CVEProject/cvelist-bulk-download/pull/13. Please feel free to review/comment in the PR itself. After a brief review period, the code for updating this repository will be pushed to production and the delta JSON files will be available in the /cves directory.

hkong-mitre commented 10 months ago

@ditatompel, @chickpoint, this comment and this comment from a reviewer make a lot of sense to me. Do my suggested changes make sense to you as well? Your use cases and comments would be most welcome, so please feel free to comment in those conversations.

chickpoint commented 10 months ago

Looks like a workable result to continue. It is still in the same line as the recent_activities.json but simplified. Think that makes the file easier to handle and more readable for others that want to use it.

I really like the idea of having the log as well, as my implementation is based on Docker. So, it would be nice to have a small history to prepopulate the list.

The only preference I would have is a relative path to the JSON, to handle the lookups locally instead of pulling them from GitHub again. But that might just be a personal preference and not useful for everyone.

Anyhow I like the changes and thanks for the great work.

hkong-mitre commented 10 months ago

@chickpoint, that's an interesting point. I wanted to make the delta file available to apps (e.g., browser pages) that don't have a local repository, but you can make use of the githubLink by doing a string split at https://raw.githubusercontent.com/CVEProject/cvelistV5/main/ to get the local repository file path.
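A minimal sketch of that string split, assuming every `githubLink` starts with the raw URL prefix quoted above. The helper name `local_path` and the `repo_root` parameter are hypothetical.

```python
from pathlib import Path

RAW_PREFIX = "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/"


def local_path(github_link: str, repo_root: str) -> Path:
    """Map a githubLink to the matching file inside a local clone."""
    before, sep, relative = github_link.partition(RAW_PREFIX)
    if not sep or before:
        # The link does not start with the expected raw prefix.
        raise ValueError(f"unexpected githubLink: {github_link}")
    return Path(repo_root) / relative


link = RAW_PREFIX + "cves/1970/0xxx/CVE-1970-0002.json"
print(local_path(link, "cvelistV5"))
```

Rejecting links that lack the prefix means a change in the URL scheme fails loudly instead of producing a wrong path.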

hkong-mitre commented 9 months ago

@chickpoint, @ditatompel, just wanted to let you know that the PR for this enhancement was approved 2023-09-26 and has been running as intended since.

You can find the permalinks for the deltas as follows:

You can also find those 2 files in the cves directory on GitHub, and in your clones.