crocs-muni / sec-certs

Tool for analysis of security certificates and their security targets (Common Criteria, NIST FIPS140-2...).
https://sec-certs.org
MIT License

Switch from NVD json feeds to API #328

Closed adamjanovsky closed 1 year ago

adamjanovsky commented 1 year ago

Closes #324

TODO

Also, it may be valuable to put up a list of expected CVEs and their matches. Maybe we could collect it on Trello. I don't think we want to run these tests on each commit (so I'll disable them in CI/CD), but it may be a good idea to run them whenever we touch the CVE/CPE matching.
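One possible shape for such a check, as a minimal sketch: keep the expected matches in a plain mapping and report the differences against what the matching produced. The function name `diff_cve_matches` and all certificate/CVE identifiers below are placeholders invented for illustration; they are not part of sec-certs.

```python
def diff_cve_matches(expected, actual):
    """Compare expected and computed CVE matches per certificate.

    Both arguments map a certificate ID to a set of CVE IDs.
    Certificates that appear only in `actual` are ignored on purpose.
    """
    report = {}
    for cert_id, expected_cves in expected.items():
        found = set(actual.get(cert_id, set()))
        missing = set(expected_cves) - found
        unexpected = found - set(expected_cves)
        if missing or unexpected:
            report[cert_id] = {
                "missing": sorted(missing),
                "unexpected": sorted(unexpected),
            }
    return report


# Placeholder identifiers only, not real certificates or CVEs.
EXPECTED = {"cert-A": {"CVE-0000-0001", "CVE-0000-0002"}, "cert-B": set()}
ACTUAL = {"cert-A": {"CVE-0000-0001"}, "cert-B": {"CVE-0000-0003"}}

for cert_id, diff in diff_cve_matches(EXPECTED, ACTUAL).items():
    print(cert_id, diff)
```

A check like this could sit behind a pytest marker or an environment variable so that it only runs on demand rather than on every commit.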

Endpoints to use:

New tests

Notes:

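from sec_certs.dataset import CCDataset

cc_dset = CCDataset(root_dir="/Users/adam/phd/projects/certificates/sec-certs/datasets/cc")
cc_dset.get_certs_from_web()
# Prepare the auxiliary CPE/CVE datasets and the CPE match dictionary
cc_dset._prepare_cpe_dataset()
cc_dset._prepare_cve_dataset()
cc_dset._prepare_cpe_match_dict()
# Match certificates to CPEs, then look up the related CVEs
cc_dset.compute_cpe_heuristics()
cc_dset.compute_related_cves()

codecov[bot] commented 1 year ago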

Codecov Report

Patch coverage: 74.42% and project coverage change: +0.84 :tada:

Comparison is base (5893352) 76.61% compared to head (d4825d1) 77.44%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
+ Coverage   76.61%   77.44%   +0.84%
==========================================
  Files          51       52       +1
  Lines        6372     6572     +200
==========================================
+ Hits         4881     5089     +208
+ Misses       1491     1483       -8
```

| [Impacted Files](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni) | Coverage Δ | |
|---|---|---|
| [src/sec\_certs/sample/fips.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9zYW1wbGUvZmlwcy5weQ==) | `86.34% <0.00%> (-0.27%)` | :arrow_down: |
| [src/sec\_certs/utils/pandas.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy91dGlscy9wYW5kYXMucHk=) | `0.00% <ø> (ø)` | |
| [src/sec\_certs/dataset/dataset.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9kYXRhc2V0L2RhdGFzZXQucHk=) | `52.22% <21.43%> (-9.34%)` | :arrow_down: |
| [src/sec\_certs/dataset/cpe.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9kYXRhc2V0L2NwZS5weQ==) | `73.98% <64.11%> (+18.71%)` | :arrow_up: |
| [src/sec\_certs/utils/nvd\_dataset\_builder.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy91dGlscy9udmRfZGF0YXNldF9idWlsZGVyLnB5) | `82.68% <82.68%> (ø)` | |
| [src/sec\_certs/sample/cve.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9zYW1wbGUvY3ZlLnB5) | `84.04% <85.30%> (+32.25%)` | :arrow_up: |
| [src/sec\_certs/dataset/cve.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9kYXRhc2V0L2N2ZS5weQ==) | `91.09% <85.49%> (+6.58%)` | :arrow_up: |
| [src/sec\_certs/sample/cpe.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9zYW1wbGUvY3BlLnB5) | `91.51% <89.84%> (-1.08%)` | :arrow_down: |
| [src/sec\_certs/serialization/json.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9zZXJpYWxpemF0aW9uL2pzb24ucHk=) | `84.91% <90.48%> (+0.70%)` | :arrow_up: |
| [src/sec\_certs/configuration.py](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni#diff-c3JjL3NlY19jZXJ0cy9jb25maWd1cmF0aW9uLnB5) | `92.46% <100.00%> (+0.79%)` | :arrow_up: |
| ... and [6 more](https://codecov.io/gh/crocs-muni/sec-certs/pull/328?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni) | | |

... and [6 files with indirect coverage changes](https://codecov.io/gh/crocs-muni/sec-certs/pull/328/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=crocs-muni)


adamjanovsky commented 1 year ago

@J08nY Could you please expose CPEDataset, CVEDataset, and the JSON of the CPE Match feed somewhere on seccerts.org in compressed form?

The URLs are in the settings: https://github.com/crocs-muni/sec-certs/blob/fc638a859741a6cd8096c23789fc8ddff5236272/src/sec_certs/configuration.py#L72-L80, feel free to change them as you find fitting.

The CPEDataset and CVEDataset instances can be compressed with to_json(compress=True). The CPE Match feed is just a JSON file, so it has to be handled separately.

Basically, now we just have to decide on the URLs. Can you do that and change the settings keys accordingly?

J08nY commented 1 year ago

Where do I get the CPE match feed? What processing do I need to do to obtain it?

adamjanovsky commented 1 year ago

> Where do I get the CPE match feed? What processing do I need to do to obtain it?

If you have a processed dataset available, the JSON should sit in the auxiliary_datasets directory. Otherwise, you can obtain it with _prepare_cpe_match_dict(): https://github.com/crocs-muni/sec-certs/blob/d3d470ed408fde3638d26b49a7f6a403fe57c7e9/src/sec_certs/dataset/dataset.py#L400

You can either copy the contents of the method, or just create a new dataset at some path and call the method right away. E.g.,

import gzip
import json

from sec_certs.dataset import CCDataset

cc_dset = CCDataset(root_dir="/whatever/path")
cpe_match_dict = cc_dset._prepare_cpe_match_dict()

# gzip.open(..., "w") opens a binary stream, so the JSON string is encoded first
with gzip.open("/path/to/store/cpe_match_dict.json", "w") as handle:
    json_str = json.dumps(cpe_match_dict, indent=4)
    handle.write(json_str.encode("utf-8"))
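To check that a file written this way is usable, it can be read back with the standard library alone. The following is a small round-trip sketch that does not touch sec-certs at all; the helper names `write_json_gz` and `read_json_gz`, the `.json.gz` file name, and the sample dictionary are placeholders chosen for illustration.

```python
import gzip
import json
import tempfile
from pathlib import Path


def write_json_gz(obj, path):
    # Text mode ("wt") lets json.dump write straight into the gzip stream.
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(obj, handle, indent=4)


def read_json_gz(path):
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cpe_match_dict.json.gz"
    sample = {"placeholder-key": ["placeholder-value"]}  # stand-in for the real dict
    write_json_gz(sample, target)
    restored = read_json_gz(target)
    # Gzip files start with the magic bytes 0x1f 0x8b.
    is_gzip = target.read_bytes()[:2] == b"\x1f\x8b"
    print(restored == sample, is_gzip)
```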

To get the datasets from NVD, you need to obtain an NVD API key and set the following two keys in your YAML settings:

nvd_api_key: <actual-api-key>
preferred_source_nvd_datasets: "api"

adamjanovsky commented 1 year ago

@J08nY

Regarding import time optimization, this post has a nice summary of different approaches that you can use to address this: https://adamj.eu/tech/2023/03/02/django-profile-and-improve-import-time/

I did some profiling. As of now:

(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.dataset'
python -c 'import sec_certs.dataset'  3.28s user 0.54s system 111% cpu 3.413 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.sample' 
python -c 'import sec_certs.sample'  1.79s user 0.34s system 125% cpu 1.700 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.model' 
python -c 'import sec_certs.model'  3.38s user 0.53s system 111% cpu 3.493 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.utils'
python -c 'import sec_certs.utils'  0.03s user 0.01s system 93% cpu 0.041 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs'      
python -c 'import sec_certs'  0.03s user 0.01s system 93% cpu 0.043 total

I deferred a few imports, see: 88f4630

Profiling after:

(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.dataset'
python -c 'import sec_certs.dataset'  1.48s user 0.28s system 131% cpu 1.343 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.sample'
python -c 'import sec_certs.sample'  1.47s user 0.29s system 131% cpu 1.336 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.model'
python -c 'import sec_certs.model'  1.50s user 0.29s system 131% cpu 1.365 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python -c 'import sec_certs.utils'
python -c 'import sec_certs.utils'  0.03s user 0.01s system 92% cpu 0.044 total
(venv) ~/phd/projects/certificates/sec-certs  $ time python  -c 'import sec_certs'    
python -c 'import sec_certs'  0.03s user 0.01s system 93% cpu 0.041 total

So, from 3.3 seconds we go to 1.5. Any further reduction would require:

I did the profiling with python -X importtime yourfile.py 2> import.log and https://pypi.org/project/tuna/.
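For a quick look without a viewer, the log written by -X importtime can also be summarized with a few lines of Python. This is a rough sketch, assuming the usual `import time: self [us] | cumulative | imported package` line layout in which nested imports are indented by two spaces per level; the function names and the sample log are made up for illustration and are not measurements from this project.

```python
import re

# Matches e.g. "import time:       200 |        300 | pkg_a"
LINE_RE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|( +)(\S+)\s*$")


def parse_importtime(log_text):
    """Return (name, self_us, cumulative_us, depth) tuples from an importtime log."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # header line or unrelated stderr output
        self_us, cumulative_us, indent, name = match.groups()
        depth = (len(indent) - 1) // 2  # one separator space, then two per level
        rows.append((name, int(self_us), int(cumulative_us), depth))
    return rows


def slowest_top_level(log_text, limit=10):
    """Top-level imports sorted by cumulative time, slowest first."""
    top = [(name, cum) for name, _, cum, depth in parse_importtime(log_text) if depth == 0]
    return sorted(top, key=lambda item: item[1], reverse=True)[:limit]


# Synthetic log with hypothetical module names and invented timings.
SAMPLE = "\n".join([
    "import time: self [us] | cumulative | imported package",
    "import time:       100 |        100 |   pkg_a.helpers",
    "import time:       200 |        300 | pkg_a",
    "import time:        50 |         50 | pkg_b",
])

for name, cumulative_us in slowest_top_level(SAMPLE):
    print(f"{name}: {cumulative_us} us")
```

With a real log, the text would come from something like `open("import.log").read()`.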

I consider this to be an OK result and I will invest no more effort into this unless you promote the issue.

Edit: Also note that imports placed inside functions should execute only once, AFAIK; later calls reuse the module already cached in sys.modules.
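A small illustration of that caching behaviour, using only the standard library. The module `colorsys` merely stands in for a heavy dependency, and the wrapper function name is arbitrary; this is not how the deferred imports in 88f4630 are written.

```python
import sys


def rgb_to_hls(r, g, b):
    # Deferred import: the module is loaded on the first call,
    # not when the file that defines this function is imported.
    import colorsys

    return colorsys.rgb_to_hls(r, g, b)


first = rgb_to_hls(1.0, 0.0, 0.0)
module_after_first_call = sys.modules["colorsys"]

second = rgb_to_hls(1.0, 0.0, 0.0)
# The second call found the module in sys.modules and did not load it again.
same_module = sys.modules["colorsys"] is module_after_first_call
print(first, second, same_module)
```

The import statement inside the function still runs on every call, but after the first one it reduces to a lookup in sys.modules, which is cheap compared to loading the module.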