crocs-muni / sec-certs

Tool for analysis of security certificates and their security targets (Common Criteria, NIST FIPS140-2...).
https://sec-certs.org
MIT License

Scrape FIPS algorithm data #276

Open J08nY opened 1 year ago

J08nY commented 1 year ago

Initial description by @J08nY

Data from the FIPS algorithm dataset is not fully utilized or mined. We can follow the links to each algorithm's page and extract more data there. This could help in the cert ID cleanup, where it would let us get rid of the algorithm references.

Details

Currently, FIPSAlgorithm objects are built from rows of a pandas DataFrame constructed solely from the list of algorithms; see below:

https://github.com/crocs-muni/sec-certs/blob/f41d077185f7e40d1a524bbfc9c4a11dbd312f73/src/sec_certs/dataset/fips_algorithm.py#L98

This table does not include the valuable attributes found on the individual algorithm pages. The proposed enhancement should:

https://github.com/crocs-muni/sec-certs/blob/f41d077185f7e40d1a524bbfc9c4a11dbd312f73/src/sec_certs/sample/fips_algorithm.py#L13

Further guidance

One can isolate the pipeline stage that processes the algorithm dataset simply by running:

```python
from sec_certs.dataset.fips_algorithm import FIPSAlgorithmDataset

# Build the algorithm dataset from the web and dump it to JSON for inspection.
alg_dset = FIPSAlgorithmDataset.from_web()
alg_dset.to_json("/path/to/some/file.json")
```

The PR implementing this enhancement should modify the parse_algorithms_from_html method.
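As a rough illustration of the "follow the links" idea, here is a minimal standard-library sketch that collects the per-algorithm detail links from a listing table. The markup in `SAMPLE`, the `BASE_URL`, and the names `AlgorithmLinkParser` and `extract_detail_links` are all hypothetical; the real listing page and the existing `parse_algorithms_from_html` method may be structured differently.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical base URL; the real algorithm listing lives elsewhere.
BASE_URL = "https://example.org/algorithms/"


class AlgorithmLinkParser(HTMLParser):
    """Collect (link text, absolute URL) pairs for anchors inside table cells."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._in_cell = False
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
        elif tag == "a" and self._in_cell:
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None
        elif tag == "td":
            self._in_cell = False


def extract_detail_links(html: str, base_url: str = BASE_URL) -> list[tuple[str, str]]:
    """Return the detail-page links found in the listing HTML."""
    parser = AlgorithmLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up listing markup, used only to exercise the parser.
SAMPLE = """
<table>
  <tr><td><a href="details?id=1">AES 1234</a></td><td>Vendor A</td></tr>
  <tr><td><a href="details?id=2">SHS 5678</a></td><td>Vendor B</td></tr>
</table>
"""

links = extract_detail_links(SAMPLE)
```

Each collected URL could then be fetched and parsed for the extra attributes; that step is left out here because it depends on the actual layout of the detail pages.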

adamjanovsky commented 1 year ago

Just FYI, the current state of #275 is that I've refactored the building of the FIPS algorithm dataset. The certificates only store strings of the algorithm identifiers and are not connected anywhere to the respective objects. So once we improve algorithm scraping, we could connect these two datasets as well.
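To make the idea of connecting the two datasets concrete, here is a small sketch of resolving the identifier strings stored on a certificate against an index of algorithm objects. The `Algorithm` and `Certificate` classes, their fields, and the identifier format are hypothetical stand-ins, not the actual sec-certs classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Algorithm:
    # Hypothetical stand-in for an algorithm record.
    alg_id: str
    vendor: str


@dataclass
class Certificate:
    # Hypothetical stand-in: the certificate only stores identifier strings.
    cert_id: int
    algorithm_ids: list[str] = field(default_factory=list)


def resolve_algorithms(
    cert: Certificate, index: dict[str, Algorithm]
) -> tuple[list[Algorithm], list[str]]:
    """Split a certificate's identifiers into resolved objects and unknown ids."""
    resolved: list[Algorithm] = []
    missing: list[str] = []
    for alg_id in cert.algorithm_ids:
        alg = index.get(alg_id)
        if alg is None:
            missing.append(alg_id)
        else:
            resolved.append(alg)
    return resolved, missing


# Build the lookup index once, keyed by the identifier string.
algorithms = [Algorithm("AES 1234", "Vendor A"), Algorithm("SHS 5678", "Vendor B")]
index = {alg.alg_id: alg for alg in algorithms}

cert = Certificate(cert_id=42, algorithm_ids=["AES 1234", "RSA 9999"])
resolved, missing = resolve_algorithms(cert, index)
```

Keeping the unresolved identifiers in a separate list makes it easy to see which references the scraped algorithm data does not yet cover.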

adamjanovsky commented 5 months ago

@Julik24 this is the task that we've discussed today. Before attempting to contribute, please be sure to go through https://sec-certs.org/docs/contributing.html, especially the Quality Assurance section. The typical development workflow is described at https://docs.github.com/en/get-started/using-github/github-flow.

Please assign yourself to the issue once you accept the invitation.