Open J08nY opened 1 year ago
Just FYI, the current state of #275 is that I've refactored the building of the dataset of FIPS algorithms. The certificates only store the algorithm identifiers as strings and are not connected anywhere to the respective objects. So once we improve algorithm scraping, we could connect these two datasets as well.
@Julik24 this is the task that we discussed today. Before attempting to contribute, please be sure to go through https://sec-certs.org/docs/contributing.html, especially the Quality Assurance section. The typical development workflow is described at https://docs.github.com/en/get-started/using-github/github-flow.
Please assign yourself to the issue once you accept the invitation.
Initial description by @J08nY
Details
Currently, the FIPSAlgorithm object is built from rows of a pandas DataFrame constructed merely from the list of algorithms, see below:
https://github.com/crocs-muni/sec-certs/blob/f41d077185f7e40d1a524bbfc9c4a11dbd312f73/src/sec_certs/dataset/fips_algorithm.py#L98
This table does not include the valuable attributes found on the individual page of each algorithm. The proposed enhancement: the FIPSAlgorithm object (see below) should be enriched with the attributes mentioned above.
https://github.com/crocs-muni/sec-certs/blob/f41d077185f7e40d1a524bbfc9c4a11dbd312f73/src/sec_certs/sample/fips_algorithm.py#L13
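To make the idea of "enriching" concrete, here is a minimal sketch of what an extended sample object could look like. It is not the actual FIPSAlgorithm class from the repository: the class name AlgorithmSketch, all of its fields, the enrich helper, and the "Description"/"Version" keys are hypothetical and only illustrate keeping the listing-table fields while adding optional attributes taken from the detail page.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AlgorithmSketch:
    """Illustrative stand-in for an algorithm sample (NOT the real FIPSAlgorithm)."""

    # Fields assumed to come from the listing table (names are illustrative).
    alg_number: str
    algorithm_type: str
    vendor: str
    # Hypothetical attributes that would come from the individual algorithm page.
    # They default to None so objects built from the table alone stay valid.
    description: str | None = None
    version: str | None = None


def enrich(alg: AlgorithmSketch, details: dict[str, str]) -> AlgorithmSketch:
    """Return a copy of `alg` with detail-page attributes filled in.

    `details` maps labels scraped from the detail page to their values;
    the label names used here are assumptions.
    """
    return replace(
        alg,
        description=details.get("Description"),
        version=details.get("Version"),
    )
```

Keeping the new attributes optional means that a failed or skipped detail-page download would still leave a usable object built from the table row.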
Further guidance
One can isolate the pipeline stage that processes the algorithm dataset simply by
The PR implementing this enhancement should modify the parse_algorithms_from_html method.
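As a rough illustration of the kind of parsing step such a PR might add, the sketch below extracts label/value pairs from a detail page using only the standard library. It is an assumption-laden example, not the existing parse_algorithms_from_html: the helper names LabelValueParser and parse_algorithm_details are hypothetical, and the page layout (two-cell table rows holding a label and a value) is assumed, so the real algorithm pages may need different selectors.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect label/value pairs from table rows that have exactly two cells."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: dict[str, str] = {}
        self._cells: list[str] = []           # cell texts of the current row
        self._buf: list[str] | None = None    # text of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("th", "td"):
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._buf is not None:
            # Collapse runs of whitespace inside the cell text.
            self._cells.append(" ".join("".join(self._buf).split()))
            self._buf = None
        elif tag == "tr" and len(self._cells) == 2:
            label, value = self._cells
            self.pairs[label.rstrip(":")] = value
            self._cells = []


def parse_algorithm_details(html: str) -> dict[str, str]:
    """Hypothetical helper: map detail-page labels to values (assumed layout)."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

The returned dictionary could then be used to populate the additional attributes of the algorithm object; rows that do not have exactly two cells are ignored.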