Closed by J08nY 11 months ago
Here is some manually extracted data from a full CC run on the server.
Commit: 6448911bb5872feb281b0151d63c54eeeb887cc7
Total duration: 9h36m15s
Stage | Started at | Duration |
---|---|---|
Initial CSV/HTML download + process | 2023-04-26 13:43:03,860 | 0h0m |
CPEDataset from JSON | 2023-04-26 13:43:43,700 | 0h1m |
CVEDataset from JSON | 2023-04-26 13:44:01,253 | 0h0m |
PPDataset | 2023-04-26 13:44:32,354 | 0h0m |
MU dataset - download reports | 2023-04-26 13:44:32,637 | 0h3m |
MU dataset - download targets | 2023-04-26 13:47:36,796 | 0h4m |
MU dataset - convert reports | 2023-04-26 13:51:33,456 | 0h5m |
MU dataset - convert targets | 2023-04-26 13:56:41,531 | 0h11m |
MU dataset - extract report meta | 2023-04-26 14:07:21,226 | 0h0m |
MU dataset - extract target meta | 2023-04-26 14:07:23,571 | 0h0m |
MU dataset - extract report frontpage | 2023-04-26 14:07:54,051 | 0h0m |
MU dataset - extract target frontpage | 2023-04-26 14:07:56,402 | 0h0m |
MU dataset - extract report keywords | 2023-04-26 14:08:03,717 | 0h0m |
MU dataset - extract target keywords | 2023-04-26 14:08:29,043 | 0h6m |
CC scheme pages | 2023-04-26 14:14:40,720 | 0h15m |
download reports | 2023-04-26 14:29:18,751 | 0h33m |
download targets | 2023-04-26 15:02:33,414 | 0h38m |
convert reports | 2023-04-26 15:40:24,238 | 2h27m |
convert targets | 2023-04-26 18:07:29,028 | 3h9m |
extract report meta | 2023-04-26 21:18:53,521 | 0h3m |
extract target meta | 2023-04-26 21:21:46,177 | 0h7m |
extract report frontpage | 2023-04-26 21:28:41,745 | 0h1m |
extract target frontpage | 2023-04-26 21:29:45,351 | 0h2m |
extract report keywords | 2023-04-26 21:31:30,754 | 0h16m |
extract target keywords | 2023-04-26 21:47:04,355 | 1h1m |
heuristics - cert_id | 2023-04-26 22:48:02,540 | 0h0m |
heuristics - cpe match | 2023-04-26 22:48:02,729 | 0h6m |
heuristics - cve | 2023-04-26 22:54:21,816 | 0h2m |
heuristics - references | 2023-04-26 22:56:16,638 | 0h0m |
heuristics - transitive vulns | 2023-04-26 22:56:18,026 | 0h0m |
heuristics - cert labs | 2023-04-26 22:56:38,557 | 0h0m |
heuristics - SARs | 2023-04-26 22:56:38,622 | 0h23m |
End | 2023-04-26 23:19:19,853 |
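
Since the table above was extracted by hand, here is a minimal sketch of how such durations could be derived from the log timestamps instead. It assumes each stage lasts until the next stage's timestamp and rounds to the nearest minute, so results may differ slightly from the manually extracted values. The names `stage_durations`, `format_duration` and `SAMPLE` are hypothetical and not part of sec-certs.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the table (milliseconds after the comma).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def stage_durations(entries):
    """Return (stage, duration) pairs for ordered (stage, timestamp) entries.

    Assumption: a stage ends when the next one starts, so the last entry
    only serves as an end marker and gets no duration of its own.
    """
    parsed = [(name, datetime.strptime(stamp, LOG_TIME_FORMAT)) for name, stamp in entries]
    return [
        (name, following - start)
        for (name, start), (_, following) in zip(parsed, parsed[1:])
    ]


def format_duration(delta: timedelta) -> str:
    """Format a timedelta as e.g. '2h27m', rounded to the nearest minute."""
    total_minutes = round(delta.total_seconds() / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes}m"


# A few rows copied from the table above.
SAMPLE = [
    ("download reports", "2023-04-26 14:29:18,751"),
    ("download targets", "2023-04-26 15:02:33,414"),
    ("convert reports", "2023-04-26 15:40:24,238"),
    ("convert targets", "2023-04-26 18:07:29,028"),
]

if __name__ == "__main__":
    for name, delta in stage_durations(SAMPLE):
        print(f"{name} | {format_duration(delta)}")
```

For the sample rows this yields 0h33m, 0h38m and 2h27m, matching the corresponding table entries.
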
Some numbers:
The resulting dataset has 5326 certificates.
In total, we identified 22546 vulnerabilities in 367 vulnerable certificates.
A total of 151 certificates were skipped as duplicates.
The biggest contributors to the runtime are the OCR in our PDF-to-text conversion (the two convert stages take 2h27m and 3h9m, i.e. 5h36m of the 9h36m total) and the downloads from the CC pages.
From https://github.com/crocs-muni/sec-certs/pull/275#discussion_r1038820309:
Log entries like this could be replaced with some elegant way of tracking how long these stages and steps of processing take. Like a context manager that: