Building the look-up dictionary for fast CVE matching is now delayed until the CPEs found in the CC dataset are known. Only then do we build the look-up dict, restricted to the CPEs contained in the CCDataset, which drastically reduces its size.
Note, however, that building the look-up dict still requires loading the full NVD CPE matching dictionary into memory. If we could filter that as well, it would save even more memory.
Right now, building the CVEDataset requires ~6GB of memory at peak on its own; 4GB of that is consumed by the NVD CPE dictionary, which is released from memory as soon as the dataset is built.
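To illustrate the idea of a look-up dict restricted to known CPEs, here is a minimal sketch. It is not the actual implementation: the function name `build_lookup_dict`, the input shapes, and the toy identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def build_lookup_dict(cve_to_cpes, relevant_cpes):
    """Map each relevant CPE to the set of CVE IDs that reference it.

    Hypothetical helper: `cve_to_cpes` maps a CVE ID to the CPE URIs it
    affects, `relevant_cpes` is the set of CPEs seen in the CC dataset.
    CPEs outside `relevant_cpes` are skipped, which keeps the dict small.
    """
    lookup = defaultdict(set)
    for cve_id, cpes in cve_to_cpes.items():
        for cpe in cpes:
            if cpe in relevant_cpes:
                lookup[cpe].add(cve_id)
    return dict(lookup)


# Toy data with made-up identifiers, for illustration only.
cve_to_cpes = {
    "CVE-0000-0001": ["cpe:2.3:a:vendor_a:product_a:1.0", "cpe:2.3:a:vendor_b:product_b:2.0"],
    "CVE-0000-0002": ["cpe:2.3:a:vendor_a:product_a:1.0"],
    "CVE-0000-0003": ["cpe:2.3:a:vendor_c:product_c:3.0"],
}
relevant_cpes = {"cpe:2.3:a:vendor_a:product_a:1.0"}

lookup = build_lookup_dict(cve_to_cpes, relevant_cpes)
```

In this sketch only one of the three distinct CPEs ends up as a key, so the look-up dict grows with the number of CPEs in the CC dataset rather than with the number of CPEs known to NVD.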