Closed: adulau closed this issue 3 months ago
Users do silly things when formatting ID strings; this is a problem that has been solved elsewhere, e.g., for phone numbers. If we want to keep flexibility, we could normalize the IDs before indexing and before searching the index.
Indeed. I'm looking at the production logs of cve-search and vulnerability-lookup, and there are a lot of inputs from different places that make no sense at all. Normalising vulnerability ID strings would make sense, but we would need to do it for all the IDs from every source we have. Maybe a small library would make sense in the long run.
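A minimal sketch of what such a helper could look like; nothing here is actual vulnerability-lookup or cve-search code, and the function names and ID patterns are illustrative assumptions only:

```python
import re

# Hypothetical helper, not the actual vulnerability-lookup code: trim and collapse
# whitespace, lowercase (the workers already lowercase IDs when storing), and
# optionally check the result against a few well-known ID patterns.
_KNOWN_ID_PATTERNS = (
    re.compile(r"^cve-\d{4}-\d{4,}$"),                           # CVE IDs
    re.compile(r"^ghsa-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}$"),   # GitHub advisories
    re.compile(r"^gsd-\d{4}-\d+$"),                              # GSD IDs
)


def normalize_vuln_id(raw: str) -> str:
    """Return a canonical form of a vulnerability ID for indexing and lookups."""
    # Replace runs of whitespace with a dash so "CVE 2024 1234" still resolves,
    # then lowercase; collapse any duplicate dashes introduced along the way.
    cleaned = re.sub(r"\s+", "-", raw.strip()).lower()
    return re.sub(r"-{2,}", "-", cleaned)


def looks_like_known_id(vuln_id: str) -> bool:
    """Best-effort format check; IDs from unknown sources simply pass through."""
    return any(pattern.match(vuln_id) for pattern in _KNOWN_ID_PATTERNS)
```

The key point is that the same function runs on both the indexing side and the query side, so whatever canonical form is chosen, stored documents and user queries always end up in it.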
> Users do silly things when formatting ID strings; this is a problem that has been solved elsewhere, e.g., for phone numbers. If we want to keep flexibility, we could normalize the IDs before indexing and before searching the index.
Yes, it is good to check that the format of the IDs, per source, is correct. For the moment the workers do not change anything: the data is stored as is, and the IDs are lowercased.
Just as a note and for information: I started to add JSON Schema validation for the various endpoints of the API, for the comments, the bundles, and of course the vulnerabilities that can be created on a local instance of vulnerability-lookup. But we have noticed (you probably noticed this a long time ago too) that it is not rare to see vulnerability advisories (from CVE v5, GSD, etc.) that do not respect their own schema. So we cannot be too strict with the validation of the data. For example, when an admin user of a vulnerability-lookup instance now creates a vulnerability advisory via the Vulnogram editor, the backend JSON validation is skipped; there is only the frontend validation (which is not blocking for the user).
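For illustration only, one lenient approach is to still run the JSON Schema validation but only log the errors instead of rejecting the document, which matches the non-blocking behaviour described above. This sketch uses the jsonschema package; the function name and schema path are assumptions, not the actual vulnerability-lookup implementation:

```python
import json
import logging

from jsonschema import Draft202012Validator

logger = logging.getLogger(__name__)


def validate_advisory_leniently(advisory: dict, schema_path: str) -> list[str]:
    """Validate an advisory against a JSON Schema without ever blocking on failure.

    Returns the list of validation error messages so they can be surfaced
    (e.g. as warnings in the UI) while the advisory is still accepted.
    """
    with open(schema_path, encoding="utf-8") as fp:
        schema = json.load(fp)

    validator = Draft202012Validator(schema)
    errors = [error.message for error in validator.iter_errors(advisory)]
    for message in errors:
        logger.warning("Advisory does not match its schema: %s", message)
    return errors
```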
(normally my commit closes this ticket. let's see ;-)
You are right, for the import it's a different story. For the user interface (UI and API), normalising makes sense.
Correction: it seems some users put spaces in the ID, and then the lookup doesn't work.
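Running a normaliser like the sketch above on both the indexing side and the query side would cover exactly that case, e.g. (reusing the hypothetical normalize_vuln_id from earlier):

```python
# Hypothetical usage of the normalize_vuln_id sketch from above.
assert normalize_vuln_id(" CVE-2024-1234 ") == "cve-2024-1234"
assert normalize_vuln_id("CVE 2024 1234") == "cve-2024-1234"
# As long as the same function runs before indexing and before searching,
# both variants resolve to the same stored document.
```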