This PR improves the cert_id heuristics; see #250 for an overview.
[x] Filter candidate cert_ids by certificate scheme.
[x] Extract candidate cert_ids from report PDF metadata.
[x] Extract candidate cert_ids from report PDF filename.
[x] Fix unicode hyphens in Korean cert_id rules.
[x] Choose the cert_id by weighting all of the candidate cert_ids, instead of the current first-come, first-served approach.
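
To illustrate the idea behind the items above, here is a minimal sketch of weighted candidate selection. It is not the code in this PR: the function names, the source labels, the weight values, and the per-scheme patterns are all hypothetical and simplified for the example.

```python
import re
from collections import defaultdict

# Hypothetical weights per heuristic source; the values are illustrative only.
SOURCE_WEIGHTS = {"frontpage": 1.0, "metadata": 0.8, "filename": 0.6, "keywords": 0.4}

# Unicode dash variants that are normalized to an ASCII hyphen.
UNICODE_DASHES = re.compile("[\u2010\u2011\u2012\u2013\u2014\u2212]")

# Hypothetical, simplified cert_id patterns keyed by scheme.
SCHEME_PATTERNS = {
    "KR": re.compile(r"KECS-[A-Z]{4}-\d{4}-\d{4}"),
    "DE": re.compile(r"BSI-DSZ-CC-\d{4}(-V\d+)?-\d{4}"),
}


def normalize(cert_id):
    """Replace unicode dashes with ASCII hyphens and trim whitespace."""
    return UNICODE_DASHES.sub("-", cert_id).strip()


def choose_cert_id(candidates, scheme):
    """Pick the candidate with the highest summed weight.

    candidates: iterable of (source, raw_cert_id) pairs.
    scheme: scheme code used to filter out candidates of other schemes.
    Returns None when no candidate survives the filter.
    """
    pattern = SCHEME_PATTERNS.get(scheme)
    scores = defaultdict(float)
    for source, raw in candidates:
        cert_id = normalize(raw)
        # Drop candidates that do not look like cert_ids of this scheme.
        if pattern is not None and not pattern.fullmatch(cert_id):
            continue
        scores[cert_id] += SOURCE_WEIGHTS.get(source, 0.0)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

With this sketch, a cert_id found in both the PDF metadata and the filename outranks one that only appears once in the report text, regardless of which was extracted first.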
Won't do:
[x] Filter candidate cert_ids by validity year (valid_from should match year in cert_id). Reason: the year of the start of the validity period often does not match the cert_id year.
[x] Add frontpage matching to Korean certificates. Reason: Other heuristics help more.
[x] Add frontpage matching to Japanese certificates. Reason: There are many frontpage variants, and other heuristics help more.