amazon-science / ReFinED

ReFinED is an efficient and accurate entity linking (EL) system.

Updated load_pem() to output consistent Dict[List] #17

Closed: shern2 closed this 1 year ago

shern2 commented 1 year ago

If max_cands is not specified, load_pem() outputs a Dict[Dict] instead of a Dict[List], so the pem.lmdb that gets built holds values of a type the candidate generator (get_candidates()) does not expect.
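To illustrate the shape mismatch, here is a minimal sketch of producing the same Dict[List] output whether or not max_cands is given. It is not ReFinED's actual code: normalize_candidates, the assumed input layout (surface form mapped to a dict of entity ID to prior) and the entity IDs Q1/Q2 are all hypothetical and used only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def normalize_candidates(
    pem: Dict[str, Dict[str, float]],
    max_cands: Optional[int] = None,
) -> Dict[str, List[Tuple[str, float]]]:
    """Hypothetical helper: turn each {entity_id: prior} dict into a list of
    (entity_id, prior) tuples, sorted by descending prior.

    The return type is the same with or without max_cands; max_cands only
    truncates the list.
    """
    normalized: Dict[str, List[Tuple[str, float]]] = {}
    for surface_form, cands in pem.items():
        ranked = sorted(cands.items(), key=lambda kv: kv[1], reverse=True)
        normalized[surface_form] = ranked if max_cands is None else ranked[:max_cands]
    return normalized


# Made-up data: one surface form with two candidate entities.
example = {"paris": {"Q2": 0.1, "Q1": 0.9}}
full = normalize_candidates(example)               # no max_cands: still lists
top1 = normalize_candidates(example, max_cands=1)  # truncated to one candidate
```

With this kind of normalization, a reader of the stored values can rely on always getting a list, regardless of how the table was built.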

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

shern2 commented 1 year ago

Found that this will break build_entity_index(), which assumes each value of pem is a Dict rather than a List[Tuple]: https://github.com/amazon-science/ReFinED/blob/main/src/refined/offline_data_generation/preprocess_all.py#L112
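For illustration only, the kind of breakage described here can be sketched as follows. This is not the code in preprocess_all.py: collect_entity_ids, iter_candidates and the sample data are hypothetical. The point is that a consumer calling .items() on each value works for a Dict but fails for a List[Tuple], unless it handles both shapes explicitly.

```python
from typing import Dict, Iterator, List, Set, Tuple, Union

# Either shape a pem value might take (assumed for this sketch).
Candidates = Union[Dict[str, float], List[Tuple[str, float]]]


def iter_candidates(cands: Candidates) -> Iterator[Tuple[str, float]]:
    """Yield (entity_id, prior) pairs from a dict or from a list of tuples."""
    if isinstance(cands, dict):
        yield from cands.items()
    else:
        yield from cands


def collect_entity_ids(pem: Dict[str, Candidates]) -> Set[str]:
    """Hypothetical consumer: gather every entity ID that appears in pem."""
    entity_ids: Set[str] = set()
    for cands in pem.values():
        for entity_id, _prior in iter_candidates(cands):
            entity_ids.add(entity_id)
    return entity_ids


# A consumer that assumes a dict would call .items(), which lists do not have.
list_value = [("Q2", 0.5)]
list_has_items = hasattr(list_value, "items")

# Mixed shapes are handled because iter_candidates checks the type first.
mixed = {"a": {"Q1": 0.5}, "b": list_value}
ids = collect_entity_ids(mixed)
```

Whether to normalize at write time or tolerate both shapes at read time is a design choice; changing only the producer, as this PR did, leaves any dict-assuming consumer broken.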

Retracting this PR.