Living-with-machines / T-Res

A Toponym Resolution Pipeline for Digitised Historical Newspapers

Bug: Error received when run_sentence is called on an empty string #171

Open lukehare opened 1 year ago

lukehare commented 1 year ago

Calling the run_sentence function on an empty string ("") or a whitespace-only string (" ") raises the following error:

>>> geoparser.run_sentence(" ")

UnboundLocalError                         Traceback (most recent call last)
Cell In [7], line 1
----> 1 geoparser.run_sentence(" ")

File ~/toponym-resolution/geoparser/pipeline.py:141, in Pipeline.run_sentence(self, sentence, sent_idx, context, place, place_wqid)
    139 sentence = sentence.replace("—", ";")
    140 # Get predictions:
--> 141 predictions = self.myner.ner_predict(sentence)
    142 # Process predictions:
    143 procpreds = [
    144     [x["word"], x["entity"], "O", x["start"], x["end"], x["score"]]
    145     for x in predictions
    146 ]

File ~/toponym-resolution/geoparser/recogniser.py:256, in Recogniser.ner_predict(self, sentence)
    254         print("Token processing error.")
    255     predictions = ner.aggregate_entities(pred_ent, lEntities)
--> 256 predictions = ner.fix_hyphens(predictions)
    257 predictions = ner.fix_nested(predictions)
    258 predictions = ner.fix_startEntity(predictions)

UnboundLocalError: local variable 'predictions' referenced before assignment
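In the traceback, the assignment on line 255 of recogniser.py is indented more deeply than the use on line 256, which suggests that `predictions` is only bound inside a block that is skipped for blank input. The following is a minimal sketch of that pattern and one way to avoid it. It is not the T-Res code: `aggregate_unsafe` and `aggregate_safe` are hypothetical names, and the whitespace tokenisation is a stand-in assumption for the real NER step.

```python
def aggregate_unsafe(sentence):
    # Hypothetical stand-in: `predictions` is bound only when tokens exist.
    tokens = sentence.split()
    if tokens:
        predictions = [{"word": t} for t in tokens]
    # For "" or " " the branch above is skipped, so the name is unbound here.
    return predictions


def aggregate_safe(sentence):
    # Bind a default before the conditional so the name always exists.
    predictions = []
    tokens = sentence.split()
    if tokens:
        predictions = [{"word": t} for t in tokens]
    return predictions


try:
    aggregate_unsafe(" ")
except UnboundLocalError as err:
    caught = type(err).__name__  # name of the exception class that was raised
```

Binding a default before the branch keeps the later post-processing steps operating on an empty list rather than failing.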

The following geoparser configuration was used, but the issue may affect other configurations too:

mylinker = linking.Linker(
    method="reldisamb",
    resources_path="../resources/wikidata/",
    linking_resources=dict(),
    base_model="../resources/models/bert/bert_1760_1900/",  # Base model for vector extraction
    rel_params={
        "base_path": "../resources/rel_db/",
        "wiki_version": "wiki_2019/",
        "training_data": "lwm",  # lwm, aida
        "ranking": "publ",
        "micro_locs": "dist" # "dist", "nil", or ""
    },
    overwrite_training=False,
)

geoparser = pipeline.Pipeline(mylinker=mylinker)

fedenanni commented 1 year ago

eheheh - good one! I'll fix as soon as possible

fedenanni commented 1 year ago

Interesting: I get this error for a whitespace-only string (" "), but not for an empty string ("") or for punctuation alone (".").

fedenanni commented 1 year ago

Ignore, my mistake.

fedenanni commented 1 year ago

Should be fixed now in https://github.com/Living-with-machines/toponym-resolution/commit/7ea23dd61c9623a93fa7558cce0b5fa8e7396819, which is included as part of #169.
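
For anyone still on a version without that commit, a caller-side guard is one possible workaround. This is only a sketch: `run_sentence_safely` is a hypothetical helper, not part of T-Res, and returning an empty list for blank input is an assumption about what a caller would want, since the thread does not state what run_sentence returns.

```python
def run_sentence_safely(geoparser, sentence, **kwargs):
    """Skip blank input; otherwise defer to geoparser.run_sentence.

    Hypothetical helper: the empty-list return value for blank input
    is an assumption, not documented T-Res behaviour.
    """
    # str.strip() removes leading and trailing whitespace, so both ""
    # and " " are falsy after stripping.
    if not sentence or not sentence.strip():
        return []
    return geoparser.run_sentence(sentence, **kwargs)
```

It would be called as `run_sentence_safely(geoparser, " ")` in place of `geoparser.run_sentence(" ")`.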