HughCraig / GHAP


Highlighting of placename misaligned #454

Open BillPascoe opened 3 days ago

BillPascoe commented 3 days ago

I uploaded and parsed Petries Reminiscences (from Gutenberg) and created a layer. The resulting text map is highlighting placenames wrongly. The first occurrences are correct. The first occurrence of 'Brisbane' does not have the 'B' highlighted. Then some are out by 4 characters, and lower down in the document they are completely wrong. The link does go to the correct place on the map. So it appears the parsing, geolocating etc. are correct and the only problem is with highlighting the text - so it is probably an issue with the line and character numbers. The error is cumulative, as if a recurring glitch puts the indexing out by one character each time it happens.

https://test-views.tlcmap.org/dev/textmap.html?load=https%3A%2F%2Ftest-ghap.tlcmap.org%2Flayers%2F1421%2Fjson%3Ftextmap%3Dtrue

PetriesReminiscences.txt

IanMcCrabb commented 3 days ago

@MufengNiu I figure we should address this bug as soon as we can.

MufengNiu commented 1 day ago

The highlighting from TLCMap is working correctly. However, the geoparsing from the text map doesn’t seem to return the correct indexes when the content contains some special characters.

For example, in this text: Example

The ellipsis (...) is being counted as four characters instead of three by the geoparser. This results in incorrect indexing for the places that follow it.

Below is an example from the geoparsing response:

The offset (start index) for "Australia" should be 75 (the response reports 76), and its sentence_start_index should be 57 (the response reports 58). Because of this off-by-one error, every place name after the ellipsis is marked with the wrong index, which causes the alignment problems in the highlighting.

{
   "status":"success",
   "data":{
      "type":"ExtractionResults",
      "place_names":[
         {
            "type":"PlaceName",
            "name":"sydney",
            "text_position":{
               "line":0,
               "word":7,
               "offset":38,
               "sentence_start_index":0,
               "sentence_end_index":57
            },
            "context":"and sincerely trust that they will be sydney reprinted..."
         },
         {
            "type":"PlaceName",
            "name":"australia",
            "text_position":{
               "line":0,
               "word":11,
               "offset":76,
               "sentence_start_index":58,
               "sentence_end_index":128
            },
            "context":"The aborigines of Australia are fast dying out.sydney sydney australia"
         },
         {
            "type":"PlaceName",
            "name":"sydney",
            "text_position":{
               "line":0,
               "word":16,
               "offset":105,
               "sentence_start_index":58,
               "sentence_end_index":128
            },
            "context":"The aborigines of Australia are fast dying out.sydney sydney australia"
         },
         {
            "type":"PlaceName",
            "name":"sydney",
            "text_position":{
               "line":0,
               "word":16,
               "offset":112,
               "sentence_start_index":58,
               "sentence_end_index":128
            },
            "context":"The aborigines of Australia are fast dying out.sydney sydney australia"
         },
         {
            "type":"PlaceName",
            "name":"australia",
            "text_position":{
               "line":0,
               "word":11,
               "offset":119,
               "sentence_start_index":58,
               "sentence_end_index":128
            },
            "context":"The aborigines of Australia are fast dying out.sydney sydney australia"
         }
      ]
   }
}
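
The drift in the response above can be checked with a short script. This is only a rough sketch, not the geoparser's or TLCMap's code. It assumes the source text is the two `context` strings joined with no separator, which is consistent with the expected values of 57 and 75 mentioned above. The helper `offset_drift` and the constant names are made up for this illustration.

```python
# Assumption: the original text is the two "context" strings from the
# response joined with no separator (consistent with expected offsets 57 and 75).
SENTENCE_1 = "and sincerely trust that they will be sydney reprinted..."
SENTENCE_2 = "The aborigines of Australia are fast dying out.sydney sydney australia"
TEXT = SENTENCE_1 + SENTENCE_2

# (name, offset) pairs copied from the geoparsing response.
REPORTED = [
    ("sydney", 38),
    ("australia", 76),
    ("sydney", 105),
    ("sydney", 112),
    ("australia", 119),
]


def offset_drift(text, name, offset, window=10):
    """Return reported offset minus the nearest true offset of `name`.

    Searches case-insensitively within `window` characters on either side
    of the reported offset. Returns None if the name is not found there.
    """
    lowered = text.lower()
    for delta in range(window + 1):
        for candidate in (offset - delta, offset + delta):
            if candidate >= 0 and lowered.startswith(name, candidate):
                return offset - candidate
    return None


drifts = [offset_drift(TEXT, name, offset) for name, offset in REPORTED]
print(drifts)  # [0, 1, 1, 1, 1]
```

The first "sydney" sits before the ellipsis and has no drift, while every place name after the ellipsis is reported one character too far along. Running the same check over a longer document would show whether the drift grows by one at each special character.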