clamsproject / mmif-summarizer

Apache License 2.0

Wrong startTime and endTime for some subjects #2

Closed marcverhagen closed 3 years ago

marcverhagen commented 3 years ago

Named entities refer to a text document, and they are then mapped to the start and end times of the time frame corresponding to that document. This works well if the text document comes from Tesseract and is aligned to an EAST bounding box that has a time point. But when the document is aligned to a time frame from Kaldi, the time frame may span the whole document, or at least one speech segment, and the start and end times will be too wide.

Should connect the entity to its tokens, and then connect those tokens to their time frames.
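
A rough sketch of the token-based lookup, to make the idea concrete. This is not the summarizer's code and does not use the MMIF API: `Token`, `TimeFrame`, `entity_time_span` and the `token_frames` mapping are hypothetical names, and it assumes each token carries character offsets into the text document and is aligned to its own time frame (times in milliseconds).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    # Hypothetical token record: an identifier plus character offsets
    # into the text document (end offset is exclusive).
    id: str
    start_char: int
    end_char: int


@dataclass
class TimeFrame:
    # Hypothetical time frame: start and end in milliseconds.
    start: int
    end: int


def entity_time_span(
    entity_start: int,
    entity_end: int,
    tokens: List[Token],
    token_frames: Dict[str, TimeFrame],
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) time covering all tokens that overlap the
    entity's character span, or None if no overlapping token is aligned."""
    frames = [
        token_frames[t.id]
        for t in tokens
        # Overlap test on half-open character intervals.
        if t.start_char < entity_end and t.end_char > entity_start
        and t.id in token_frames
    ]
    if not frames:
        return None
    return (min(f.start for f in frames), max(f.end for f in frames))


# Made-up example: the text "Jim Lehrer reports tonight" with one time
# frame per token, as a speech recognizer with word timings might give.
tokens = [
    Token("t1", 0, 3),    # Jim
    Token("t2", 4, 10),   # Lehrer
    Token("t3", 11, 18),  # reports
    Token("t4", 19, 26),  # tonight
]
token_frames = {
    "t1": TimeFrame(1000, 1300),
    "t2": TimeFrame(1300, 1800),
    "t3": TimeFrame(1800, 2400),
    "t4": TimeFrame(2400, 3000),
}

# The entity "Jim Lehrer" covers characters 0-10.
span = entity_time_span(0, 10, tokens, token_frames)
```

With these made-up values the entity resolves to only the frames of its own two tokens, whereas going through the document-level time frame would give the full 1000 to 3000 range. Returning `None` when no token is aligned leaves room to fall back to the document's time frame, which is what already works for the Tesseract case.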