Add util func: virtual text based on ann features

johann-petrak commented 4 years ago

As in the stringannotation plugin but more flexible.

One or more texts, based on split annotations; insert separator characters or not; insert conditionally; insert text computed by a lambda.

johann-petrak commented 4 years ago

This should return the text and the offset mapping. It should be possible to add placeholder text for specific annotation types, so instead of retrieving the text from the feature or underlying document, just add some constant text (e.g. a single space) if the annotation is encountered.

johann-petrak commented 3 years ago

Possible signature: text, offsetmap = Document.virtualtext(...) with parameters:

annset: the annotation set whose annotations are used
anndescs: a list of descriptors for the annotations to use, where each descriptor is one of:
- string: the type of the annotation; the underlying document text is used
- dictionary with keys type, feature, text: if feature is specified, use str() of the feature value. If text is specified, use that when feature is not specified or is None
- the order of the list gives the priority of the annotation types to use; if there are several annotations of the same type, an arbitrary one is used
- after an annotation is used, the next one is retrieved starting from the position after the end of the used one
- if the annotation that would be used next ends after the end of the within range, it is not used and processing ends
within: a span or an annotation from any set; only annotations within that span are processed
gap: string to insert between segments that are not covered by an annotation; if None, nothing is inserted

The offset map is just an array[int] with the same length as the returned text, containing the document offset for each text offset.
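The proposal above can be sketched in plain Python to make the intended behaviour concrete. This is only an illustration under stated assumptions, not the gatenlp implementation: `Ann` is a hypothetical stand-in for a real annotation, the function takes the document text and a list of annotations instead of being a `Document` method, and `within` is a `(start, end)` tuple. The issue does not say which document offset inserted characters should map to, so the sketch assumes that feature and placeholder text map to the annotation start, and that gap text maps to the end of the previous annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Ann:
    """Hypothetical stand-in for an annotation (not the gatenlp class)."""
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


def virtualtext(doctext, anns, anndescs, within=None, gap=None):
    """Return (text, offsetmap) built from annotations, as sketched in the issue."""
    lo, hi = within if within is not None else (0, len(doctext))
    # A plain string descriptor means: use the underlying document text.
    descs = [{"type": d} if isinstance(d, str) else d for d in anndescs]
    priority = {}
    for i, d in enumerate(descs):
        priority.setdefault(d["type"], i)  # first mention of a type wins
    # Earliest start first; ties are broken by descriptor priority.
    candidates = sorted(
        (a for a in anns if a.type in priority and a.start >= lo),
        key=lambda a: (a.start, priority[a.type]),
    )
    parts, offsets = [], []
    pos, prev_end = lo, None
    for ann in candidates:
        if ann.start < pos:
            continue  # starts before the end of the annotation already used
        if ann.end > hi:
            break  # would extend past the within range: stop processing
        desc = descs[priority[ann.type]]
        feature = desc.get("feature")
        value = ann.features.get(feature) if feature is not None else None
        if value is not None:
            seg = str(value)
            seg_offsets = [ann.start] * len(seg)  # assumption: map to ann start
        elif desc.get("text") is not None:
            seg = desc["text"]
            seg_offsets = [ann.start] * len(seg)  # assumption: map to ann start
        else:
            seg = doctext[ann.start:ann.end]
            seg_offsets = list(range(ann.start, ann.end))
        # Assumption: the gap string replaces uncovered text between two segments.
        if gap is not None and prev_end is not None and ann.start > prev_end:
            parts.append(gap)
            offsets.extend([prev_end] * len(gap))
        parts.append(seg)
        offsets.extend(seg_offsets)
        pos = prev_end = ann.end
    return "".join(parts), offsets


doctext = "Dr. Smith lives in New York."
anns = [
    Ann(0, 3, "Token", {"norm": "Doctor"}),
    Ann(4, 9, "Token"),
    Ann(10, 15, "Token", {"norm": "live"}),
    Ann(19, 27, "Location"),
]
anndescs = [{"type": "Location", "text": "LOC"}, {"type": "Token", "feature": "norm"}]
text, offsetmap = virtualtext(doctext, anns, anndescs, gap=" ")
print(text)
print(offsetmap)
```

In this sketch a Token without a `norm` feature and without a `text` entry falls back to the document text, which is one possible reading of the descriptor rules. Because several output characters can share a document offset, the map is not invertible in general.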

GateNLP / python-gatenlp

Add util func: virtual text based on ann features #18