UUDigitalHumanitieslab / AnnCor-scripts

A place for all the AnnCor scripts
MIT License

Repetition, retracing, reformulation correction #38

Open JeltevanBoheemen opened 1 year ago

JeltevanBoheemen commented 1 year ago

In the current parses, the following CHAT-annotated constructs are ignored:

The following steps should be implemented and executed:

JanOdijk commented 1 year ago

The metadata can be added using the function cleantext from the module cleanCHILDEStokens:

def cleantext(utt: str, repkeep: bool, tokenoutput: bool = False) -> Tuple[CleanedText, Metadata]:

Here utt is a string with CHAT annotations (to be taken from the existing metadata associated with the parse tree). The output is a tuple consisting of the cleaned text (with all CHAT annotations applied) and a list of Metadata.

For the cases at hand, it will turn the CHAT codes [/] and [//] into appropriate metadata.

There is no need to run this specifically for this set, because metadata for all AnnCor utterances must be generated with this function.
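To make concrete what turning [/] and [//] into metadata could look like, here is a minimal, self-contained sketch. It is not the cleanCHILDEStokens implementation: toy_cleantext, ToyMeta, and the labels "repetition" and "retracing" are hypothetical names invented for this illustration, and the sketch assumes the usual CHAT convention that a code applies to the preceding word or to a preceding <...> group.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyMeta:
    """Hypothetical stand-in for a metadata record (not the real Metadata class)."""
    name: str     # illustrative label: "repetition" or "retracing"
    removed: str  # the material that was dropped from the utterance


# A <bracketed group> or a single word, followed by the code [/] or [//].
SCOPE = re.compile(r"(?:<([^<>]+)>|(\S+))\s*\[(//?)\]\s*")

# Illustrative labels only; the real module may name these differently.
LABELS = {"/": "repetition", "//": "retracing"}


def toy_cleantext(utt: str) -> Tuple[str, List[ToyMeta]]:
    """Drop repeated/retraced material and record what was dropped."""
    metadata: List[ToyMeta] = []

    def drop(match: "re.Match[str]") -> str:
        removed = match.group(1) or match.group(2)
        metadata.append(ToyMeta(LABELS[match.group(3)], removed))
        return ""  # remove the scoped material together with its code

    cleaned = SCOPE.sub(drop, utt)
    # Normalise whitespace left behind by the removals.
    return " ".join(cleaned.split()), metadata


if __name__ == "__main__":
    print(toy_cleantext("ik wil [/] wil een koekje"))
    print(toy_cleantext("<ik wil> [//] ik moet weg"))
```

With the first example utterance the sketch returns the cleaned text "ik wil een koekje" plus one "repetition" record for the dropped "wil"; with the second it returns "ik moet weg" plus one "retracing" record for "ik wil". The real cleantext handles many more CHAT codes and takes the extra repkeep and tokenoutput arguments, which this sketch does not model.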