Open lucadaniello opened 5 months ago
Sentence splitting is based on a statistical classification model trained on CoNLL-U data from Universal Dependencies. For each character in the text, it predicts whether a new sentence starts at that character, given the surrounding context. If you want to split sentences another way, you can use udpipe::strsplit.data.frame or strsplit from base R to define your own hardcoded sentence-splitting criteria.
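As a minimal sketch of the base R route, assuming a hardcoded rule that a sentence ends at ".", "!" or "?" followed by whitespace: the helper name split_sentences, the regular expression and the example text are all illustrative and not part of udpipe.

```r
# Hypothetical helper (not part of udpipe): split text into sentences
# with a hardcoded rule instead of the statistical tokenizer.
# Assumed rule: a sentence ends at ".", "!" or "?" followed by whitespace.
split_sentences <- function(x) {
  # The lookbehind keeps the closing punctuation attached to its sentence.
  parts <- strsplit(x, split = "(?<=[.!?])\\s+", perl = TRUE)
  data.frame(
    doc_id = rep(seq_along(x), lengths(parts)),
    sentence = unlist(parts),
    stringsAsFactors = FALSE
  )
}

# Illustrative text: the colon and the uppercase acronym do not trigger a split.
txt <- c("No previous study has investigated AHCs. Governance matters: it shapes productivity.")
sentences <- split_sentences(txt)
print(sentences)
```

With a rule like this, ":" and uppercase terms never start a new sentence, but abbreviations such as "e.g." would be split incorrectly, so the pattern would need adapting to your own text.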
Hi! I would like to ask you something about the splitting of text into sentences during the annotation phase.
I thought that sentences were split at the dots that end them, but that is not always the case. Sometimes a sentence is split at ":" or at a term in uppercase.
I would like to ask:
I’m using the udpipe package in R. Below is an example text where I find that sentences are separated by an uppercase term:
```r
library(udpipe)

# Download the English model and annotate the example text
model <- udpipe_download_model(language = "english")
txt <- c("No previous study has investigated the influence of governance and organizational AHCs configurations on the productivity and scientific impact of AHCs.")
df <- udpipe(txt, object = udpipe_load_model(model$file_model))
```
Thank you!!