fahadshery closed this issue 6 years ago.
Great show-off! I briefly skimmed your getting-started guide and your question. I think there is one problem, and everything you describe points to it.
Namely, the merge step:
x <- merge(crfsuite_annotation_verbatim_to_annotate, verbatim_tokens)
really assumes that you have the fields start and end in verbatim_tokens. To get them, you need to use
verbatim_tokens <- as.data.frame(verbatim_tokens, detailed = TRUE)
Or just use udpipe(verbatims, udmodel), or
udpipe(verbatims, "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
which will tokenise with a model that is fine for commercial use, downloaded from https://github.com/bnosac/udpipe.models.ud.
Keep up the spirit!
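The start and end fields mentioned here are per-token character offsets into the original text. The sketch below is a hypothetical base-R helper (add_token_offsets is not part of udpipe, crfsuite or RDRPOSTagger) that adds such columns to any token table with doc_id and token columns, for example one produced by a different tagger. It assumes 1-based, inclusive character positions and that every token appears verbatim in the text, in reading order; compare its output with the start/end columns udpipe produces with detailed = TRUE before relying on it.

```r
# Hypothetical helper, not part of udpipe, crfsuite or RDRPOSTagger.
# Adds 1-based, inclusive character offsets (start, end) to a token table
# with columns doc_id and token (tokens in reading order), by searching for
# each token right after the end of the previous one in the same document.
add_token_offsets <- function(tokens, texts) {
  # texts: named character vector; the names are the doc_ids
  tokens$start <- NA_integer_
  tokens$end   <- NA_integer_
  for (id in unique(tokens$doc_id)) {
    txt  <- texts[[id]]
    rows <- which(tokens$doc_id == id)
    pos  <- 1L  # position where the search for the next token begins
    for (r in rows) {
      tok <- tokens$token[r]
      hit <- regexpr(tok, substring(txt, pos), fixed = TRUE)
      if (hit == -1L) next  # token not found verbatim: leave start/end as NA
      tokens$start[r] <- pos + as.integer(hit) - 1L
      tokens$end[r]   <- tokens$start[r] + nchar(tok) - 1L
      pos <- tokens$end[r] + 1L
    }
  }
  tokens
}

# Hypothetical example data
texts  <- c(d1 = "My router keeps dropping.", d2 = "Call me back")
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d1", "d1", "d2", "d2", "d2"),
  token  = c("My", "router", "keeps", "dropping", ".", "Call", "me", "back"),
  stringsAsFactors = FALSE
)
res <- add_token_offsets(tokens, texts)
print(res)
```

Tokens that cannot be found verbatim (for example, if a tokeniser normalises quotes or brackets) are left as NA, so it is worth checking for NA values before merging.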
This is awesome! I will test your points and report back.
All worked well, thank you. For commercial reasons, I am now using
udpipe(verbatims, "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
and it is working well.
Hi again, I am back :)
I have finally created my own model using the example provided in the docs. This worked perfectly, with no issues at all. However, when I use your RDRPOSTagger package for tokenisation, I run into weird behaviour. The complete reproducible example is here. Here are the issues/info:
- x <- crf_cbind_attributes(x1, terms = c("upos", "lemma"), by = "doc_id") creates 68 columns in total when executed on the UDPipe POS-tagged and tokenised data frame, whereas doing the same on the RDRPOSTagger POS-tagged and tokenised data frame creates only 36 columns.
- The chunk_entity column is not prefixed by I, B or O, as it is for the UDPipe tokenised and POS-tagged data frame.
- Duplicates appear for the RDRPOSTagger tokenised data frame, but not when the UDPipe model is used.
- merge(crfsuite_annotation_verbatim_to_annotate, rdr_tagging) (as in the docs, passing the annotated object first and then the y object) throws an error:
  Error in merge.chunkrange(crfsuite_annotation_verbatim_to_annotate, rdr_tagging) : all(c(by.y, "start", "end") %in% colnames(y)) is not TRUE
  But if you look at the UDPipe tokenised data frame, it doesn't have start and end columns either. I fixed it by swapping the order of the arguments:
  x <- merge(rdr_tagging, crfsuite_annotation_verbatim_to_annotate)
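A possible explanation for the argument-order behaviour, offered as a hedged guess rather than something confirmed in this thread: merge() in R is an S3 generic that dispatches on the class of its first argument. The error message above names merge.chunkrange, so with the annotation object first, crfsuite's method runs and checks y for start and end. With the arguments swapped, and assuming rdr_tagging is a plain data.frame, base R's merge.data.frame runs instead and simply joins on the shared doc_id column. The toy below uses only base R, hypothetical data and a hypothetical stand-in class (toy_chunkrange); it is not crfsuite's implementation.

```r
# Toy data (hypothetical): one document, three tokens, two annotated chunks.
# The token table has no start/end columns, like the RDRPOSTagger output.
tokens <- data.frame(doc_id = "d1", token = c("router", "keeps", "dropping"),
                     stringsAsFactors = FALSE)
chunks <- data.frame(doc_id = "d1", chunk_entity = c("PRODUCT", "ISSUE"),
                     stringsAsFactors = FALSE)
class(chunks) <- c("toy_chunkrange", "data.frame")

# Hypothetical stand-in method that only reproduces the input check quoted
# in the error message; the real range-based join is not implemented here.
merge.toy_chunkrange <- function(x, y, by.x = "doc_id", by.y = "doc_id", ...) {
  stopifnot(all(c(by.y, "start", "end") %in% colnames(y)))
  invisible(NULL)
}

# merge() dispatches on its FIRST argument, so this calls the toy method,
# which fails because tokens has no start/end columns.
err <- tryCatch(merge(chunks, tokens), error = function(e) conditionMessage(e))
print(err)

# Swapped: the first argument is a plain data.frame, so merge.data.frame runs.
# It joins on the shared column doc_id, pairing every token with every chunk
# of the same document (3 x 2 = 6 rows) and copying labels unchanged, with
# no B-/I-/O prefix.
swapped <- merge(tokens, chunks)
print(swapped)
```

If the real objects behave like this toy, swapping the arguments would be consistent with the duplicated rows and the missing I/B/O prefixes reported above, and adding start and end columns to the token table (keeping the annotation object first) would be the change to try instead.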
I wrote a complete n00b getting-started guide here. Thanks!