Fati-Hei closed this issue 3 years ago
By "the example" I meant `quick_start.ipynb`.
You mean you ran the `quick_start.ipynb` example, but on Norwegian texts? Well, the examples of labelling functions given in the Jupyter notebook (for instance the `company_detector` and the `other_org_detector`) are made for the kind of English-language news articles used in this example, so they will most likely not detect anything on Norwegian texts (which means that there won't be anything to aggregate). You need to tailor your labelling functions to the texts you have in your collection.
Yes, on Norwegian text with Norwegian labels. This is how I used the example:
```python
import spacy
import skweak

# `df` is assumed to be a pandas DataFrame holding the Norwegian texts in its `content` column.
nlp = spacy.load("nb_core_news_lg", disable=["ner", "lemmatizer"])
docs = list(nlp.pipe(df.content.values))

OTHER_ST_WORDS = {"NS", "NEK", "TEK"}

def standards_detector(doc):
    # Label every noun chunk that contains one of the standards abbreviations.
    for chunk in doc.noun_chunks:
        if any(token.text in OTHER_ST_WORDS for token in chunk):
            yield chunk.start, chunk.end, "STA"

other_st_detector = skweak.heuristics.FunctionAnnotator("st_detector", standards_detector)
docs = list(other_st_detector.pipe(docs))
skweak.utils.display_entities(docs[8], "st_detector")
```
I tried to use `skweak.utils.display_entities(docs[12], "hmm", add_tooltip=False)`, following the example you provided here, but it raises `KeyError: 'hmm'`. What could be the source of this problem? My documents are in Norwegian and I use the `nb_core_news_lg` spaCy model.
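For anyone hitting the same error: as far as I understand, skweak stores each annotator's output in `doc.spans` under the annotator's name, so a `KeyError: 'hmm'` would suggest that nothing named "hmm" has been run on that document yet. Below is a minimal sketch for checking which layers are missing before calling `display_entities`. Note that `missing_layers` is a hypothetical helper (not part of skweak), and a `SimpleNamespace` stands in for a spaCy `Doc` so the snippet runs without any model installed.

```python
from types import SimpleNamespace


def missing_layers(doc, wanted):
    """Return the layer names in `wanted` that are absent from doc.spans."""
    return [name for name in wanted if name not in doc.spans]


# Stand-in for a spaCy Doc that has only been processed by the
# "st_detector" annotator (assumption: a real Doc exposes a dict-like `spans`).
fake_doc = SimpleNamespace(spans={"st_detector": []})

absent = missing_layers(fake_doc, ["st_detector", "hmm"])
print(absent)
```

With a real document you would pass `docs[12]` instead of `fake_doc`. If "hmm" shows up as missing, the aggregation step from the notebook has probably not been applied to your documents, or it was given a different name.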