Closed (rohanchn closed this issue 1 year ago)
spacy does have basic Urdu language support, so loading the stanza pipeline should work as described in the spacy-stanza README, just with "ur" instead of "en":
nlp = spacy_stanza.load_pipeline("ur")
(If the stanza language doesn't have basic support in spacy, then you can still load the stanza language as described for Coptic in the first item here: https://github.com/explosion/spacy-stanza#stanza-pipeline-options.)
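As a rough sketch of the two cases above: the helper names below (pipeline_args, load_any_stanza_pipeline) are made up for illustration and are not part of spacy-stanza. The sketch assumes that spacy.util.get_lang_class raises ImportError for a language code spacy does not know, and that the stanza model has already been downloaded with stanza.download.

```python
def pipeline_args(stanza_lang, spacy_supports_lang):
    """Return (args, kwargs) for spacy_stanza.load_pipeline.

    Hypothetical helper: if spacy has basic support for the language, pass
    the code directly; otherwise fall back to spacy's multi-language class
    "xx" and hand the real code to stanza via lang=, as the README does
    for Coptic.
    """
    if spacy_supports_lang:
        return (stanza_lang,), {}
    return ("xx",), {"lang": stanza_lang}


def load_any_stanza_pipeline(stanza_lang):
    """Load a stanza pipeline through spacy-stanza for any stanza language."""
    # Imported lazily so that pipeline_args can be used without these packages.
    import spacy
    import spacy_stanza

    try:
        # Assumption: an unknown language code raises ImportError here.
        spacy.util.get_lang_class(stanza_lang)
        supported = True
    except ImportError:
        supported = False

    args, kwargs = pipeline_args(stanza_lang, supported)
    return spacy_stanza.load_pipeline(*args, **kwargs)
```

With this, load_any_stanza_pipeline("ur") should end up calling load_pipeline("ur"), while load_any_stanza_pipeline("cop") should end up calling load_pipeline("xx", lang="cop").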
However, it doesn't look like stanza currently has an NER model for Urdu, so you'd need to train your own NER model. If you have an annotated NER corpus, you could train a stanza NER model following the stanza docs: https://stanfordnlp.github.io/stanza/new_language_ner.html
Or you could train a spacy NER model (https://spacy.io/usage/training/#quickstart) and add it to the nlp pipeline as an additional component with nlp.add_pipe instead. The spacy course (https://course.spacy.io/en/chapter4) and example projects (e.g., https://github.com/explosion/projects/tree/v3/pipelines/ner_demo) show how to get started with training custom spacy NER models.
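For the second option, here is a minimal sketch of how a separately trained spacy NER component might be added on top of the stanza pipeline. The function names and the path "training/model-best" are hypothetical. The sketch also assumes the trained pipeline has a "tok2vec" component that the "ner" component depends on (as in a typical tok2vec + ner training config), which is why both are copied over.

```python
def add_trained_components(nlp, trained_nlp, names=("tok2vec", "ner")):
    """Copy trained components from trained_nlp into nlp, in the given order.

    Only components that exist in trained_nlp and are not yet in nlp are
    added. Returns the list of names that were actually added.
    """
    added = []
    for name in names:
        if name in trained_nlp.pipe_names and name not in nlp.pipe_names:
            # source= tells spacy to reuse the trained component as-is.
            nlp.add_pipe(name, source=trained_nlp)
            added.append(name)
    return added


def build_urdu_pipeline(trained_path):
    """Stanza for tokenization/tagging/parsing plus a custom spacy NER."""
    # Imported lazily so the helper above has no hard dependencies.
    import spacy
    import spacy_stanza

    # Assumes the Urdu stanza model was downloaded with stanza.download("ur").
    nlp = spacy_stanza.load_pipeline("ur")
    # trained_path is a hypothetical path to your own trained spacy pipeline.
    trained_nlp = spacy.load(trained_path)
    add_trained_components(nlp, trained_nlp)
    return nlp
```

Something like nlp = build_urdu_pipeline("training/model-best") followed by doc = nlp(text) should then fill doc.ents from the custom NER component. It is worth checking the results on a few sentences, since the NER model was trained on spacy's tokenization rather than stanza's.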
This is very useful. Thank you for this!
Yes, I intend to train my own NER model.
I am closing this issue for now. If I hit a wall, I will write again. Thanks again!
Hi,
I am looking to work on an NER pipeline for Urdu. Currently, spacy doesn't support Urdu but stanza does. I am under the impression that to use spacy-stanza for a language, both libraries must support the language. But then I also saw #35, where the user seems to be using spacy-stanza for Urdu.
Could anyone here please provide some wisdom on using stanza models in spacy for languages that spacy doesn't support?