Closed by ruanchaves 2 years ago
Hi, some of the data/model files used in this walkthrough are too large to be stored in the GitHub repository (as mentioned at the top of the Jupyter notebook), but they are available for download at https://github.com/NorskRegnesentral/skweak/releases/tag/0.2.8
Several files that are used in the step-by-step NER tutorial are missing from the `data` folder (this folder on the master branch), so it's currently not possible to execute all steps in the tutorial. Some examples:

The tutorial uses a spaCy CoNLL 2003 annotator, but the folder `../../data/conll2003/` does not exist in this repository:

    annotator = skweak.spacy.ModelAnnotator("conll2003", "../../data/conll2003/")

Similarly, the paths `../../data/wikidata_tokenised.json` and `../../data/crunchbase.json` are referenced in the tutorial, but they also do not exist in this repository.

The file `conll2003_ner.py`, which is imported in the tutorial, also refers to missing files. Some examples:

    FORM_FREQUENCIES = os.path.dirname(__file__) + "/../../data/form_frequencies.json"
    self.add_annotator(ModelAnnotator("BTC", os.path.dirname(__file__) + "/../../data/btc"))

None of these paths exist in this repository.
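
For anyone who wants to confirm which files are still missing after downloading the release assets, a quick check along these lines may help. This is only a minimal sketch: the helper name `find_missing` and the `base_dir` argument are illustrative, not part of skweak, and it assumes every path is resolved from one base directory. In practice the first three paths are relative to the tutorial notebook, while the last two are relative to the directory containing `conll2003_ner.py`, so the check may need to be run once per location.

```python
from pathlib import Path

# Paths quoted in this issue. The grouping into a single list is an
# assumption made for illustration; adjust it to your checkout.
REFERENCED_PATHS = [
    "../../data/conll2003/",
    "../../data/wikidata_tokenised.json",
    "../../data/crunchbase.json",
    "../../data/form_frequencies.json",
    "../../data/btc",
]


def find_missing(paths, base_dir="."):
    """Return the entries of `paths` that do not exist relative to `base_dir`.

    `find_missing` is a hypothetical helper, not a skweak function.
    """
    base = Path(base_dir)
    return [p for p in paths if not (base / p).exists()]


if __name__ == "__main__":
    # Run from the directory the relative paths are meant to start from.
    for path in find_missing(REFERENCED_PATHS):
        print(f"missing: {path}")
```

If the script prints nothing, every listed path resolves from the chosen base directory.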