More questions about training data

Hi. Thank you for your amazing project. I'm trying to retrain the NER model and would like to clarify a few points that are not clear to me:

1. I'm curious about the size of the named entity gazetteers and whether this data can be expanded. In the paper you mention that the named entity gazetteers were collected from DBPedia. Could you describe how you collected this data, and how large it is?
2. Am I right that you use only OntoNotes for training NER (apart from the lexica, of course)?
3. Here, you use files like "known_corporations.txt", "known_countries.txt", "known_currencies.txt", etc. Could you point me to where this data comes from?

Sorry if GitHub is not the best place for these questions, but I hope your answers will help others as well.

emorynlp / nlp4j #31