Closed (senisioi closed this issue 6 years ago)
Hi, some input examples are in the folder `data`. In particular, each line of the input training corpus (i.e. `corpusFilePath`) is a sequence of WORD/TAG pairs separated by white space characters. The parameter `fullLexicon` selects the output: either a full lexicon, which contains all word types, or a smaller lexicon, which excludes word types appearing only once in the input training corpus. E.g.:
createLexicon("../data/goldTrain", 'full')
createLexicon("../data/goldTrain", 'short')
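To make the difference between the two modes concrete, here is a minimal sketch of the filtering described above. It is not the code in LexiconCreator.py: the function name `build_lexicon_sketch` is hypothetical, it reads lines from a list instead of a file, and storing the most frequent tag for each word is an assumption about what a lexicon entry holds.

```python
from collections import Counter, defaultdict


def build_lexicon_sketch(lines, mode="full"):
    """Hypothetical stand-in for createLexicon; takes an iterable of corpus lines."""
    tag_counts = defaultdict(Counter)  # word -> Counter of the tags seen with it
    for line in lines:
        for pair in line.split():  # pairs are separated by white space
            # Assumption: split on the last "/" so words containing "/" stay intact.
            word, sep, tag = pair.rpartition("/")
            if not sep or not word or not tag:
                continue  # skip tokens that are not WORD/TAG pairs
            tag_counts[word][tag] += 1

    lexicon = {}
    for word, counts in tag_counts.items():
        # Any mode other than "full" drops word types seen only once.
        if mode != "full" and sum(counts.values()) == 1:
            continue
        # Assumption: keep the most frequent tag for each word.
        lexicon[word] = counts.most_common(1)[0][0]
    return lexicon


corpus = [
    "The/DT dog/NN barks/VBZ",
    "The/DT cat/NN sleeps/VBZ",
    "A/DT dog/NN runs/VBZ",
]
full = build_lexicon_sketch(corpus, "full")
short = build_lexicon_sketch(corpus, "short")
```

In this toy corpus only "The" and "dog" occur more than once, so `short` keeps just those two entries while `full` keeps all seven word types.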
Could you please add a few lines on how to use the `LexiconCreator.py` script? I am not sure what parameters to pass to the function `createLexicon`. Is `corpusFilePath` the path to the universal dependencies file? What about `fullLexicon`?