AlexPoint / OpenNlp

Open source NLP tools (sentence splitter, tokenizer, chunker, coref, NER, parse trees, etc.) in C#
MIT License

How to generate a Tag Dictionary? #24

Closed NeomMob closed 5 years ago

NeomMob commented 5 years ago

I am using the following code to train a POS model. How do I then generate the tag dictionary that is required later to use the model?

        var trainingFile = "..";
        // The number of iterations; no general rule for finding the best value, just try several!
        var iterations = 5;
        // The cut; no general rule for finding the best value, just try several!
        var cut = 2;
        // Train the model (can take some time depending on your training file size)
        var model = MaximumEntropyPosTagger.TrainModel(trainingFile, iterations, cut); 
        // Persist the model to use it later
        var outputFilePath = @"...";
        new BinaryGisModelWriter().Persist(model, outputFilePath);

AlexPoint commented 5 years ago

When you create a new MaximumEntropyPosTagger object, you can pass a PosLookupList as an argument; that is your tag dictionary. If you don't, it defaults to DefaultPosContextGenerator. The tag dictionary and the GisModel are two distinct objects, so if you don't use the default you need to persist them both, in separate files. Does that answer your question?
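
For illustration only, here is a minimal sketch of how a tag dictionary file could be derived from the same training file. It rests on two assumptions not confirmed in this thread: that the training data is written as whitespace-separated `word_TAG` tokens, and that the dictionary file holds one word per line followed by its space-separated tags. `TagDictionaryBuilder`, `Collect` and `Write` are hypothetical helper names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of OpenNlp.
public static class TagDictionaryBuilder
{
    // Collects, for every word, the set of tags it carries in the training data.
    // Assumption: tokens look like word_TAG and are separated by whitespace.
    public static SortedDictionary<string, SortedSet<string>> Collect(IEnumerable<string> lines)
    {
        var tagsByWord = new SortedDictionary<string, SortedSet<string>>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            // A null separator array makes Split break on any whitespace.
            foreach (var token in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                // Split on the last underscore so words containing '_' stay intact.
                var split = token.LastIndexOf('_');
                if (split <= 0 || split == token.Length - 1)
                {
                    continue; // skip tokens with no word or no tag
                }
                var word = token.Substring(0, split);
                var tag = token.Substring(split + 1);

                SortedSet<string> tags;
                if (!tagsByWord.TryGetValue(word, out tags))
                {
                    tags = new SortedSet<string>(StringComparer.Ordinal);
                    tagsByWord[word] = tags;
                }
                tags.Add(tag);
            }
        }
        return tagsByWord;
    }

    // Writes one line per word: the word, then its tags, separated by spaces.
    // Assumption: this layout matches what the tag dictionary reader expects.
    public static void Write(SortedDictionary<string, SortedSet<string>> tagsByWord, string outputPath)
    {
        File.WriteAllLines(
            outputPath,
            tagsByWord.Select(entry => entry.Key + " " + string.Join(" ", entry.Value)));
    }
}
```

Calling `TagDictionaryBuilder.Write(TagDictionaryBuilder.Collect(File.ReadLines(trainingFile)), dictionaryPath)` would produce a file stored separately from the persisted model, where `dictionaryPath` is a path of your choosing. Whether PosLookupList accepts this exact layout should be checked against the library source before relying on it.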

NeomMob commented 5 years ago

I haven't tried it yet, but it seems to answer all of my questions. Thanks for your support!