SciSharp / CherubNLP

Natural Language Processing in .NET Core
Apache License 2.0

System.NullReferenceException during tokenization #1

Open · sdg002 opened this issue 5 years ago

sdg002 commented 5 years ago

Hi all, I am trying to get some very basic tokenization working. I suspect I am not using the API properly, because calling the Tokenize method throws a System.NullReferenceException. Any suggestions?

My code:

using System.Linq;
using CH = global::CherubNLP.Tokenize;

public string[] MyTokenize(string sentence)
{
    var options = new CH.TokenizationOptions
    {
    };
    var tokenizer = new CH.TokenizerFactory(
        options,
        global::CherubNLP.SupportedLanguage.English);
    // This call throws System.NullReferenceException
    var tokens = tokenizer.Tokenize(sentence);
    string[] results = tokens
        .Where(tk => tk.IsAlpha == true)
        .Select(tk => tk.Text)
        .ToArray();
    return results;
}

Thank you, Sau

sdg002 commented 5 years ago

OK, I figured this out myself. I should call the factory's GetTokenizer method to obtain a tokenizer, and then call Tokenize on that tokenizer:

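using System.Linq;
using CH = global::CherubNLP.Tokenize;

public string[] MyTokenize(string sentence)
{
    var options = new CH.TokenizationOptions
    {
    };
    var factory = new CH.TokenizerFactory(
        options,
        global::CherubNLP.SupportedLanguage.English);
    // Calling Tokenize directly on the factory threw; obtain a concrete tokenizer first
    var tokenizer = factory.GetTokenizer<CH.TreebankTokenizer>();
    // Tokenize with the concrete tokenizer, passing the same options
    var tokens = tokenizer.Tokenize(sentence, options);
    // Keep only alphabetic tokens and return their text
    string[] results = tokens
        .Where(tk => tk.IsAlpha == true)
        .Select(tk => tk.Text)
        .ToArray();
    return results;
}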