curiosity-ai / catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License
699 stars 71 forks

How to extract contextual information (names, prescription readings, etc.) #67

Closed fasteddys closed 2 years ago

fasteddys commented 2 years ago

Language English

Hello, I tried to modify your samples, but I am not getting meaningful information back from text documents.

For example, how can we extract names, doctors' notes, medical details, addresses, etc.?

theolivenbaum commented 2 years ago

Hi @fasteddys - you need to train an entity recognition model in order to extract information like this. Do you have any training data for these entities?
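Training data for an entity recognizer is usually a piece of text plus labeled character ranges. The sketch below is a minimal, hypothetical illustration of that shape in plain C# (no Catalyst types): the sentence, the labels "Person" and "Medication", and the helper `DescribeAnnotations` are all invented for the example, and end offsets are assumed to be exclusive, which may not match the convention a given library uses. Checking that every range is valid and covers the intended text is a cheap first step before any training.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical annotated example: raw text plus labeled character ranges.
// Offsets are [start, end), i.e. the end offset is exclusive in this sketch.
var text = "Dr. Alice Smith prescribed 20 mg of amoxicillin.";
var annotations = new List<(string Label, int Start, int End)>
{
    ("Person", 4, 15),      // "Alice Smith"
    ("Medication", 36, 47), // "amoxicillin"
};

// Validate each annotation (in range, non-empty, non-overlapping) and return "Label: covered text".
List<string> DescribeAnnotations(string input, List<(string Label, int Start, int End)> spans)
{
    var result = new List<string>();
    // Sort by start offset so overlaps can be detected in a single pass.
    var ordered = new List<(string Label, int Start, int End)>(spans);
    ordered.Sort((a, b) => a.Start.CompareTo(b.Start));
    int previousEnd = 0;
    foreach (var (label, start, end) in ordered)
    {
        if (start < 0 || end > input.Length || start >= end)
            throw new ArgumentException($"Invalid range [{start}, {end}) for '{label}'");
        if (start < previousEnd)
            throw new ArgumentException($"'{label}' overlaps the previous annotation");
        result.Add($"{label}: {input.Substring(start, end - start)}");
        previousEnd = end;
    }
    return result;
}

foreach (var line in DescribeAnnotations(text, annotations))
    Console.WriteLine(line);
// Person: Alice Smith
// Medication: amoxicillin
```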

fasteddys commented 2 years ago

Hello @theolivenbaum, yes, I have data, but how do I set up or modify the sample?

wenbin97 commented 2 years ago

> Hello @theolivenbaum, yes, I have data, but how do I set up or modify the sample?

I'm currently using the code below to train an NER model tagged "TEST", and you may use it as an example. However, the model does not seem to learn from the training data, so you might need to modify some parts.

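        using System.Collections.Generic;
        using System.Linq;
        using System.Threading.Tasks;
        using Catalyst;
        using Catalyst.Models;
        using Mosaik.Core; // Language enum

        // NerTrainData is the poster's own type (not shown): a Paragraph string plus Associations with Tag, Start and End.
        public async Task TrainNlp(List<NerTrainData> trainData)
        {
            // Model tag "TEST"; the entity types listed here are the ones the model is set up to learn.
            var aper = new AveragePerceptronEntityRecognizer(Language.English, 0, "TEST", new string[] { "Person", "Organization", "Location" }, ignoreCase: false);

            // Collected but unused while AddEntityTypes is commented out, so tags other than
            // the three types above may never be registered with the model.
            var trainingEntities = trainData.SelectMany(t => t.Associations.Select(a => a.Tag)).ToArray();
            //aper.AddEntityTypes(trainingEntities);
            var documents = new List<Document>();

            foreach (var data in trainData)
            {
                var sentence = new Document(data.Paragraph);
                // One span covering the whole paragraph.
                var span = sentence.AddSpan(0, sentence.Length);
                foreach (var tag in data.Associations)
                {
                    // Only the annotated ranges become tokens: the surrounding words are never
                    // tokenized, so the model sees no "Outside" tokens, and every entity is
                    // tagged Inside regardless of its length or position.
                    var token = span.AddToken(tag.Start, tag.End);
                    token.AddEntityType(new EntityType(tag.Tag, EntityTag.Inside));
                }
                documents.Add(sentence);
            }
            aper.Train(documents);
            await aper.StoreAsync();
        }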
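One plausible reason a model trained this way does not learn is the labeling: each annotated range becomes a single token, and the rest of the paragraph is never tokenized, so there is nothing to contrast the entities against. Sequence taggers are usually trained on every token of the text, each labeled with a scheme such as BILOU (Begin, Inside, Last, Outside, Unit); Catalyst's `EntityTag` values appear to follow a scheme of this kind. The sketch below shows that conversion in plain C# and does not use Catalyst's API: `ToBilou` is a hypothetical helper, the regex tokenizer is a naive stand-in for a real pipeline tokenizer, and end offsets are assumed to be exclusive.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: tag every token of the input with a BILOU label, given labeled
// character ranges with exclusive end offsets. Tokens outside all ranges get "O".
List<(string Token, string Tag)> ToBilou(string input, List<(string Label, int Start, int End)> spans)
{
    // Naive tokenizer: word characters, or single punctuation marks.
    var matches = Regex.Matches(input, @"\w+|[^\w\s]");
    var tags = new string[matches.Count];
    for (int i = 0; i < tags.Length; i++) tags[i] = "O";

    foreach (var (label, start, end) in spans)
    {
        // Indices of tokens lying entirely inside this entity's character range.
        var inside = new List<int>();
        for (int i = 0; i < matches.Count; i++)
        {
            var m = matches[i];
            if (m.Index >= start && m.Index + m.Length <= end) inside.Add(i);
        }
        if (inside.Count == 1)
        {
            tags[inside[0]] = "U-" + label; // single-token entity
        }
        else if (inside.Count > 1)
        {
            tags[inside[0]] = "B-" + label; // first token
            for (int k = 1; k < inside.Count - 1; k++) tags[inside[k]] = "I-" + label;
            tags[inside[inside.Count - 1]] = "L-" + label; // last token
        }
    }

    var result = new List<(string Token, string Tag)>();
    for (int i = 0; i < matches.Count; i++) result.Add((matches[i].Value, tags[i]));
    return result;
}

var sentence = "Dr. Alice Smith prescribed 20 mg of amoxicillin.";
var entities = new List<(string Label, int Start, int End)> { ("Person", 4, 15), ("Medication", 36, 47) };
foreach (var (token, tag) in ToBilou(sentence, entities))
    Console.WriteLine($"{token}\t{tag}");
// Dr O, . O, Alice B-Person, Smith L-Person, prescribed O, 20 O, mg O, of O, amoxicillin U-Medication, . O
```

In a Catalyst training loop, the same idea would mean tokenizing each paragraph with the library's own tokenizer and attaching a tag to every resulting token, rather than creating one token per annotation; the exact Catalyst calls for that are not shown here.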

fasteddys commented 2 years ago

Hello, thanks for your code, but there is an issue: I tried it out, and the NER model does not seem to learn from the training data.

marwahaks commented 1 year ago

@fasteddys Did you manage to solve this query?