hjoseph96 closed this issue 2 years ago
Hey @hjoseph96, I'm not sure how well it'll work on short phrases (rather than complete sentences) or this specific use case, but for training, you'll want a training instance for each segment.
segment = "- 1 tablespoon minced garlic, (10g)"
# clean up data
segment = segment.delete_prefix("- ")
# tokenize
tokens = Mitie.tokenize(segment) # ["1", "tablespoon", "minced", "garlic", ",", "(", "10g", ")"]
# add entities
instance = Mitie::NERTrainingInstance.new(tokens)
instance.add_entity(1..1, "Measurement Unit") # tablespoon
instance.add_entity(2..3, "Ingredient") # minced garlic
trainer.add(instance)
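If the labeled phrases come from a CSV, counting token indices by hand gets tedious. Here is a minimal sketch of one way to compute the range for add_entity. Note that find_token_range is a hypothetical helper, not part of the mitie gem, and it assumes the phrase has already been tokenized the same way as the segment (for example with Mitie.tokenize). It is plain Ruby, so it runs without the gem:

```ruby
# Hypothetical helper (not part of the mitie gem): returns the range of token
# indices where phrase_tokens first appears in tokens, or nil if it never does.
def find_token_range(tokens, phrase_tokens)
  return nil if phrase_tokens.empty? || phrase_tokens.length > tokens.length

  last_start = tokens.length - phrase_tokens.length
  (0..last_start).each do |start|
    window = tokens[start, phrase_tokens.length]
    return (start..(start + phrase_tokens.length - 1)) if window == phrase_tokens
  end
  nil
end

# Tokens copied from the example above; in practice they would come from
# Mitie.tokenize(segment).
tokens = ["1", "tablespoon", "minced", "garlic", ",", "(", "10g", ")"]

unit_range = find_token_range(tokens, ["tablespoon"])               # => 1..1
ingredient_range = find_token_range(tokens, ["minced", "garlic"])   # => 2..3

# With the gem loaded, these ranges would feed the calls shown above:
#   instance.add_entity(unit_range, "Measurement Unit")
#   instance.add_entity(ingredient_range, "Ingredient")
```

The match is exact and case-sensitive, and only the first occurrence is returned, so a nil result is worth logging: it usually means the CSV value and the segment text do not tokenize to the same thing.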
So, I'm glad I found this gem. Most NLP gems for Ruby seem to be many years out of date.
I got it up and running rather easily, but I am having some trouble that you may be able to point me in the right direction about:
rake task to train the model:
"Soy Mustard Salmon" is actually the string name of one of the ingredients in the CSV -- I expected it to say it was an Ingredient... but the generated model seems to score everything as a Measurement Unit -- even though Measurement Unit makes up a much smaller part of the training data.
I'm also noticing some portions where it seems to match correctly, but it gives me the WHOLE string in the doc.entities data instead of the matching portion. Example:
I will say that the ease of use was great -- I'm wondering if there's anything I can do to better train the model. More data?