Shotgunosine opened this issue 9 years ago (status: Open)
That's a pretty strange one.
So right now we are using CorporationNameOrganization
for things like
This is not really what's going on with "A Delaware Limited Liability Company".
I would say that that is not really part of the name at all.
I'm trying to improve the parser's performance on a fairly messy list containing individuals, households, and corporations. For individuals and households the parser works great. For corporations I see lots of listings like: Acme LLC, A Delaware Limited Liability Company
Currently the tagging for that will be:
I think ideally the result would be something like:
In addition to adding "Article" and "Location" labels, I was thinking I would add edit distance to a state name as a feature.
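To make the edit-distance idea concrete, here is a minimal sketch of what such a feature could look like. It is only an illustration under stated assumptions: the function names, the feature keys (`state.edit_distance`, `state.near_match`), and the threshold of 2 are all hypothetical and not part of the parser's existing feature set. It compares a single token against single-string state names, so multi-word states such as "New York" would need extra handling across adjacent tokens.

```python
# Hypothetical sketch: edit distance from a token to the nearest US state name.
# Names, feature keys, and the threshold below are assumptions for illustration.

STATE_NAMES = [
    "alabama", "alaska", "arizona", "arkansas", "california", "colorado",
    "connecticut", "delaware", "florida", "georgia", "hawaii", "idaho",
    "illinois", "indiana", "iowa", "kansas", "kentucky", "louisiana",
    "maine", "maryland", "massachusetts", "michigan", "minnesota",
    "mississippi", "missouri", "montana", "nebraska", "nevada",
    "new hampshire", "new jersey", "new mexico", "new york",
    "north carolina", "north dakota", "ohio", "oklahoma", "oregon",
    "pennsylvania", "rhode island", "south carolina", "south dakota",
    "tennessee", "texas", "utah", "vermont", "virginia", "washington",
    "west virginia", "wisconsin", "wyoming",
]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def state_distance_features(token: str, threshold: int = 2) -> dict:
    """Return hypothetical feature values describing how state-like a token is."""
    cleaned = token.lower().strip(".,")
    distance = min(levenshtein(cleaned, state) for state in STATE_NAMES)
    return {
        "state.edit_distance": distance,
        "state.near_match": distance <= threshold,
    }


if __name__ == "__main__":
    for token in ["Delaware", "Deleware", "Acme"]:
        print(token, state_distance_features(token))
```

A boolean like `state.near_match` may be easier for the model to use than the raw distance, since short tokens can land within a few edits of a short state name by accident; the threshold would need tuning against real listings.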
My question is about how much training data I should use. Is it purely a situation where more examples will be better? Or should I add a few core examples and then augment those with problem cases as they come up?