Open colarusso opened 2 years ago
Considerations: I'm thinking the regex should only handle very clear cases (e.g., Name = user1_name), and ambiguous fields should go to the ML classifier, which should see them before the rough normalize knocks out words like "name." My worry is that searching for any field containing "Name" and treating it as a person will be overly broad, and it will be hard to determine whose name a given field refers to. The ML classifier should do better, provided it has sufficient training data. I'm going to let this sit for a while and get a better feel for the right balance.
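To make the ordering concrete, here is a rough sketch of what I have in mind, not actual project code. Everything named here (`CLEAR_RULES`, `label_field`, the stand-in `rough_normalize`, and the `classifier` callable) is hypothetical and only illustrates the idea: exact-match regex first, then the classifier on the untouched label, then the rough normalize as a last resort.

```python
import re
from typing import Callable, Optional

# Hypothetical rule table: a pattern must match the WHOLE label to count as a clear case.
CLEAR_RULES = [
    (re.compile(r"name", re.IGNORECASE), "user1_name"),
]

# Hypothetical list of generic words the rough normalize would drop.
GENERIC_WORDS = {"name"}


def rough_normalize(label: str) -> str:
    """Stand-in for the rough normalize step: lowercase, drop generic words."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return "_".join(w for w in words if w not in GENERIC_WORDS)


def label_field(
    label: str,
    classifier: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    stripped = label.strip()

    # 1. Regex only fires on unambiguous, whole-label matches.
    for pattern, variable in CLEAR_RULES:
        if pattern.fullmatch(stripped):
            return variable

    # 2. Ambiguous labels go to the classifier while "name" is still present.
    if classifier is not None:
        guess = classifier(stripped)
        if guess is not None:
            return guess

    # 3. Only then fall back to the lossy rough normalize.
    return rough_normalize(stripped)
```

With this ordering, a bare "Name" maps straight to `user1_name`, while something like "Spouse Name" is left for the classifier to decide instead of being forced into a person field by the regex.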