SuffolkLITLab / form-explorer

A set of tools for exploring the connections between blank and historic court forms.
https://suffolklitlab.org/form-explorer/

Incorporate context around these name parts #4

Open colarusso opened 2 years ago

colarusso commented 2 years ago

Take a look at https://github.com/SuffolkLITLab/docassemble-ALWeaver/blob/2f0e9dd4d02a981e9a925176fc206fd29c0e7309/docassemble/ALWeaver/interview_generator.py#L1185 and https://github.com/SuffolkLITLab/docassemble-ALWeaver/blob/2f0e9dd4d02a981e9a925176fc206fd29c0e7309/docassemble/ALWeaver/generator_constants.py#L146 to see the rules for deciding whether a variable represents a person's name. A few of the variables your normalizer caught already have "Name" in them, which probably means they should be turned into people variables that the Weaver can recognize. As it is, the normalizer seems to be dropping "Name" from the normalized field label because it treats the word as having low semantic value.
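To illustrate the failure mode described above, here is a minimal sketch, not the form-explorer normalizer itself. The token list and the helper names `rough_normalize`, `looks_like_name`, and `normalize_field` are hypothetical. It shows that once a normalizer strips "name" as a low-value token, the signal is gone, so any name check has to run on the raw label first.

```python
import re

# Hypothetical set of tokens a rough normalizer might treat as low-value.
# This is an assumption for illustration, NOT form-explorer's actual list.
LOW_VALUE_TOKENS = {"name", "the", "of", "field", "text"}


def tokenize(label: str) -> list[str]:
    """Lowercase the label and split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def rough_normalize(label: str) -> str:
    """Drop 'low-value' tokens (including 'name') and join the rest."""
    return "_".join(t for t in tokenize(label) if t not in LOW_VALUE_TOKENS)


def looks_like_name(label: str) -> bool:
    """Hypothetical check: does the raw label contain 'name' as a token?"""
    return "name" in tokenize(label)


def normalize_field(label: str) -> tuple[str, bool]:
    # Check the raw label BEFORE normalizing, so the 'name' signal survives.
    is_name = looks_like_name(label)
    return rough_normalize(label), is_name


print(rough_normalize("Petitioner Name"))  # the word "name" is lost
print(normalize_field("Petitioner Name"))  # the flag preserves it
```

In this sketch the normalized label and the "is a name" flag travel together, so downstream code could still decide to build a people variable even though the label text itself no longer mentions "name".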

colarusso commented 2 years ago

Considerations: I'm thinking the regex should maybe only handle very clear cases (e.g., Name = user1_name) and that ambiguous fields should be handled by the ML classifier, which should catch them before the rough normalizer knocks out words like "name." My worry is that simply searching for fields containing "Name" and making them a person will be overly broad, and it will have a hard time determining whose name a field refers to. The ML classifier should do better if it has sufficient training data. I'm going to let this sit for a while and get a better feel for the right balance.
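One way to picture the split proposed above is a small router: a strict regex table covers only unambiguous labels, and everything else goes, un-normalized, to a classifier. This is a sketch under assumptions. `CLEAR_NAME_PATTERNS`, `route_field`, and the classifier callable are hypothetical, and the only mapping included is the `Name = user1_name` example from the comment.

```python
import re
from typing import Callable, Optional

# Hypothetical table of unambiguous label patterns -> people variables.
# Deliberately tiny: only exact "Name" (ignoring case and surrounding
# whitespace) is treated as a clear case, per the example in the comment.
CLEAR_NAME_PATTERNS = [
    (re.compile(r"^\s*name\s*$", re.IGNORECASE), "user1_name"),
]

# Assumed interface: a classifier takes the raw label and returns a
# variable name, or None if it has no confident answer.
Classifier = Callable[[str], Optional[str]]


def route_field(label: str, classifier: Classifier) -> Optional[str]:
    """Use the regex for very clear cases; defer everything else."""
    for pattern, variable in CLEAR_NAME_PATTERNS:
        if pattern.match(label):
            return variable
    # Ambiguous labels (e.g. "Petitioner Name") go to the classifier on the
    # raw text, before any rough normalization can drop the word "name".
    return classifier(label)


def stub_classifier(label: str) -> Optional[str]:
    """Placeholder standing in for a trained model; always abstains."""
    return None


print(route_field("Name", stub_classifier))
print(route_field("Petitioner Name", stub_classifier))
```

Keeping the regex table narrow follows the concern about broad "contains Name" matching: deciding *whose* name a field holds is left to the classifier rather than guessed by a pattern.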