id_duplicates is now up and running, but additional cases can still be trained. There are a few notable duplicates still hanging around, and one way to deal with them, as well as other duplicates, would be to use the train_new_deduplication_data Jupyter Notebook.
In theory, this should be as easy as following the directions, but practice is the death of theory. Someone other than me should train some additional cases and report whether the notebook is easy to understand and follow. I say "not me" because I already know how to do it, and that's not useful if no one else can figure it out.
I'm labeling this a good first issue because, ideally, it should be one, and if it isn't, some changes need to be made.