It should be human-in-the-loop: we present the user with a list of candidate matches and they confirm each one (e.g. "yes, Donald E. Knuth and Donald Ervin Knuth are the same"), so that we don't make mistakes that are hard to undo.
We might do this with some kind of string distance function plus nearest-neighbor search, with a cap on the distance: if no existing author name falls under the cap, we just don't suggest any matches. Not sure how this would work yet.
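
One way this could look, as a rough sketch rather than a settled design: plain Levenshtein edit distance with a brute-force scan over the known names (not a real nearest-neighbor index). The function names `levenshtein` and `merge_candidates`, the `max_distance` and `limit` parameters, and their default values are all made up for illustration. The function only returns suggestions; the actual merge would still wait for the user to confirm.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def merge_candidates(name, known_names, max_distance=5, limit=5):
    """Suggest possible duplicates of `name` for a human to confirm.

    Returns up to `limit` (distance, known_name) pairs, closest first.
    Names farther than `max_distance` (an assumed cap) are dropped, so
    the result is empty when nothing is close enough.
    """
    scored = sorted(
        (levenshtein(name.lower(), other.lower()), other)
        for other in known_names
        if other != name  # never suggest merging a name with itself
    )
    return [(d, other) for d, other in scored if d <= max_distance][:limit]
```

For example, `merge_candidates("Donald E. Knuth", ["Donald Ervin Knuth", "Leslie Lamport"])` returns `[(4, "Donald Ervin Knuth")]`, which we would show to the user as a yes/no question. Raw edit distance is probably too crude for names (initials, reordered names, diacritics), so the distance function is the part most likely to change.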