dhicks / comp-HOPOS

Building a comprehensive* dataset of 20th century philosophy of science

initialed names #1

Closed dhicks closed 6 years ago

dhicks commented 6 years ago

In production, 04_ produces false negatives with initialed names. Consider this chunk of 04_names_verif.csv:

Campbell,A. H.,Campbell,A H
Campbell,Andrew,Campbell,Andrew
Campbell,C. A.,Campbell,Charles A
Campbell,Charles A.,Campbell,Charles A
Campbell,D'Ann,Campbell,Dann
Campbell,D.,Campbell,A H
Campbell,Debra,Campbell,Debra
Campbell,Donald,Campbell,Donald T
Campbell,Donald T.,Campbell,Donald T
Campbell,Douglas,Campbell,Douglas I
Campbell,Douglas I.,Campbell,Douglas I

However, on a test set containing these same original names, 04_ does not reproduce the false negatives.
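For triage, a rough way to pick out rows in a chunk like the one above where the initials disagree with the matched name. This is only a sketch in Python, not the check that 04_ itself performs. It assumes the four columns are original family name, original given name, matched family name, and matched given name; that column reading and the helper names (`tokens`, `compatible`, `suspicious_rows`) are hypothetical.

```python
import csv
import io

# The chunk of 04_names_verif.csv quoted above.
SAMPLE = """\
Campbell,A. H.,Campbell,A H
Campbell,Andrew,Campbell,Andrew
Campbell,C. A.,Campbell,Charles A
Campbell,Charles A.,Campbell,Charles A
Campbell,D'Ann,Campbell,Dann
Campbell,D.,Campbell,A H
Campbell,Debra,Campbell,Debra
Campbell,Donald,Campbell,Donald T
Campbell,Donald T.,Campbell,Donald T
Campbell,Douglas,Campbell,Douglas I
Campbell,Douglas I.,Campbell,Douglas I
"""


def tokens(given):
    """Split a given name into lowercase tokens, dropping periods and apostrophes."""
    cleaned = given.replace(".", " ").replace("'", "")
    return [t.lower() for t in cleaned.split()]


def compatible(given, matched):
    """Compare two given names position by position.

    A single-letter token is treated as an initial and only has to agree
    with the first letter of its counterpart. Extra trailing tokens on
    either side (e.g. a middle initial) are ignored.
    """
    a, b = tokens(given), tokens(matched)
    if not a or not b:
        return False
    for x, y in zip(a, b):
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def suspicious_rows(text):
    """Return rows whose family names differ or whose given names are incompatible.

    Assumed column order: family, given, matched family, matched given.
    """
    flagged = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        family, given, m_family, m_given = row
        if family != m_family or not compatible(given, m_given):
            flagged.append(row)
    return flagged


if __name__ == "__main__":
    for row in suspicious_rows(SAMPLE):
        print(",".join(row))
```

On the chunk above, the only row this flags is `Campbell,D.,Campbell,A H`. Rows such as `Campbell,C. A.,Campbell,Charles A` pass because each initial agrees with the first letter of the matched name.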

dhicks commented 6 years ago

Test input and corresponding output: test.zip

dhicks commented 6 years ago

Resolved in 31603f26ab1ed2fbc59c4e52db2d2ea0a22059d4