Open gjost opened 7 months ago
The command did appear to generate expected results when it was run back in Feb and Mar of last year. See results csvs in /media/qnfs/kinkura/working/names
re: `ddrnames load`, this command is a bit unusual. For whatever reason, I wrote it with a `--save` arg and a `--commit` arg. Without those args it just prints stuff out to STDOUT.
Update: `ddrnames load` issue moved to https://github.com/denshoproject/ddr-cmdln/issues/241
@GeoffFroh Can you post `ddr-manz-4-persons.csv` from the initial call? I can't duplicate this without that source CSV.
I'm not sure we can do anything about this one. It looks like `Kiyomatsu Tani` is just what we're getting back from the SQLite fulltext search.
I hacked `namesdb searchmulti` to print out debugging info including the SQL statements:
(names) ddr@densho101dev:/opt/namesdb-editor$ namesdb searchmulti /tmp/ddr-manz-4-persons.csv --sql
"objectid","namepart","n","preferred_name","nr_id","score","matching","sample"
item={'namepart': 'Nagatomi, Shinjo', 'oid': 'ddr-manz-4-1', 'fieldname': 'persons'}
fulltext search
fulltext_search_sql
sql='SELECT rowid, rank FROM names_person_fts("nagatomi shinjo")'
sql2='SELECT rowid, * FROM names_person_fts WHERE rowid IN (4750)'
"ddr-manz-4-1","Nagatomi, Shinjo","0","Kiyomatsu Tani","88922/nr003wb7p","-28.54437565963006","","namepart: Nagatomi, Shinjo | nr_id: 88922/nr003wb7p"
For whatever reason, SQLite's FTS algorithms seem to think that's the best match for a search on `nagatomi shinjo`:
(names) ddr@densho101dev:/opt/namesdb-editor$ python src/manage.py dbshell --database names
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> SELECT rowid, rank FROM names_person_fts("nagatomi shinjo");
4750|-28.5443756596301
sqlite> SELECT rowid, * FROM names_person_fts WHERE rowid IN (4750);
4750|88922/nr003wb7p|Tani|Kiyomatsu|||||||Kiyomatsu Tani
sqlite> SELECT rowid, rank FROM names_person_fts("nagatomi");
4750|-14.093919355483
4751|-14.093919355483
4752|-14.093919355483
4753|-14.093919355483
sqlite> SELECT rowid, rank FROM names_person_fts("shinjo nagatomi");
4750|-28.5443756596301
sqlite> SELECT rowid, * FROM names_person_fts WHERE rowid IN (4750,4751,4752,4753);
4750|88922/nr003wb7p|Tani|Kiyomatsu|||||||Kiyomatsu Tani
4751|88922/nr003wb8c|Tani|Misao|||||||Misao Tani
4752|88922/nr003wb92|Tani|Yasujiro|||Joe||||Yasujiro Joe Tani
4753|88922/nr003wc0h|Tani|Aya|||||||Aya Tani
Unfortunately I think this may just be what we get. We need humans in this loop.
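One cheap way to help the humans in the loop is to flag rows where the returned name shares no words with the name that was searched for. The sketch below is only an illustration and is not part of `namesdb`; the function names `name_tokens` and `needs_review` are hypothetical, and the sample pairs are the mismatches reported in this thread.

```python
import re


def name_tokens(text):
    """Lowercase word tokens, ignoring punctuation such as the comma in 'Family, Given'."""
    return set(re.findall(r"\w+", text.lower()))


def needs_review(namepart, preferred_name):
    """True when the searched name and the returned name have no token in common."""
    return not (name_tokens(namepart) & name_tokens(preferred_name))


# (namepart, preferred_name) pairs reported in this issue
reported = [
    ("Nagatomi, Shinjo", "Kiyomatsu Tani"),
    ("Iwata, Jack", "Allan Tomio Mizuhara"),
    ("Hori, Tashi", "Sonoko Kondo"),
]
flagged = [pair for pair in reported if needs_review(*pair)]

for namepart, preferred_name in flagged:
    print(f"REVIEW: searched {namepart!r}, got {preferred_name!r}")
```

A check like this only catches total mismatches; partial overlaps (same family name, different given name) would still need a person to look at them.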
FWIW, I can search these strings directly from the NR Editor admin and get the expected names.
The search in the Django Admin is probably not using SQLite FTS.
I could maybe add a `--fulltext`/`--boolean` arg pair so you had the option to do a boolean search if the fulltext algo doesn't do what you want?
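For illustration, here is roughly what such a flag pair could look like. This is a sketch using `argparse` from the standard library, not the actual `namesdb` code (which may use a different CLI library); the program name, the `mode` attribute, and the default of `fulltext` are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser: --fulltext and --boolean select one search mode."""
    parser = argparse.ArgumentParser(prog="searchmulti-sketch")
    parser.add_argument("csvfile", help="CSV of names to search for")
    # The two flags write to the same destination and cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--fulltext", dest="mode", action="store_const", const="fulltext")
    mode.add_argument("--boolean", dest="mode", action="store_const", const="boolean")
    parser.set_defaults(mode="fulltext")  # assumed default
    return parser


default_args = build_parser().parse_args(["persons.csv"])
boolean_args = build_parser().parse_args(["persons.csv", "--boolean"])
print(default_args.mode, boolean_args.mode)
```

Passing both flags at once makes `argparse` exit with a usage error, which seems like the least surprising behavior for a pair of opposite options.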
Perhaps something bad happened to the SQLite fulltext index? Maybe force a reindex?
That's exactly what it was. The answer was right in `namesdb searchmulti -h`. Running the following as `ddr` worked on my local machine:
sqlite-utils disable-fts db/namesregistry.db names_person
and then
sqlite-utils enable-fts --fts5 db/namesregistry.db names_person nr_id \
family_name given_name given_name_alt other_names middle_name \
prefix_name suffix_name jp_name preferred_name
Ran the commands against the canon db on `kyuzo` and it seems to have worked.
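For anyone who hits this again, here is a small reproduction of how an out-of-date index can return the wrong person. It assumes the FTS table is an external-content FTS5 table (tokens stored in the index, column values read back from the source table) that is not kept in sync by triggers; I believe that is how `sqlite-utils enable-fts` sets things up by default, but treat that as an assumption. The `person` and `person_fts` tables are made up for the demo, and it needs a Python build whose SQLite includes FTS5.

```python
import sqlite3


def search(conn, query):
    """Return (rowid, preferred_name) pairs for an FTS5 match, best rank first."""
    sql = (
        "SELECT rowid, preferred_name FROM person_fts "
        "WHERE person_fts MATCH ? ORDER BY rank"
    )
    return conn.execute(sql, (query,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, preferred_name TEXT);
    -- External-content index: tokens live in person_fts,
    -- column values are read back from person.
    CREATE VIRTUAL TABLE person_fts USING fts5(
        preferred_name, content='person', content_rowid='id'
    );
""")
conn.executemany(
    "INSERT INTO person (id, preferred_name) VALUES (?, ?)",
    [(1, "Shinjo Nagatomi"), (2, "Misao Tani")],
)
conn.execute("INSERT INTO person_fts(person_fts) VALUES ('rebuild')")

# Change the source table without telling the index (no triggers exist).
conn.execute("UPDATE person SET preferred_name = 'Kiyomatsu Tani' WHERE id = 1")
conn.execute("INSERT INTO person (id, preferred_name) VALUES (3, 'Shinjo Nagatomi')")

stale_hits = search(conn, "nagatomi shinjo")  # index still points at rowid 1

# Rebuild the index from the current contents of the source table.
conn.execute("INSERT INTO person_fts(person_fts) VALUES ('rebuild')")
fresh_hits = search(conn, "nagatomi shinjo")

print("before rebuild:", stale_hits)
print("after rebuild: ", fresh_hits)
```

Before the rebuild the match comes back on the old rowid with whatever name now lives in that row, which is the same shape as the `Kiyomatsu Tani` result above; after the rebuild the match lands on the row that actually holds the name.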
@GeoffFroh: I just got some odd results out of the namesdb searchmulti command. Here’s the call:
This collection, `ddr-manz-4`, is the one with all the photos of Rev. Shinjo Nagatomi. In the entity metadata, his name appears as `"Nagatomi, Shinjo"`. His person record in the NR database (id: `88922/nr009tb36`) is the same, so it would seem like the search should return the record; but the output from `namesdb searchmulti` is this:
Here’s the full output (ddr-manz-4-persons-results-sql.csv):
It looks like it’s not just that particular name; `Iwata, Jack` is returning `Allan Tomio Mizuhara`, and `Hori, Tashi` returns `Sonoko Kondo`.
FWIW, I can search these strings directly from the NR Editor admin and get the expected names. Note: I have to omit the `,` char or I get no results; i.e., `Hori Tashi` returns the record, while `Hori, Tashi` returns zero results. I’m assuming this is something specific to the default Django admin search config.
This works: http://namesdbeditor.local/admin/names/person/?q=Hori+Tashi
This does not work: http://namesdbeditor.local/admin/names/person/?q=Hori%2C+Tashi
Update: `ddrnames load` moved to https://github.com/denshoproject/ddr-cmdln/issues/241
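On the comma problem in the admin search: if the admin is using Django's default `search_fields` behavior, the query is split on whitespace and each word has to match, so `Hori,` (with the comma attached) finds nothing. A small helper that drops the comma before building the URL would sidestep that. This is a sketch; `admin_query` and `admin_search_url` are hypothetical names, and the base URL is the local one quoted above.

```python
from urllib.parse import quote_plus

# Local admin URL taken from the examples in this issue
ADMIN_URL = "http://namesdbeditor.local/admin/names/person/"


def admin_query(namepart):
    """Turn 'Family, Given' into 'Family Given' by replacing commas with spaces."""
    return " ".join(namepart.replace(",", " ").split())


def admin_search_url(namepart):
    """Build an admin search URL for a name as it appears in entity metadata."""
    return f"{ADMIN_URL}?q={quote_plus(admin_query(namepart))}"


print(admin_search_url("Hori, Tashi"))
```

Only commas are removed here, so hyphens and apostrophes inside names are left alone.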