denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

Weird behavior from `ddrnames load` #241

Open gjost opened 10 months ago

gjost commented 10 months ago

@GeoffFroh Ran into another issue. I manually fixed the results file from above with good data, then used the ddrnames load command. It appeared to function correctly, but no changes were made to the targeted repo. Here’s the terminal session:

(cmdln) ddr@kyuzo:/media/qnfs/kinkura/working/ireilaunch$ ddrnames load persons /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4-persons-results-good.csv /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4
Collection /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4
Loading data from /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4-persons-results-good.csv
142 rows
Grouping data...
Updating objects...
(cmdln) ddr@kyuzo:/media/qnfs/kinkura/working/ireilaunch$ cd ddr-manz-4/
(cmdln) ddr@kyuzo:/media/qnfs/kinkura/working/ireilaunch/ddr-manz-4$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean

Here’s the last entry in the git log:

commit a68cebceca0bab24157481173bd2efa021027daa (HEAD -> master, origin/master, origin/HEAD)
Author: Caitlin Oiye <caitlin.oiye@densho.org>
Date:   Wed Dec 13 11:04:20 2017 -0700

    Updated metadata file(s)

    @agent: ddr-local

And here’s the input CSV (ddr-manz-4-persons-results-good.csv):

objectid,namepart,n,preferred_name,nr_id,score,matching,sample
ddr-manz-4-1,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-3,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-4,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-5,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-8,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-9,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-10,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-11,Nagatomi, Shinjo,0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36
ddr-manz-4-12,Mikami, Yoshiko,0,Yoshiko Mikami,88922/nr009qq80,-17.09702224802421,,namepart: Mikami, Yoshiko | nr_id: 88922/nr009qq80
...

@GeoffFroh @sarabeckman: have you been able to successfully update person data with the new NR IDs using the ddrnames load command?

@sarabeckman I have not tried using ddrnames load. The archivists and I have just been adding the data to the entity CSV, and then I use ddrimport entity.

gjost commented 10 months ago

ddrnames load is a bit unusual. For whatever reason, I wrote it with a --save arg and a --commit arg. Without those args it just prints stuff out to STDOUT.

GeoffFroh commented 10 months ago

Update: I tried ddrnames load using both the --save and --commit flags. The changes were still not made in the targeted repo.

(cmdln) ddr@kyuzo:/media/qnfs/kinkura/working/ireilaunch$ ddrnames load persons /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4-persons-results-good.csv --save --user DDRAdmin --mail kyuzo@hq.densho.org /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4
Collection /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4
Loading data from /media/qnfs/kinkura/working/ireilaunch/ddr-manz-4-persons-results-good.csv
142 rows
Grouping data...
Updating objects...
(cmdln) ddr@kyuzo:/media/qnfs/kinkura/working/ireilaunch$ cd ddr-manz-4
(cmdln) ddr@kyuzo:/media/qnfs/kinkura/working/ireilaunch/ddr-manz-4$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

gjost commented 9 months ago

Looks like the problem is that I left out some documentation, and we all forgot how namesdb searchmulti and ddrnames load are supposed to work together.

The output of namesdb searchmulti contains a column named matching, which is left blank. When you review the output CSV, you need to mark which rows are matches in that column; ddrnames load ignores any row that has not been marked.

I've updated ddrnames help with better documentation of the whole ddrnames dump -> namesdb searchmulti -> ddrnames load process, and updated the dump and load commands to point to it. Help for namesdb searchmulti has also been updated.
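
As a rough pre-flight check for the blank matching column described above, a short script along these lines could report how many rows have been marked before running ddrnames load. This is only a sketch and is not part of ddr-cmdln: the function name summarize_matching is hypothetical, it assumes that any non-empty value in matching counts as "marked" (the marker `x` in the sample is made up for illustration; see ddrnames help for the value the tool actually expects), and it assumes the real file quotes fields that contain commas.

```python
import csv
import io


def summarize_matching(csv_file):
    """Split objectids into (marked, blank) based on the `matching` column.

    Assumption: any non-empty value in `matching` means the row was marked
    as a match by a reviewer. The actual value ddrnames load expects may
    differ.
    """
    marked, blank = [], []
    for row in csv.DictReader(csv_file):
        value = (row.get("matching") or "").strip()
        (marked if value else blank).append(row["objectid"])
    return marked, blank


# Two rows based on the CSV in this issue. Fields containing commas are
# quoted here, and the "x" marker in the second row is hypothetical.
SAMPLE = '''objectid,namepart,n,preferred_name,nr_id,score,matching,sample
ddr-manz-4-1,"Nagatomi, Shinjo",0,Shinjo Nagatomi,88922/nr009tb36,-28.54437565963006,,"namepart: Nagatomi, Shinjo | nr_id: 88922/nr009tb36"
ddr-manz-4-12,"Mikami, Yoshiko",0,Yoshiko Mikami,88922/nr009qq80,-17.09702224802421,x,"namepart: Mikami, Yoshiko | nr_id: 88922/nr009qq80"
'''

marked, blank = summarize_matching(io.StringIO(SAMPLE))
print(f"{len(marked)} marked, {len(blank)} blank")  # prints "1 marked, 1 blank"
```

If every row lands in the blank list, as would be the case for the CSV posted above, that would be consistent with ddrnames load reporting "Updating objects..." and then leaving the repo untouched.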