denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

update parent entity.children, entity.file_groups on save #40

Closed gjost closed 7 years ago

gjost commented 7 years ago

Entity/segment .children lists are not populated when batch-importing from CSV.

Update: We need to update the Entity object so that it causes its parent Entity (if any) to update its children/files.

gjost commented 7 years ago

The batch import ran like this: data was gathered into three files for each collection: interviews.csv (first-level Entities), segments.csv (second-level Entities), and files.csv (files). Interviews were imported first, then segments, then files. When the interviews were imported there were no segments or files yet; likewise, when the segments were imported there were no files.

Where Entities have child files the .file_groups attribute is populated as expected, but .children is not.

When File objects are created/modified the parent Entity is poked and it updates its list of children/files.

We need to update the Entity object so that it causes its parent Entity (if any) to update its children/files.
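
To illustrate the behavior being asked for, here is a minimal in-memory sketch. The Entity class, the registry dict, and the methods save_without_poke, save_with_poke, and refresh_children are all hypothetical stand-ins invented for this example; none of them is the actual DDR code.

```python
class Entity:
    """Toy stand-in for a DDR Entity (hypothetical, not the real class)."""

    def __init__(self, eid, parent=None):
        self.id = eid
        self.parent = parent
        self.children = []  # ids of child entities, e.g. segments

    def refresh_children(self, registry):
        # Rebuild the list from whatever has been saved so far.
        self.children = sorted(
            e.id for e in registry.values() if e.parent is self
        )

    def save_without_poke(self, registry):
        # Behavior described in this issue: the parent is never told.
        registry[self.id] = self

    def save_with_poke(self, registry):
        # Requested behavior: saving a child makes its parent refresh.
        registry[self.id] = self
        if self.parent is not None:
            self.parent.refresh_children(registry)


registry = {}
interview = Entity('ddr-test-1-1')
segment = Entity('ddr-test-1-1-1', parent=interview)

# Import order from the batch: interviews first, then segments.
interview.save_without_poke(registry)
segment.save_without_poke(registry)
children_before = list(interview.children)  # still empty

segment.save_with_poke(registry)
children_after = list(interview.children)   # now lists the segment
```

The point of the sketch is only the ordering problem: the parent is saved before its children exist, so something has to revisit the parent when a child is saved.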

gjost commented 7 years ago

This does not update the parent Entity:

from DDR import config, identifier
f = identifier.Identifier('ddr-densho-1016-1', config.MEDIA_BASE).object()
f.write_json()

Nor does this:

from DDR import config, identifier
f = identifier.Identifier('ddr-densho-1016-1', config.MEDIA_BASE).object()
# USERNAME and USERMAIL stand for the user's name and email
f.save(USERNAME, USERMAIL)

Running ddr-transform on the collection does cause updates.

gjost commented 7 years ago

Instead of distributing save code we need each object to have a single .save() method. For Entity this method must call Entity.load_children_objects and Entity.load_file_objects.
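
A rough sketch of what a single save entry point could look like. Only the method names load_children_objects and load_file_objects come from the comment above; the directory layout (children/ and files/ subdirectories), the entity.json filename, the flat file_groups list, and the method bodies are assumptions made for illustration.

```python
import json
from pathlib import Path


class Entity:
    """Toy stand-in for a DDR Entity; layout and bodies are assumed."""

    def __init__(self, path):
        self.path = Path(path)
        self.children = []
        self.file_groups = []  # simplified here to a flat list of names

    def load_children_objects(self):
        # Assumption: each child entity is a subdirectory of children/.
        d = self.path / 'children'
        self.children = (
            sorted(p.name for p in d.iterdir() if p.is_dir())
            if d.is_dir() else []
        )

    def load_file_objects(self):
        # Assumption: each file object is a regular file under files/.
        d = self.path / 'files'
        self.file_groups = (
            sorted(p.name for p in d.iterdir() if p.is_file())
            if d.is_dir() else []
        )

    def save(self):
        # The one place where an Entity is written: refresh both lists
        # first so the JSON on disk never goes stale.
        self.load_children_objects()
        self.load_file_objects()
        self.path.mkdir(parents=True, exist_ok=True)
        out = self.path / 'entity.json'
        out.write_text(json.dumps(
            {'children': self.children, 'file_groups': self.file_groups},
            indent=2,
        ))
        return out
```

With every caller going through save(), there is no code path that writes the Entity without refreshing its lists first.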

gjost commented 7 years ago

Executive summary: Good news, I think. I reworked the $MODEL.save() functions and an initial batch test indicates that it worked.

The object-writing code has been kind of a mess since the beginning. My initial project code in DDR.commands.py basically gathered the manual git/git-annex commands into functions, and the initial Django app just called those functions directly. I've done several rounds of refactoring over the years but there was still no single .save() method for objects. Sometime over the past year I did add a .save() method to address this, but I must have gotten called away halfway through because I never plugged it in.

The loose object-save code is now gathered into Collection/Entity/File.save() methods, and all code that saves objects now uses these methods*.

One roadblock in all this is that while most of the time we want to write files and then commit, there are a couple of cases (e.g. batch import) where we do NOT want to commit. I reworked the methods to return lists of modified files for these cases.
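
The write-then-maybe-commit split might be sketched like this. The commit function, the Obj class, the commit_now flag, and the file names are all hypothetical; the real methods may use different names and signatures. The commit stand-in only records calls and does not touch git.

```python
def commit(paths, message, log):
    """Hypothetical stand-in for a git commit; just records the call."""
    log.append((message, sorted(paths)))


class Obj:
    """Toy object whose save() can skip the commit step."""

    def __init__(self, oid):
        self.id = oid

    def save(self, log, commit_now=True):
        # Pretend these files were just written to disk.
        modified = [f'{self.id}/entity.json', f'{self.id}/changelog']
        if commit_now:
            commit(modified, f'Updated {self.id}', log)
        # Always return the list so callers can commit later themselves.
        return modified


def batch_import(ids, log):
    """Save many objects, then make one commit covering all of them."""
    modified = []
    for oid in ids:
        modified.extend(Obj(oid).save(log, commit_now=False))
    commit(modified, f'Batch import of {len(ids)} objects', log)
    return modified
```

In the normal case a caller ignores the return value; in the batch case the returned lists are accumulated and committed once at the end.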

So far I've tested it by creating/editing a collection with some entities and files, and batch-importing ddr-densho-1016 into an empty test collection. It seems to work but I'd like to test further.

gjost commented 7 years ago

Fixed in e3fe34f.