Closed gjost closed 7 years ago
The batch import ran like this. Data was gathered into three files for each collection: interviews.csv (first-level Entities), segments.csv (second-level Entities), and files.csv (files). Interviews were imported first, then segments, then files. So at the time the interviews were imported there were no segments or files yet, and at the time the segments were imported there were no files.
Where Entities have child files the .file_groups attribute is populated as expected, but .children is not.
When File objects are created/modified the parent Entity is poked and it updates its list of children/files.
We need to update the Entity object so that it causes its parent Entity (if any) to update its children/files.
This does not update the parent Entity:
from DDR import config, identifier
f = identifier.Identifier('ddr-densho-1016-1', config.MEDIA_BASE).object()
f.write_json()
Nor does this:
from DDR import config, identifier
f = identifier.Identifier('ddr-densho-1016-1', config.MEDIA_BASE).object()
f.save(USERNAME, USERMAIL)
Running ddr-transform on the collection does cause updates.
Instead of distributing save code we need each object to have a single .save() method. For Entity this method must call Entity.load_children_objects and Entity.load_file_objects.
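A rough sketch of the idea, for discussion only. ToyStore, ToyEntity and ToyFile are hypothetical in-memory stand-ins, not the real DDR classes, and the toy load_children_objects/load_file_objects only borrow the names of the real methods; their actual signatures and behavior may differ. The point is just that save() refreshes the object's own lists and then pokes the parent.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for the repository on disk."""

    def __init__(self):
        self.objects = {}  # object ID -> saved object


class ToyEntity:
    """Hypothetical stand-in for an Entity (interview or segment)."""

    def __init__(self, store, oid, parent_id=None):
        self.store = store
        self.id = oid
        self.parent_id = parent_id
        self.children = []  # IDs of child Entities
        self.files = []     # IDs of child Files

    def _ids_of_children(self, cls):
        # Scan the store for saved objects of type cls that point at this Entity.
        return sorted(
            o.id for o in self.store.objects.values()
            if isinstance(o, cls) and o.parent_id == self.id
        )

    def load_children_objects(self):
        self.children = self._ids_of_children(ToyEntity)

    def load_file_objects(self):
        self.files = self._ids_of_children(ToyFile)

    def save(self):
        self.store.objects[self.id] = self
        # Refresh our own lists, then poke the parent (if any) to do the same.
        self.load_children_objects()
        self.load_file_objects()
        parent = self.store.objects.get(self.parent_id)
        if parent is not None:
            parent.save()


class ToyFile:
    """Hypothetical stand-in for a File."""

    def __init__(self, store, oid, parent_id):
        self.store = store
        self.id = oid
        self.parent_id = parent_id

    def save(self):
        self.store.objects[self.id] = self
        parent = self.store.objects.get(self.parent_id)
        if parent is not None:
            parent.save()


# Same order as the batch import: interviews, then segments, then files.
store = ToyStore()
interview = ToyEntity(store, "toy-1-1")
interview.save()
segment = ToyEntity(store, "toy-1-1-1", parent_id="toy-1-1")
segment.save()
ToyFile(store, "toy-1-1-1-master", parent_id="toy-1-1-1").save()
```

Because each save() walks up to the parent, the interview's children list gets filled in when the segment is saved, even though the interview was written first.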
Executive summary: Good news, I think. I reworked the $MODEL.save() functions and an initial batch test indicates that it worked.
The object writing code has been kind of a mess since the beginning. My initial project code in DDR.commands.py basically gathered the manual git/git-annex commands into functions, and the initial Django app just called those functions directly. I've done several rounds of refactoring over the years but there was still no single .save() method for objects. Sometime over the past year I did add a .save() method to address this, but I must have gotten called away halfway through because I never plugged it in.
The loose object-save code is now gathered into Collection/Entity/File.save() methods, and all code that saves objects now uses these methods*.
One roadblock in all this is that while most of the time we want to write files and then commit, there are a couple of cases (e.g. batch import) where we do NOT want to commit after each object. I reworked the methods to return lists of modified files to handle these cases.
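Roughly what that pattern looks like, as a sketch only. ToyObject, the commit keyword, and the commit_log list (which stands in for a git commit) are all made up for illustration and are not the real DDR signatures.

```python
import json
import os
import tempfile


class ToyObject:
    """Hypothetical stand-in for a Collection/Entity/File object."""

    def __init__(self, basedir, oid, data):
        self.path = os.path.join(basedir, f"{oid}.json")
        self.data = data

    def write_json(self):
        # Write the object to disk and report which files were touched.
        with open(self.path, "w") as f:
            json.dump(self.data, f)
        return [self.path]

    def save(self, commit_log, commit=True):
        modified = self.write_json()
        if commit:
            # Normal case: one commit per save (commit_log stands in for git).
            commit_log.append(list(modified))
        # Always return the modified files so callers can commit later.
        return modified


def batch_import(basedir, rows, commit_log):
    """Save every row without committing, then commit everything once."""
    modified = []
    for oid, data in rows:
        obj = ToyObject(basedir, oid, data)
        modified.extend(obj.save(commit_log, commit=False))
    commit_log.append(modified)
    return modified


with tempfile.TemporaryDirectory() as basedir:
    log = []
    paths = batch_import(basedir, [("a", {"id": "a"}), ("b", {"id": "b"})], log)
    batch_commits = len(log)              # one commit for the whole batch
    ToyObject(basedir, "c", {"id": "c"}).save(log)
    total_commits = len(log)              # plus one for the normal save
```

The batch importer collects the returned paths and makes a single commit at the end, while interactive edits keep the default and commit on each save.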
So far I've tested it by creating/editing a collection with some entities and files, and batch-importing ddr-densho-1016 into an empty test collection. It seems to work but I'd like to test further.
Fixed in e3fe34f.
Entity/segment .children lists are not populated when batch-importing from CSV. Update: We need to update the Entity object so that it causes its parent Entity (if any) to update its children/files.