Closed by eroux 4 years ago
The records in /db/tbrc/synced/synced-xxx.xml are culled by gwt/home.xqm.
The records are only used to populate the Library / Recent Acquisitions / Newly Released Digital Works on tbrc.org. The culling just removes older records so that only really recent works are reported.
Each imagegroup that is synced has at least one log record for its syncing. Note that sometimes there is more than one on the same day, at different times, because the syncing script is rerun for whatever reason.
In any event those records are migrated like this, from bdg:W2PD17450:
bda:I4PD3011 a adm:AdminData ;
adm:adminAbout bdr:I4PD3011 ;
adm:legacyImageGroupRID "I4PD3011" ;
adm:logEntry bda:LG6EB5FFBBDF45B4B7 , bda:LGB16D2AF00C1E3D92 ;
adm:metadataLegal bda:LD_BDRC_CC0 ;
adm:status bda:StatusReleased ;
.
bda:LG6EB5FFBBDF45B4B7
a adm:LogEntry ;
adm:logDate "2018-01-26T22:14:46.049Z"^^xsd:dateTime ;
adm:logMessage "Updated total pages"@en ;
adm:logWho bdr:U00006 ;
.
bda:LGB16D2AF00C1E3D92
a adm:LogEntry ;
adm:logDate "2016-09-14T16:53:24.807Z"^^xsd:dateTime ;
adm:logMessage "added image group for scan request"@en ;
adm:logWho bdr:U00021 ;
.
So it seems to me sufficient to add:
adm:LogSynced
a owl:Class ;
rdfs:subClassOf adm:LogEntry ;
.
and during migration check the log message in the Imagegroup for "Updated total pages", which is what is done, for example, in /db/modules/public/work2.xqm, and then write out
bda:LG6EB5FFBBDF45B4B7
a adm:LogSynced ;
adm:logDate "2018-01-26T22:14:46.049Z"^^xsd:dateTime ;
adm:logMessage "Updated total pages"@en ;
adm:logWho bdr:U00006 ;
.
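To make the check concrete, here is a minimal sketch in Python (for illustration only; the real migration code is not shown in this thread). The helper name `log_entry_turtle`, the exact-match rule and the simple quote escaping are assumptions; only the message text, the prefixes and the class names come from the example above.

```python
SYNC_MESSAGE = "Updated total pages"  # message named in the discussion above


def log_entry_turtle(entry_id: str, when: str, who: str, message: str) -> str:
    """Serialize one migrated log entry as Turtle (hypothetical helper)."""
    # Assumption: an exact match on the message marks a sync entry.
    rdf_class = "adm:LogSynced" if message.strip() == SYNC_MESSAGE else "adm:LogEntry"
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    literal = message.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"bda:{entry_id}\n"
        f"    a {rdf_class} ;\n"
        f'    adm:logDate "{when}"^^xsd:dateTime ;\n'
        f'    adm:logMessage "{literal}"@en ;\n'
        f"    adm:logWho bdr:{who} ;\n"
        f".\n"
    )


print(log_entry_turtle("LG6EB5FFBBDF45B4B7", "2018-01-26T22:14:46.049Z",
                       "U00006", "Updated total pages"))
```

Any other message falls back to plain adm:LogEntry, so entries that are not syncs are left as they are today.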
I don't know that the /db/tbrc/synced/synced-xxx.xml files need to figure into things.
If the synced-xxx.xml files are needed then AO can provide them, since they are currently generated as part of the sync process.
Ah, thanks for the clarification! Here are a few options I can think of; which one do you prefer? Note that in each case this will mean a very long resync of all the image instances... but it's just this type so I think it will be ok.
Option 1:
adm:ContentLogEntry rdfs:subClassOf adm:LogEntry .
bda:LG6EB5FFBBDF45B4B7
a adm:ContentLogEntry , adm:LogEntry ;
.
Option 1b: if we follow the logic of option 1, ideally we would have
adm:LogEntry a owl:Class .
adm:DataLogEntry rdfs:subClassOf adm:LogEntry .
adm:ContentLogEntry rdfs:subClassOf adm:LogEntry .
In this case, though, we would have to resync all types because all the log entries would become adm:DataLogEntry instances...
Option 2: we could also just attach the same log entry to the image group:
bda:I4PD3011 a adm:AdminData ;
adm:adminAbout bdr:I4PD3011 ;
adm:logEntry bda:LG6EB5FFBBDF45B4B7 .
bdr:I4PD3011 a bdo:ImageGroup ;
bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .
bda:LG6EB5FFBBDF45B4B7 a adm:LogEntry ;
etc.
Option 2b: because most of the volumes will have the same log entries, and because in a way it's an entry for the image instance too (we want to know when it has been updated), we could imagine having one log entry for all of these:
bda:I4PD3011 a adm:AdminData ;
adm:adminAbout bdr:I4PD3011 ;
adm:logEntry bda:LG6EB5FFBBDF45B4B7 .
bdr:I4PD3011 a bdo:ImageGroup ;
bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .
bdr:I4PD3012 a bdo:ImageGroup ;
bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .
bdr:W2PD17450 a bdo:ImageInstance ;
bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .
bda:LG6EB5FFBBDF45B4B7 a adm:LogEntry ;
etc.
Together with the brief #168 note and the above there's some need for elaboration.
One use case is searching for recently synced ImageInstances to populate the equivalent of tbrc.org's Library / Recent Acquisitions / Newly Released Digital Works, for which it would be handy to look for ?s rdf:type adm:LogSynced and compare adm:logDate as needed.
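A rough sketch of that lookup, written in plain Python over already-extracted data rather than as a query against the store. The tuple layout (subject, log entry type, adm:logDate), the function names and the 30-day window are all assumptions for illustration; only the class name and the date format come from the examples above.

```python
from datetime import datetime, timedelta, timezone


def parse_log_date(value: str) -> datetime:
    """Parse an xsd:dateTime such as 2018-01-26T22:14:46.049Z (UTC)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def recently_synced(entries, now, days=30):
    """Return (subject, date) pairs for sync entries within the window, newest first."""
    cutoff = now - timedelta(days=days)
    hits = []
    for subject, rdf_type, log_date in entries:
        when = parse_log_date(log_date)
        # Keep only entries typed adm:LogSynced and recent enough.
        if rdf_type == "adm:LogSynced" and when >= cutoff:
            hits.append((subject, when))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


entries = [
    ("bdr:I4PD3011", "adm:LogSynced", "2018-01-26T22:14:46.049Z"),
    ("bdr:I4PD3011", "adm:LogEntry", "2016-09-14T16:53:24.807Z"),
]
now = datetime(2018, 2, 1, tzinfo=timezone.utc)
for subject, when in recently_synced(entries, now):
    print(subject, when.isoformat())
```

With a dedicated class the filter is a simple type test; without it, the same lookup would have to match on the log message text.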
As far as I can see, Option 1 is the same as your original idea, which I elaborated a bit. The only difference is the change of class name from adm:LogSynced to adm:ContentLogEntry, which is somewhat ambiguous since there may well be other types of log entries for reorderings, replacements, deletions and so on of images in a bdo:ImageGroup of a bdo:ImageInstance.
Option 1b needs to be considered in the light of #168 and so on.
Option 2 and Option 2b seem rather implausible to me without more explicit motivation.
Elaborating on Option 1b, a reasonable class hierarchy would be:
adm:LogEntry
    adm:GraphLogEntry
        adm:CreateGraph
        adm:MinorUpdateGraph
        adm:UpdateGraph
        adm:WithdrawGraph
    adm:ContentLogEntry
        adm:ScanRequested
        adm:Synced
        adm:Reordered
        adm:ImagesUpdated - duplicate deletion, images converted, etc
Implementing this involves a complete dataset rebuild and some heuristics to detect which class to use in various cases.
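To show what the hierarchy buys us, here is a small Python sketch that encodes it as a child-to-parent map. It assumes the classes group under adm:GraphLogEntry and adm:ContentLogEntry as their names suggest; the names `PARENT` and `is_a` are made up for the example.

```python
# Child -> parent, following the hierarchy proposed above
# (assumption: grouping is as the class names suggest).
PARENT = {
    "adm:GraphLogEntry": "adm:LogEntry",
    "adm:CreateGraph": "adm:GraphLogEntry",
    "adm:MinorUpdateGraph": "adm:GraphLogEntry",
    "adm:UpdateGraph": "adm:GraphLogEntry",
    "adm:WithdrawGraph": "adm:GraphLogEntry",
    "adm:ContentLogEntry": "adm:LogEntry",
    "adm:ScanRequested": "adm:ContentLogEntry",
    "adm:Synced": "adm:ContentLogEntry",
    "adm:Reordered": "adm:ContentLogEntry",
    "adm:ImagesUpdated": "adm:ContentLogEntry",
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None once we pass the root
    return False


print(is_a("adm:Synced", "adm:ContentLogEntry"))
print(is_a("adm:Synced", "adm:GraphLogEntry"))
```

A consumer interested in any change to the images can then test against adm:ContentLogEntry, while one that only cares about syncs tests against adm:Synced.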
I think it's a reasonable idea, yes. It's a detail, but sometimes graphs are re-instantiated after being withdrawn, so I guess there should be something to that effect in the list... I think in many cases (for very old records) we can't really know if a log entry is for creation or update... so I think we can just handle the following in xmltold for now (and handle the rest in the editor):
adm:WithdrawnGraph
adm:ScanRequested
adm:Synced
and adm:ImagesUpdated
Can you create these classes? I'll update the code in xmltold in a separate branch that we'll merge next week with a fresh migration.
I added the class hierarchy including adm:ReinstateGraph. I agree that these can be incorporated into editors, the bvmt and so on as things move forward.
I would think that old records can be considered created in the earliest log entry, etc.
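As a sketch of that rule in Python (the function name and the (entry id, adm:logDate) pair layout are assumptions; the dates are in the format used in the examples earlier in this thread):

```python
from datetime import datetime


def creation_entry(entries):
    """Return the id of the earliest log entry, or None if there are none.

    entries: iterable of (entry_id, log_date) pairs, with log_date in the
    form 2016-09-14T16:53:24.807Z.
    """
    def when(pair):
        return datetime.strptime(pair[1], "%Y-%m-%dT%H:%M:%S.%fZ")

    entries = list(entries)
    if not entries:
        return None
    # The earliest entry is taken to be the creation of the record.
    return min(entries, key=when)[0]


print(creation_entry([
    ("LG6EB5FFBBDF45B4B7", "2018-01-26T22:14:46.049Z"),
    ("LGB16D2AF00C1E3D92", "2016-09-14T16:53:24.807Z"),
]))
```

How the remaining entries are classified is left open here.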
I'm not sure adm:MinorUpdateGraph is useful but it might make sense in an editor.
Thanks! I think minor edits are of some interest: for instance, if we display the latest changes in our database, we can skip the minor edits to show more relevant and significant changes.
Note that we could also consider logEntries as various types of events instead, as it's quite the same conceptually (someone did something at a certain time)... see also https://github.com/buda-base/owl-schema/issues/78
I think two classes are missing, for:
I'll add adm:ContentQC. It seems to me that adm:ScanRequested is sufficient versus adding adm:Scanning, since there's really no granular data to record requested, initiated, completed. These distinctions were envisioned along with a QC process for both content and metadata, but this was never implemented since it became apparent that it was more work than the small-scale tbrc.org could really support.
Thanks for adm:ContentQC, yes! It's a small detail but at least we'll have everything... I agree that we don't really need Scanning at the moment (we may need it for some data from partners but we'll deal with that in due time)... I'm not sure we really need adm:ScanRequested either, actually; I'm more tempted to migrate
<entry when="2020-03-31T23:41:26.265Z" who="Chungdak Nangpa">added image group for scan request</entry>
into the graph creation log entry...
Treating it as graph creation would be correct.
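For illustration, a minimal Python sketch of that mapping. The `entry` element and its `when`/`who` attributes are taken from the example above; the function name, the returned dict layout and the use of adm:CreateGraph as the target class are assumptions, and the real code in xmltold may look quite different.

```python
import xml.etree.ElementTree as ET

SCAN_REQUEST_MESSAGE = "added image group for scan request"


def migrate_entry(xml_text: str) -> dict:
    """Turn one legacy <entry> element into log entry fields (hypothetical helper)."""
    entry = ET.fromstring(xml_text)
    message = (entry.text or "").strip()
    # Assumption: the scan request message marks the creation of the graph.
    rdf_class = "adm:CreateGraph" if message == SCAN_REQUEST_MESSAGE else "adm:LogEntry"
    return {
        "class": rdf_class,
        "logDate": entry.get("when"),
        "who": entry.get("who"),
        "logMessage": message,
    }


legacy = ('<entry when="2020-03-31T23:41:26.265Z" who="Chungdak Nangpa">'
          'added image group for scan request</entry>')
print(migrate_entry(legacy))
```

Resolving the `who` name to a bdr:U… user resource is not covered by this sketch.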
Currently we have a changelog for the data (or metadata: the information in the RDF, in other words). I've been wondering how best to encode the synced records (which I lost BTW... where are they again? They seem to be transient in /db/tbrc/synced/synced-xxx.xml). I think a good solution would be to have a change log directly attached to the image instance (not to the data of the image instance). For instance something like:
wdyt @xristy?