buda-base / owl-schema

BDRC Ontology Schema

changelog on the imageinstance #170

Closed: eroux closed this issue 3 years ago

eroux commented 3 years ago

Currently we have a changelog for the data (the metadata, in other words the information in the RDF). I've been wondering how best to encode the synced records (which I lost, BTW... where are they again? They seem to be transient in /db/tbrc/synced/synced-xxx.xml). I think a good solution would be to have a changelog attached directly to the image instance (not to the data of the image instance), for instance something like:

bdr:W12827 a bdo:ImageInstance ;
                    ...
                    adm:logEntry bda:LG123 .

bda:LG123 a adm:LogEntry ;
                  adm:logType bda:LogSynced ;
                  adm:logDate   "2020-03-31T21:27:09.458Z"^^xsd:dateTime ;
                  adm:logMessage  "adding volume 1"@en ;
                  adm:logWho    bdr:U00006 .

WDYT, @xristy?

xristy commented 3 years ago

The records /db/tbrc/synced/synced-xxx.xml are culled by gwt/home.xqm.

The records are only used to populate the Library / Recent Acquisitions / Newly Released Digital Works page on tbrc.org. The culling just removes older records so that only the really recent works are reported.

Each image group that is synced has at least one log record for its syncing. Note that sometimes there is more than one on the same day, at different times, because the syncing script is rerun for whatever reason.

In any event, those records are migrated as in this example from bdg:W2PD17450:

bda:I4PD3011  a                  adm:AdminData ;
        adm:adminAbout           bdr:I4PD3011 ;
        adm:legacyImageGroupRID  "I4PD3011" ;
        adm:logEntry             bda:LG6EB5FFBBDF45B4B7 , bda:LGB16D2AF00C1E3D92 ;
        adm:metadataLegal        bda:LD_BDRC_CC0 ;
        adm:status               bda:StatusReleased ;
.
bda:LG6EB5FFBBDF45B4B7
        a               adm:LogEntry ;
        adm:logDate     "2018-01-26T22:14:46.049Z"^^xsd:dateTime ;
        adm:logMessage  "Updated total pages"@en ;
        adm:logWho      bdr:U00006 ;
.
bda:LGB16D2AF00C1E3D92
        a               adm:LogEntry ;
        adm:logDate     "2016-09-14T16:53:24.807Z"^^xsd:dateTime ;
        adm:logMessage  "added image group for scan request"@en ;
        adm:logWho      bdr:U00021 ;
.

So it seems to me sufficient to add:

adm:LogSynced
    a owl:Class ;
    rdfs:subClassOf adm:LogEntry ;
.

and, during migration, check the log message in the image group for "Updated total pages" (which is what is done in /db/modules/public/work2.xqm, for example), and then write out:

bda:LG6EB5FFBBDF45B4B7
        a               adm:LogSynced ;
        adm:logDate     "2018-01-26T22:14:46.049Z"^^xsd:dateTime ;
        adm:logMessage  "Updated total pages"@en ;
        adm:logWho      bdr:U00006 ;
.

I don't think the /db/tbrc/synced/synced-xxx.xml files need to figure into things.

If the synced-xxx.xml are needed then AO can provide them since they are currently generated as part of the sync process.

eroux commented 3 years ago

Ah, thanks for the clarification! Here are a few options I can think of; which one do you prefer? Note that in each case this will mean a very long resync of all the image instances, but since it's just this one type I think it will be OK.

Option 1

adm:ContentLogEntry rdfs:subClassOf adm:LogEntry .

bda:LG6EB5FFBBDF45B4B7
        a               adm:ContentLogEntry , adm:LogEntry ;
        .

Option 1b

If we follow the logic of Option 1, we would ideally have:

adm:LogEntry a owl:Class .
adm:DataLogEntry rdfs:subClassOf adm:LogEntry .
adm:ContentLogEntry rdfs:subClassOf adm:LogEntry .

In this case, though, we would have to resync all types, because all the existing log entries would become adm:DataLogEntry instances...
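To make the cost of Option 1b concrete, here is a sketch of how the two log entries of bda:I4PD3011 shown earlier might be retyped. Only the rdf:type changes; the other properties stay as migrated. Classifying the scan-request entry as a data (metadata) change is an assumption made for illustration, and the prefix IRIs are assumed to be the usual BDRC namespaces.

```turtle
# Prefix IRIs assumed to match the BDRC namespaces.
@prefix adm:  <http://purl.bdrc.io/ontology/admin/> .
@prefix bda:  <http://purl.bdrc.io/admindata/> .

# Record-level change ("added image group for scan request"): a data log entry
bda:LGB16D2AF00C1E3D92  a  adm:DataLogEntry .

# Change to the images ("Updated total pages"): a content log entry
bda:LG6EB5FFBBDF45B4B7  a  adm:ContentLogEntry .
```

Every existing adm:LogEntry in every type of record would need the same treatment, which is why this option implies resyncing everything rather than just the image instances.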

Option 2

We could also just attach the same log entry to the image group:

bda:I4PD3011  a                  adm:AdminData ;
        adm:adminAbout           bdr:I4PD3011 ;
        adm:logEntry             bda:LG6EB5FFBBDF45B4B7 .

bdr:I4PD3011 a bdo:ImageGroup ;
                       bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .

bda:LG6EB5FFBBDF45B4B7 a adm:LogEntry ;
                       etc.

Option 2b

Because most of the volumes will have the same log entries, and because in a way it's an entry for the image instance too (we want to know when it has been updated), we could imagine having one log entry for all of these:

bda:I4PD3011  a                  adm:AdminData ;
        adm:adminAbout           bdr:I4PD3011 ;
        adm:logEntry             bda:LG6EB5FFBBDF45B4B7 .

bdr:I4PD3011 a bdo:ImageGroup ;
                       bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .

bdr:I4PD3012 a bdo:ImageGroup ;
                       bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .

bdr:W2PD17450 a bdo:ImageInstance ;
                       bdo:contentLogEntry bda:LG6EB5FFBBDF45B4B7 .

bda:LG6EB5FFBBDF45B4B7 a adm:LogEntry ;
                       etc.

xristy commented 3 years ago

Taken together with the brief note in #168 and the above, this needs some elaboration.

One use case is searching for recently synced ImageInstances to populate the equivalent of tbrc.org:

Library / Recent Acquisitions / Newly Released Digital Works

for which it would be handy to look for ?s rdf:type adm:LogSynced and compare adm:logDate as needed.

As far as I can see, Option 1 is the same as your original idea, which I elaborated a bit. The only difference is the change of class name from adm:LogSynced to adm:ContentLogEntry, which is somewhat ambiguous since there may well be other types of log entries for reorderings, replacements, deletions and so on of images in a bdo:ImageGroup of a bdo:ImageInstance.

Option 1b needs to be considered in light of #168 and so on.

Option 2 and Option 2b seem rather implausible to me without more explicit motivation.

Elaborating on Option 1b, a reasonable class hierarchy would be:

adm:LogEntry
    adm:GraphLogEntry
        adm:CreateGraph
        adm:MinorUpdateGraph
        adm:UpdateGraph
        adm:WithdrawGraph
    adm:ContentLogEntry
        adm:ScanRequested
        adm:Synced
        adm:Reordered
        adm:ImagesUpdated - duplicate deletion, images converted, etc
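A minimal Turtle sketch of how the hierarchy above might be declared with rdfs:subClassOf. The class names come from the list; the prefix IRIs are assumed to be the usual BDRC namespaces, and the rdfs:comment on adm:ImagesUpdated is paraphrased from the list rather than taken from the actual schema.

```turtle
# Prefix IRIs assumed to match the BDRC namespaces.
@prefix adm:  <http://purl.bdrc.io/ontology/admin/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

adm:LogEntry          a owl:Class .

# Changes to the RDF graph (the metadata) of a resource
adm:GraphLogEntry     a owl:Class ; rdfs:subClassOf adm:LogEntry .
adm:CreateGraph       a owl:Class ; rdfs:subClassOf adm:GraphLogEntry .
adm:MinorUpdateGraph  a owl:Class ; rdfs:subClassOf adm:GraphLogEntry .
adm:UpdateGraph       a owl:Class ; rdfs:subClassOf adm:GraphLogEntry .
adm:WithdrawGraph     a owl:Class ; rdfs:subClassOf adm:GraphLogEntry .

# Changes to the content (the images) of an image group or image instance
adm:ContentLogEntry   a owl:Class ; rdfs:subClassOf adm:LogEntry .
adm:ScanRequested     a owl:Class ; rdfs:subClassOf adm:ContentLogEntry .
adm:Synced            a owl:Class ; rdfs:subClassOf adm:ContentLogEntry .
adm:Reordered         a owl:Class ; rdfs:subClassOf adm:ContentLogEntry .
adm:ImagesUpdated     a owl:Class ; rdfs:subClassOf adm:ContentLogEntry ;
    rdfs:comment "duplicate deletion, images converted, etc."@en .
```

With RDFS inference, an entry typed adm:Synced is also an adm:ContentLogEntry and an adm:LogEntry, so the Recent Acquisitions use case can match adm:Synced directly while generic queries over adm:LogEntry still see every entry.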

Implementing this involves a complete dataset rebuild and some heuristics to detect which class to use in various cases.

eroux commented 3 years ago

I think it's a reasonable idea, yes. It's a detail, but sometimes graphs are reinstated after being withdrawn, so I guess there should be something to that effect in the list. I also think that in many cases (for very old records) we can't really know whether a log entry is for a creation or an update, so we can just handle the following in xmltold for now (and handle the rest in the editor):

Can you create these classes? I'll update the code in xmltold in a separate branch that we'll merge next week with a fresh migration.

xristy commented 3 years ago

I added the class hierarchy, including adm:ReinstateGraph. I agree that these can be incorporated into editors, the bvmt and so on as things move forward.
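A sketch of what the added class might look like. The comment does not say where adm:ReinstateGraph was placed; putting it under adm:GraphLogEntry, next to adm:WithdrawGraph, is an assumption, as is the wording of the rdfs:comment.

```turtle
# Prefix IRIs assumed to match the BDRC namespaces.
@prefix adm:  <http://purl.bdrc.io/ontology/admin/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Assumed placement: a graph-level event, the counterpart of adm:WithdrawGraph
adm:ReinstateGraph
    a owl:Class ;
    rdfs:subClassOf adm:GraphLogEntry ;
    rdfs:comment "the graph is made available again after having been withdrawn"@en ;
.
```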

I would think that old records can be considered to have been created at the earliest log entry, etc.
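Applied to the bda:I4PD3011 record migrated earlier in this thread, the earliest-entry heuristic combined with the "Updated total pages" check might yield something like the following. This is a sketch of possible migration output under those two heuristics, not what xmltold actually produces.

```turtle
# Prefix IRIs assumed to match the BDRC namespaces.
@prefix adm:  <http://purl.bdrc.io/ontology/admin/> .
@prefix bda:  <http://purl.bdrc.io/admindata/> .
@prefix bdr:  <http://purl.bdrc.io/resource/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Earliest log entry of the record: treated as the creation of the graph
bda:LGB16D2AF00C1E3D92
        a               adm:CreateGraph ;
        adm:logDate     "2016-09-14T16:53:24.807Z"^^xsd:dateTime ;
        adm:logMessage  "added image group for scan request"@en ;
        adm:logWho      bdr:U00021 ;
.
# Message is "Updated total pages": treated as a sync of the images
bda:LG6EB5FFBBDF45B4B7
        a               adm:Synced ;
        adm:logDate     "2018-01-26T22:14:46.049Z"^^xsd:dateTime ;
        adm:logMessage  "Updated total pages"@en ;
        adm:logWho      bdr:U00006 ;
.
```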

I'm not sure adm:MinorUpdateGraph is useful but it might make sense in an editor.

eroux commented 3 years ago

Thanks! I think minor edits are of some interest: for instance, if we display the latest changes in our database, we can skip the minor edits and show only the more relevant and significant changes.

eroux commented 3 years ago

Note that we could also treat log entries as various types of events instead, since they are conceptually quite the same (someone did something at a certain time)... see also https://github.com/buda-base/owl-schema/issues/78

eroux commented 3 years ago

I think two classes are missing, for:

xristy commented 3 years ago

I'll add adm:ContentQC. It seems to me that adm:ScanRequested is sufficient, rather than also adding adm:Scanning, since there's really no granular data recording the requested, initiated and completed stages. These distinctions were envisioned along with a QC process for both content and metadata, but that process was never implemented since it became apparent that it was more work than the small-scale tbrc.org could really support.
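A sketch of the new class. Placing it under adm:ContentLogEntry (a quality check on the images rather than on the metadata) is an assumption; the comment only names the class.

```turtle
# Prefix IRIs assumed to match the BDRC namespaces.
@prefix adm:  <http://purl.bdrc.io/ontology/admin/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Assumed placement: QC of the images, hence a content log entry
adm:ContentQC
    a owl:Class ;
    rdfs:subClassOf adm:ContentLogEntry ;
.
```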

eroux commented 3 years ago

Thanks for adm:ContentQC, yes! It's a small detail, but at least we'll have everything. I agree that we don't really need Scanning at the moment (we may need it for some data from partners, but we'll deal with that in due time). Actually, I'm not sure we really need ScanRequested either; I'm more tempted to migrate

<entry when="2020-03-31T23:41:26.265Z" who="Chungdak Nangpa">added image group for scan request</entry>

into the graph creation log entry...
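For illustration, the <entry> above might be migrated into a graph creation log entry along these lines, with @when mapped to adm:logDate, the text to adm:logMessage and @who to adm:logWho. The log entry IRI bda:LGEXAMPLE and the user resource bdr:UNNNNN are hypothetical placeholders: the real migration would mint its own ID and look up the user resource corresponding to "Chungdak Nangpa".

```turtle
# Prefix IRIs assumed to match the BDRC namespaces.
@prefix adm:  <http://purl.bdrc.io/ontology/admin/> .
@prefix bda:  <http://purl.bdrc.io/admindata/> .
@prefix bdr:  <http://purl.bdrc.io/resource/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical IRIs: bda:LGEXAMPLE (log entry ID), bdr:UNNNNN (user RID)
bda:LGEXAMPLE
        a               adm:CreateGraph ;
        adm:logDate     "2020-03-31T23:41:26.265Z"^^xsd:dateTime ;   # from @when
        adm:logMessage  "added image group for scan request"@en ;    # entry text
        adm:logWho      bdr:UNNNNN ;                                 # from @who
.
```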

xristy commented 3 years ago

Treating it as graph creation would be correct.