azgs / azlibrary_database


More complex versioning options #40

Closed aazaff closed 4 months ago

aazaff commented 4 years ago

I describe in https://github.com/azgs/azlibrary_api/issues/35 how the current system should handle removals.

That strategy is based on a relatively simplistic 1-to-1 lineage, but we could envision much more complicated branching options.

This is only something to consider for a major upgrade (v2.0).

aazaff commented 4 years ago

Note that this would have to be navigated around azgs/azlibrary_api#57

aazaff commented 4 years ago

A good example of this is that http://data.azgs.arizona.edu/api/v1/metadata/ADGM-1552430032775-571 is being split into and superseded by 4 new collections.

aazaff commented 1 year ago

We received dedicated funding to do this. We now must have this implemented by end of May 2024.

NoisyFlowers commented 6 months ago

To be clear, we want our lineage methodology to allow:

To do this, we will need to move the supersedes/superseded_by columns of public.collections into a table of their own. Let's call it lineage. Lineage might look like this:

collection_id    superseded_by    supersedes 

This is the most obvious column structure in transitioning from the current way we do things. But it would mean there would be two records for a given relationship: one on the superseding collection and one on the superseded collection.

We might be able to simplify lineage to look like this:

collection_id    supersedes

since if Y supersedes X, it is implied that X is superseded by Y.

This would avoid duplicate information for each relationship. But it might complicate deep queries like "find the latest". Not sure yet.
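To get a feel for how the two-column form affects a "find the latest" query, here is a minimal in-memory sketch. The array stands in for rows of the proposed lineage table; `findLatest` and the sample ids are made up for illustration, and the real thing would presumably be a recursive SQL query rather than application code.

```javascript
// Hypothetical in-memory stand-in for rows of the proposed lineage table.
// Each row reads: collection_id supersedes `supersedes`.
const lineage = [
  { collection_id: "B", supersedes: "A" },
  { collection_id: "C", supersedes: "A" }, // A was split into B and C
  { collection_id: "D", supersedes: "B" },
  { collection_id: "D", supersedes: "C" }, // B and C were merged into D
  { collection_id: "F", supersedes: "E" },
];

// Return every collection at the end of the lineage that starts at `id`.
// "superseded_by" is derived by reading the same rows in the other direction.
function findLatest(rows, id) {
  const latest = new Set();
  const seen = new Set();
  const stack = [id];
  while (stack.length > 0) {
    const current = stack.pop();
    if (seen.has(current)) continue; // guard against cycles and repeat visits
    seen.add(current);
    const successors = rows
      .filter((row) => row.supersedes === current)
      .map((row) => row.collection_id);
    if (successors.length === 0) {
      latest.add(current); // nothing supersedes it, so it is current
    } else {
      stack.push(...successors);
    }
  }
  return [...latest].sort();
}

console.log(findLatest(lineage, "A"));
console.log(findLatest(lineage, "E"));
```

With branching, "the latest" becomes a set rather than a single id, which is probably the part that complicates the existing queries more than the column layout does.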

NoisyFlowers commented 6 months ago

Here is a surficial survey of the code impacts for this change. This is simply a list of locations in the current code that must change. It does not delve into implementation logic.

azlibrary_database
    azlibAdd
        index.js
            267: Inserting record into public.collections must be a transaction that also inserts 
                                the corresponding row(s) into public.lineage.
            390: Deprecation of old collection must update metadata of all predecessor collections. 
                                Value will be an array of one or more collections.
    azlibConfigPG
        public.sql
            93: remove supersedes and superseded_by from public.collections
        metadata.sql
            42: collections_trigger updates several things in public.collections from the azgs metadata. 
                             Two of these are supersedes and superseded_by. This must be in a transaction that updates 
                             public.lineage as well.
            Patch for same.

azlibrary_api
    collections.model.js
        45: The query in the routine checkStatus must be modified to join with public.lineage.
        delete routine
            218: querySQL must join with public.lineage.
            234: previousCollectionQuerySQL must join with public.lineage.
            244: deleteSQL must join with public.lineage.
            252: unlinkSQL must join with public.lineage.
            273: probably want an aggregate value here.
            294: Here is where we are using all the above queries. Use aggregate where appropriate and iterate 
                                where not.
    collections.routes.js   
        handleFullImport
            553: metadata.identifiers.supersedes will be array. So will req.params.collectionID.
            564: check existence and length of metadata.identifiers.supersedes
        verifyMethod
            974: check existence and length of supersededBy
            980: iterate supersededBy
            991: iterate supersededBy
            1002: check existence and length of supersededBy
            1008: iterate supersededBy
            1019: iterate supersededBy
    colleciton_groups.model.js
        25: join aggregate public.lineage
        95: join aggregate public.lineage
    csv.js
        Figure out how best to massage multiple values for preceded_by and superseded_by.
    metadata.model.js   
        getSingle
            11, 12: join with public.lineage and aggregate. Or just rely on metadata.
        buildLinaege
            never used. delete.
        get
            92: will need to aggregate or something
            104-125: whooboy, I dunno. Recursive query that will need to join to public.lineage. This one needs thought.
            299: join public.lineage and aggregate
    metadata.routes.js
        get 
            53-68: We're working with results from model here, so will depend on how we do that. Will likely 
                                   need to iterate to create version links
            271-288: Same comment

NoisyFlowers commented 6 months ago

We will also need a data migration script to convert the existing db to use whatever we decide is the proper lineage table.
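As a rough sketch of what that conversion has to do, assuming we end up with the two-column (collection_id, supersedes) form: read the existing supersedes/superseded_by values and emit one lineage row per relationship, no matter which side recorded it. The sample rows and the name `toLineageRows` are hypothetical; the actual script would read from and write to the database.

```javascript
// Hypothetical rows as they might look in the current public.collections.
const collections = [
  { collection_id: "X", supersedes: null, superseded_by: "Y" },
  { collection_id: "Y", supersedes: "X", superseded_by: "Z" },
  { collection_id: "Z", supersedes: "Y", superseded_by: null },
];

// Build one lineage row per relationship. A relationship recorded on both
// sides (X.superseded_by = Y and Y.supersedes = X) collapses to a single row.
function toLineageRows(rows) {
  const edges = new Map();
  const addEdge = (collection_id, supersedes) => {
    // Key on the pair so duplicates overwrite each other.
    edges.set(`${collection_id}\u0000${supersedes}`, { collection_id, supersedes });
  };
  for (const row of rows) {
    if (row.supersedes) addEdge(row.collection_id, row.supersedes);
    if (row.superseded_by) addEdge(row.superseded_by, row.collection_id);
  }
  return [...edges.values()];
}

console.log(toLineageRows(collections));
```

Reading both columns also picks up any relationship that was only recorded on one side in the existing data.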

NoisyFlowers commented 5 months ago

azlibrary_webclient is giving me pause.

Thinking about how the input form currently works and how it interacts with the api.

Currently, if you want to replace a collection, you select the Replace radio button at the top then enter the id of the collection you want to replace. This prepopulates the form with values from that collection. Clicking the Replace button at the bottom causes the api to create a new collection that supersedes the one specified.

If somebody later tried to replace the same collection, the api catches it and throws an error that the collection is already deprecated.

We'll obviously be getting rid of that check.

Thus, the same form will work for replacing one collection with many, if that's what a user intends. They would just replace the same collection over and over.

But what if that's not what they intend? What if they come along later and want to replace the collection and haven't taken the time to verify that it is the end of a lineage? The form and the api would let them do it, and they would have created another sibling to the ones entered earlier. Not what they intended, and they would never know. And we have no way to know either. Feels spooky to me.
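One possible guard, offered only as an idea and not something decided here: instead of dropping the check entirely, turn it into a confirmation. The api looks up existing successors and refuses unless the caller explicitly says a sibling is intended. Everything in this sketch (`existingSuccessors`, `checkReplace`, the `allowSibling` flag, the sample rows) is hypothetical.

```javascript
// Hypothetical lineage rows: A has already been replaced by B and C.
const lineage = [
  { collection_id: "B", supersedes: "A" },
  { collection_id: "C", supersedes: "A" },
];

// Collections that already supersede `id`.
function existingSuccessors(rows, id) {
  return rows
    .filter((row) => row.supersedes === id)
    .map((row) => row.collection_id);
}

// Allow the replace only if `id` is the end of its lineage, or if the
// caller has explicitly confirmed that adding a sibling is intended.
function checkReplace(rows, id, allowSibling = false) {
  const successors = existingSuccessors(rows, id);
  if (successors.length > 0 && !allowSibling) {
    return { ok: false, successors };
  }
  return { ok: true, successors };
}

console.log(checkReplace(lineage, "A"));       // blocked, lists B and C
console.log(checkReplace(lineage, "A", true)); // allowed, sibling confirmed
```

The form could show the returned successors and ask the user whether they meant to replace one of those instead.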

Next, what if a user wants to replace several collections with one? The api call can be modified to accept multiple collection_ids. But our current form will not handle this. We'll have to modify the form to allow selection of existing collection_ids. But which, if any, would we use to prepopulate the form? Or do we coalesce everything from all selected collections?
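On the api side, accepting multiple collection_ids mostly means treating supersedes as a list everywhere. A small sketch of the normalization, assuming the value may arrive as a single id, an array, or a comma-separated string; the comma-separated form and the name `normalizeSupersedes` are assumptions, not the current api contract.

```javascript
// Normalize a supersedes value into an array of distinct, trimmed ids.
// Accepts undefined/null, a single id, an array of ids, or "id1,id2".
function normalizeSupersedes(value) {
  if (value === undefined || value === null || value === "") return [];
  const list = Array.isArray(value) ? value : String(value).split(",");
  const ids = list
    .map((item) => String(item).trim())
    .filter((item) => item.length > 0);
  return [...new Set(ids)]; // drop duplicates, keep first-seen order
}

console.log(normalizeSupersedes("ADGM-1, ADGM-2"));
console.log(normalizeSupersedes(["ADGM-1", "ADGM-1"]));
console.log(normalizeSupersedes(undefined));
```

With this in place, "check existence and length" becomes a length check on the returned array, and the single-collection case is just an array of one.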

NoisyFlowers commented 4 months ago

Changes for this are in bff98c0947c9f1a864a07318b011ebe6cb5cfd3c 5a952c719eb7c29dea291f32f4186a3053858d6d 230796deadcb830e38ff323e1f034d469a74968c c4dca05364dc2133aec802cd282e5d2df93ba3fe 19237353069e2b994d7efe447760971d7f2e2c2f 3716f618d45681c3da0da12849f2b8ac5278b969 5315cfa3e1ab9827dca3ac4cb7be498345eddfca cda64054ae14cef78a0ae7435617000a552979b2 c8b1e1fa8417068eccc1280249acf895b73a10d7 ca087543ba265f09e634704739e7fa9eaaedd96e 23c7a29b632cd686ada4acd225035e1a86a8f5ae 287c77bc6f590136971ff72655fb7fdcc4330553 c3456e0025fec97dd3e16dd11aa4c3f6035b5236 a45d5dc1373099b63dc0d63059e7cd6e5cc5b6fc 1be1d15273418d1c058877af6dec5f3c011f13e9 0a4e410534c3757ae148218af57a4863985fd7e9 5587203e7135ce8508dff8c4ad0c14af68172369

azgs/azlibrary_api@211599b9807734ce06c4dfe9c2a543dd77639e9d azgs/azlibrary_api@1f6c5df5ed1c3595bc0e77110a6ebb5db57d73e1 azgs/azlibrary_api@384741dfd765c0d0042f834d28940f7b982894d3 azgs/azlibrary_api@e85cd5623d62a7ec60694f9676506a396926bc25 azgs/azlibrary_api@0d17117858ab7f5dbb15790adfa00d425f82266b azgs/azlibrary_api@302ce39755110e845f0cdcc1678bac4bc1e1d796 azgs/azlibrary_api@6f44625a069206761dbb6d0206193b5b384f03c8 azgs/azlibrary_api@27d6e9abeb6ca90c05a91ba11d875d423066da65 azgs/azlibrary_api@f46f1e4bbcdae13a4af771904aa2e36af83496cf azgs/azlibrary_api@b085303d02ee47fc8eb0a142ea91af266ff97887 azgs/azlibrary_api@d97bf19331b719b57aa80bee3ecad716b89928f5 azgs/azlibrary_api@bdfbf37b345d27ad415702e51026364b54097f34 azgs/azlibrary_api@8136d05046503b39973ee001bda020d9a934e687 azgs/azlibrary_api@a339b56120817b6394f1e356e194ef00a59034fa

azgs/azlibrary_webclient@12c308493fbacd0bb09cad2df93dfa9f85bb968c azgs/azlibrary_webclient@318a1c2e42496c861e84c802d8942ea5a95e4617 azgs/azlibrary_webclient@9a0b3df4599bff8e183a77c33d9c2ee237acfd9f

azgs/azlibrary_react@3e3b2a1601cf6f5a9893f1373caa89d966331764

Not yet merged to master (except in webclient); running on dev.

NoisyFlowers commented 4 months ago

Also note: old lineage links are copied to public.lineage_removed before they are removed from public.lineage
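The copy-then-remove order can be illustrated with a small in-memory sketch. The arrays stand in for public.lineage and public.lineage_removed; the row shape, the function name `removeLinks`, and the assumption that every link touching the collection is affected are all mine for illustration and are not taken from the actual code.

```javascript
// Copy every link that touches `collectionId` into the removed list,
// then return the lineage without those links. Inputs are not mutated.
function removeLinks(lineage, lineageRemoved, collectionId) {
  const touches = (row) =>
    row.collection_id === collectionId || row.supersedes === collectionId;
  const removed = lineage.filter(touches);
  return {
    lineage: lineage.filter((row) => !touches(row)),
    lineageRemoved: [...lineageRemoved, ...removed.map((row) => ({ ...row }))],
  };
}

// Hypothetical data: B supersedes A, C supersedes B.
const before = [
  { collection_id: "B", supersedes: "A" },
  { collection_id: "C", supersedes: "B" },
];
const after = removeLinks(before, [], "C");
console.log(after.lineage);
console.log(after.lineageRemoved);
```

In the database the copy and the delete would need to happen in the same transaction so a link cannot be lost between the two steps.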