API to download zip file

eroux commented 2 years ago

follow-up on https://github.com/buda-base/archive-ops/issues/572 , still to be discussed.

the API would be something along the lines of:

/imagezip/bdr:WXXX

it would get the list of image groups and generate, on the fly, a zip file with the following directory structure:

WorkRID
   |
   +---- archive
   |        |
   |        +---- WorkRID-ImageGroup1RID
   |        +---- WorkRID-ImageGroup2RID
   |        +---- ......
   +---- images
            |
            +---- WorkRID-ImageGroup1RID
            +---- WorkRID-ImageGroup2RID
            +---- ......

(perhaps with sources too?)
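To make the proposal concrete, here is a minimal sketch of how such a zip skeleton could be generated in memory. It is only an illustration, not the editserv implementation: the function name `build_image_zip`, the `top_dirs` parameter, and the stripping of the `bdr:` prefix are all assumptions, and the list of image group RIDs is passed in by hand rather than looked up.

```python
import io
import zipfile


def build_image_zip(work_qname, image_group_rids, top_dirs=("archive", "images")):
    """Return the bytes of a zip holding an empty folder skeleton for one work.

    Hypothetical helper: in a real service the image group RIDs would come
    from a lookup on the work, which is not shown here.
    """
    # Assumption: "bdr:WXXX" maps to a top-level folder named "WXXX".
    work_rid = work_qname.split(":", 1)[-1]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for top in top_dirs:
            for ig_rid in image_group_rids:
                # A name ending in "/" is stored as a directory entry.
                zf.writestr(f"{work_rid}/{top}/{work_rid}-{ig_rid}/", "")
    return buf.getvalue()
```

Adding `"sources"` to `top_dirs` would cover the third directory if it turns out to be useful.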

jimk-bdrc commented 2 years ago

Could we clarify the purpose of this activity before adding things we think we might like to have? For example, in some cases AO will make a sources directory on its own. Who does 'sources' benefit, and what is it for? In short, make a business case before tacking it on!

I'm not saying no, I'm just saying we should be explicit about why we're doing something, other than a couple of people think it would be nice to have.

eroux commented 2 years ago

Well, it's really a matter of understanding the current workflow and replicating it, possibly with some improvements. If I understand correctly, the current tools provide the librarians with a zip file containing an images folder. If that's satisfactory for everyone, there's no reason to change it (especially since change has a cost).

Now, since we are changing the tool chain anyway, this is a good time to change the zip file folder structure. Another way of looking at this is: what input would AO want from the librarians? There might be different cases, perhaps:

a case where we get web images from another country, and then we might get archive images later on
a case where we get a PDF per volume (that should go in sources)
etc.

We could imagine that the librarians get a zip file with the 3 directories (images, archive, sources), and then delete (or leave empty) the ones that are not relevant for the current case...

But perhaps this will just add some confusion and isn't necessary
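If the three-directory idea were adopted, the receiving side would need to tell which directories the librarians actually filled. A rough sketch of that check, assuming the layout WorkRID/<directory>/<WorkRID-ImageGroupRID>/<file>; the function name `populated_dirs` and the constant `KNOWN_DIRS` are made up for illustration:

```python
import io
import zipfile

# Assumption: these are the only three top-level folders under the work folder.
KNOWN_DIRS = ("images", "archive", "sources")


def populated_dirs(zip_bytes):
    """Return the set of known folders that contain at least one file."""
    found = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # a bare directory entry does not count as content
            parts = info.filename.split("/")
            # parts[0] is the work folder, parts[1] the images/archive/sources folder
            if len(parts) >= 3 and parts[1] in KNOWN_DIRS:
                found.add(parts[1])
    return found
```

Folders that were left empty or deleted simply do not appear in the result, so both options would be handled the same way.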

TBRC-Travis commented 2 years ago

> We could imagine that the librarians get a zip file with the 3 directories (images, archive, sources), and then delete (or leave empty) the ones that are not relevant for the current case.

Perhaps you're right @eroux. That could provide the most flexibility. To clarify the purpose of each directory:

sources - raw, unprocessed source images from the field
archive - an intermediary, high-quality image set, most often used when source images need a lot of processing. It contains high-quality, uncompressed, cropped TIF images. The idea is that if web images need to be re-derived, we don't have to start over from sources but can simply re-derive from "archive". This folder is not always used: for example, when sources are in good shape to begin with, an intermediary folder like this is not needed.
images - web images suitable for website / app access

I was only suggesting that the tool provide just sources in the zip directory, since most things that we get from the field fall into the "sources" category. The exceptions are BDRC-managed projects like USAID, which is "images" only, and FPL, NLM, and FEMC, which follow their own independent process outside of DLMS.

eroux commented 2 years ago

Thanks! I'm trying to go through different scenarios in my head, but it seems that currently the zip files only contain an images folder, which gets filled with whatever we get (which sometimes can stay in images, and other times gets moved to sources or archive). Does that sound right? If so, and if we decide to stick with that workflow and have just one directory, the name doesn't matter much since AO will rename it if necessary...

TBRC-Travis commented 2 years ago

yes, exactly right. I rename it as needed during processing.

buda-base / editserv

API to download zip file #21