DistributedProofreaders / dproofreaders

Distributed Proofreaders is a web application intended to ease the process of converting public domain books into e-texts.
https://www.pgdp.net
GNU General Public License v2.0
49 stars 28 forks source link

Ability for squirrels to change project images via script #843

Open srjfoo opened 2 years ago

srjfoo commented 2 years ago

One of the most common tasks a squirrel does is replace proofing images, or add, delete or replace illustration (non-proofing) images. Sometimes just a handful, but often in bulk. Currently, this must be done via command line.

After discussing the idea with @cpeel , it seems that the best place to delete non-proofing images would be to extend the existing functionality in tools/proofers/images_index.php to allow squirrels to delete non-proofing images either selected or in bulk in any project state (obviously other than deleted or archived). Or to replace images of the same name. (This should be restricted just to the non-proofing images.)

The other functionality that is needed is the ability to replace existing proofing images in bulk. This could mean replacing all proofing images, or just a partial replacement, but more than is easily replaced using handle_bad_page one at a time. This can probably also be done by adding admin functionality to an existing script, but I'm not sure where.

cpeel commented 2 years ago

This can probably also be done by adding admin functionality to an existing script, but I'm not sure where.

Yeah, there's not a great existing place for these to go, just a handful of not-great-but-maybe-not-terrible places:

It is probably worth investigating images_index.php for the illustrations case and seeing if there's an opportunity for the page images case as well.

srjfoo commented 2 years ago

This can probably also be done by adding admin functionality to an existing script, but I'm not sure where.

Yeah, there's not a great existing place for these to go, just a handful of not-great-but-maybe-not-terrible places:

  • add_files.php is what does the initial load and does some handling of bulk images, but this also deals a lot with text and I'm dubious that this is the right place.

Yeah, that's the problem with using add_files.php

  • remote_file_manager.php is perhaps a logical place as it deals with file management, but right now it's all outside of the project directory (by design) and also only deals with individual files.

Right, so far, remote_file_manager.php deals just with the dpscans uploads directory.

It is probably worth investigating images_index.php for the illustrations case and seeing if there's an opportunity for the page images case as well.

Sounds reasonable.

srjfoo commented 1 year ago

This can probably also be done by adding admin functionality to an existing script, but I'm not sure where.

Yeah, there's not a great existing place for these to go, just a handful of not-great-but-maybe-not-terrible places:

  • add_files.php is what does the initial load and does some handling of bulk images, but this also deals a lot with text and I'm dubious that this is the right place.

After mulling this over for a while, I'm not sure that add_files.php is necessarily a bad place for this, iff it's reasonable to check what round a project is in, and only allow squirrels access to it if it's past the point where PMs can load projects.

The down-side would be that add_files.php also can add text, though in this case, we don't want it too, so for anything after P1 Unavailable, it would have to be blocked.

It is probably worth investigating images_index.php for the illustrations case and seeing if there's an opportunity for the page images case as well.

Maybe this is the best place, after all. We don't want the option of deleting page images; just replacing them. The only place that the page images can be deleted is from the page details page, because it has to get text and image. So, if the individual replace could be made available for squirrels as a bulk replace for proofing images, and bulk deletion and replacement/addition for illustration images, I think that's what's needed.

chrismiceli commented 11 months ago

I think the proper place for this is add_files.php. This is the current mechanism to add files as well as replace images in bulk for PMs. It makes sense a site_administrator should be able to use this experience to accomplish the same. The steps would be:

  1. The site_administrator would have to ensure the image file names are matching the images they want to replace.
  2. Upload a zip of the images to be bulk replaced
  3. Go to the project
  4. Fill out the file name in the add/replace text and image textbox.
  5. Replace the images

I think this is all there already. The only necessary change would be dhowing site administrators the add replace text/image textbox on the project page when appropriate. Do we know what states we want to enable this behavior for (for site administrators only)?

srjfoo commented 11 months ago

I think this is all there already. The only necessary change would be showing site administrators the add replace text/image textbox on the project page when appropriate. Do we know what states we want to enable this behavior for (for site administrators only)?

It's all there, yes, but I think we want an additional check that the squirrel isn't accidentally trying to replace the text. Even squirrels shouldn't be replacing text in a project that is in the later rounds. As it currently is, when text/image pairs are loaded, the text goes in the master_text slot (OCR). For adding new or replacing existing images, all stages (at least, I don't see a reason to exclude PP/PPV/SR if a request to replace illustration images is made by the PPer, for instance). And, yes, for site-admins only.

cpeel commented 11 months ago

I'm cool with expanding add_files.php to be useful outside of initial load. I would love for the functionality in that page to be pulled out into functions we could use (and test!) separately -- specifically use in the API.

srjfoo commented 11 months ago

PR #968 takes care of the ability for squirrels to delete illustration images at any time in a project's pre-archiving time.

Still to be done: add the ability for squirrels to load replacement images at any time. The current thinking is to modify add_files.php to allow

[1: This is why I was wondering whether it would be easier to treat this the same way we do uploads for SR/backing up PP or PPV files; there's nothing that needs to be done in the database at all. All we need to do is allow squirrels to replace any images that are there already, or to add new illustration images. That is, move the images into the project directory. If there's already an image existing, it will be overwritten, if not, the new image will be copied into the project directory. If the squirrel copies a proofing image in where one doesn't already exist, it will correctly go in as a non-page image).]

srjfoo commented 11 months ago

This can probably also be done by adding admin functionality to an existing script, but I'm not sure where.

Yeah, there's not a great existing place for these to go, just a handful of not-great-but-maybe-not-terrible places:

  • add_files.php is what does the initial load and does some handling of bulk images, but this also deals a lot with text and I'm dubious that this is the right place.

  • remote_file_manager.php is perhaps a logical place as it deals with file management, but right now it's all outside of the project directory (by design) and also only deals with individual files.

It is probably worth investigating images_index.php for the illustrations case and seeing if there's an opportunity for the page images case as well.

Going back to Casey's initial response to the issue description, I've been trying to read through add_files.php off and on, and getting at least a sense of what it does, and I'm less and less convinced that it's the right place for the bulk replacement of images.

Consider what we have now: a project's PM, PFs and squirrels can bulk replace/add text and images in the new project state, and in P1 Unavailable. If the image has a corresponding text file, the two will be paired, and the image identified as a proofing image. If it doesn't have a corresponding text file, it's identified as a non-page image. Either way, the text goes into the database, and the images go into the project directory with the proper permissions and ownership. This, like initial project load, is very properly handled by add_files.php.

Any round beyond P1 Unavailable, we don't allow, and don't want to allow easy bulk replacement of the text in the database.

What we want to do here is only allow squirrels (and squirrels only) to bulk replace proofing images via script, or bulk add/replace illustration (non-page) images. This involves only images. So, I think all that's needed is:

There's no way the script can or should be able to check to see if it's the correct project. It doesn't need to care whether the images are correct. The squirrel is responsible for doing that. Every image that goes in will be placed properly. If it's replacing an image, it will overwrite it whether it's a page image or a non-page image. If the image doesn't already exist in the project directory, it'll go in as a non-page image.

I have not grokked add_files.php in its entirety, so I don't know if it's still suitable for that functionality, or if the code that's already in add_files.php can be reorganized to make it work. If adding it requires a lot of new code, I would think that it might make more sense to add the functionality to images_index.php, instead. I don't know if it would touch fewer files, though.