edwardspec / mediawiki-aws-s3

Extension:AWS allows MediaWiki to use Amazon S3 (instead of the local directory) to store images.
https://www.mediawiki.org/wiki/Extension:AWS
GNU General Public License v2.0
42 stars, 32 forks

Ingest a file uploaded by external means. #52

Closed by h0m3us3r 2 years ago

h0m3us3r commented 2 years ago

Is there a way to "ingest" a file that already exists in the wiki's s3 bucket (by means of uploading with external tools) without reuploading said file via wiki's interface?

edwardspec commented 2 years ago

Try the following maintenance script: php maintenance/rebuildImages.php --missing

It should detect images that are already in the repository (in this case: S3 bucket) that MediaWiki doesn't know about (weren't uploaded via the interface), and should add them to the database.

h0m3us3r commented 2 years ago

  1. It seems to have done something with the new file: it printed mwstore://AmazonS3/local-public/NEW_IMAGE.jpg on the first run (and not on subsequent runs), but the image still doesn't show up in Special:ListFiles. File:NEW_IMAGE.jpg still says "No file by this name exists, but you can upload it." but now also includes the "Summary (recovered file, missing upload log entry)" added by the script.
  2. It tries to import all the thumbnails, but luckily fails with: Error uploading file mwstore://AmazonS3/local-public/thumb/...

edwardspec commented 2 years ago

File:NEW_IMAGE.jpg still says "No file by this name exists, but you can upload it."

Sounds like cache, try adding ?action=purge to its URL.

rebuildImages.php --missing remembers the images in exactly the same way as if they were uploaded via the interface, and "Summary (recovered file)" comment is only added if this was successful.

Please check Special:Log/upload (should also have been updated on success) and whether the image is actually in the database, e.g. SELECT * FROM image WHERE img_name="NEW_IMAGE.jpg"
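
The database check above can also be run from the command line. What follows is a minimal sketch, not an official recipe: it assumes the wiki lives at a hypothetical WIKI_ROOT path, that the tables have no prefix, and that your MediaWiki version ships maintenance/sql.php with the --query option. The helper names image_query and check_image are made up for illustration. MediaWiki stores img_name with underscores instead of spaces, so the name is normalised first.

```shell
#!/bin/sh
# Build the lookup query for a file name (hypothetical helper).
# Spaces become underscores, matching how img_name is stored.
# Note: names containing a single quote are not escaped in this sketch.
image_query() {
  name=$(printf '%s' "$1" | tr ' ' '_')
  printf "SELECT img_name, img_size, img_timestamp FROM image WHERE img_name = '%s';\n" "$name"
}

# Run the query through MediaWiki's SQL maintenance script.
# WIKI_ROOT is an assumed install path; adjust it for your setup.
check_image() {
  ( cd "${WIKI_ROOT:-/var/www/mediawiki}" && php maintenance/sql.php --query "$(image_query "$1")" )
}
```

Calling check_image "NEW_IMAGE.jpg" should print one row if the import registered the file, and nothing if it did not.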

h0m3us3r commented 2 years ago

I have just checked again and it actually works as expected for smaller images (within max upload file size). But my main reason for uploading to the bucket directly was to avoid increasing the max upload file size (not sure if that would even help).

h0m3us3r commented 2 years ago

Changing $wgMaxUploadSize does not help.

samwilson commented 2 years ago

If it's just to get around the max upload size, one thing that is sometimes useful is to upload the files to the web server and then import them from there with the importImages.php maintenance script. That way MediaWiki gets to handle them however it wants, and they end up correctly in the data store, with their metadata in the database, without error.
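
As a rough sketch of that approach, assuming the files have already been copied to a directory on the web server: the import_dir helper and the WIKI_ROOT path are hypothetical, and the --extensions filter is optional (without it the script falls back to the wiki's configured file extensions).

```shell
#!/bin/sh
# Import every matching file from a server-side directory (hypothetical helper).
# WIKI_ROOT is an assumed install path; adjust it for your setup.
import_dir() {
  src="$1"   # use an absolute path, because we cd into WIKI_ROOT below
  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
  ( cd "${WIKI_ROOT:-/var/www/mediawiki}" &&
    php maintenance/importImages.php --extensions=jpg,png "$src" )
}
```

For example, import_dir /srv/incoming-images would hand that (made-up) directory to importImages.php.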

edwardspec commented 2 years ago

rebuildImages.php --missing is not limited by max upload file size. There shouldn't be any difference in how small/large files are handled.

I suspect that the thumbnails for some of the files you imported have not been generated yet, and missing thumbnails make it look as if the image itself is "missing". Their generation is deferred via the job queue, so try running php maintenance/runJobs.php to complete it.
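
The two maintenance steps suggested in this thread can be chained. This is only a sketch under assumptions: ingest_missing is a hypothetical helper name and WIKI_ROOT is an assumed install path.

```shell
#!/bin/sh
# Register externally uploaded files, then flush the job queue
# (hypothetical helper; WIKI_ROOT is an assumed install path).
ingest_missing() {
  (
    cd "${WIKI_ROOT:-/var/www/mediawiki}" || exit 1
    # Add files that exist in the repository but not in the database.
    php maintenance/rebuildImages.php --missing || exit 1
    # Run any queued jobs left over from the import.
    php maintenance/runJobs.php
  )
}
```

If the queue has nothing pending, runJobs.php simply reports that the job queue is empty.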

h0m3us3r commented 2 years ago

php maintenance/runJobs.php
Job queue is empty.

h0m3us3r commented 2 years ago

importImages.php

Observing exactly the same behaviour: small images are uploaded correctly, while large ones are missing everything except for the new summary "Importing file."

h0m3us3r commented 2 years ago

php maintenance/checkImages.php reports that all images are Good.

edwardspec commented 2 years ago

Try to enable the following debug log (if there were errors during the upload, etc., they should appear in this file): $wgDebugLogGroups['FileOperation'] = '/path/to/filename/writeable/by/webserver';

If you enable it now, the log won't have information on files that were already imported earlier. But you can import 1 more large file and check the log afterwards.

Just to rule out a possible culprit, please confirm that you are not using the $wgAWSLocalCacheDirectory feature (it is disabled by default, and you don't need to enable it), as it's the only part of Extension:AWS that handles large files differently from small files.

h0m3us3r commented 2 years ago

Not using $wgAWSLocalCacheDirectory.

Here is the FileOperation log contents after running importImages.php on 2 new images, one large and one small:

wiki: S3FileBackend: found backend with S3 buckets: wiki, wiki/thumb, wiki/deleted, wiki/temp.
wiki: S3FileBackend: doPrepareInternal: S3 bucket wiki, dir=, params=dir
wiki: S3FileBackend: isSecure: checking the presence of .htsecure in S3 bucket wiki
wiki: S3FileBackend: doPrepareInternal: S3 bucket wiki, dir=archive, params=dir
wiki: S3FileBackend: doGetFileStat(): obtaining information about Large.jpg in S3 bucket wiki
wiki: FileBackendStore::ingestFreshFileStats: File mwstore://AmazonS3/local-public/Large.jpg does not exist
wiki: S3FileBackend: doCreateInternal(): saving Large.jpg in S3 bucket wiki (sha1 of the original file: 4c0hkh1p8a6ab0r5gra5v4j75six0qs, Content-Type: image/jpeg)
wiki: S3FileBackend: Performance: 0.548 second spent on: uploading Large.jpg to S3
wiki: S3FileBackend: doGetFileStat(): obtaining information about archive/20220704083949!Large.jpg in S3 bucket wiki
wiki: FileBackendStore::ingestFreshFileStats: File mwstore://AmazonS3/local-public/archive/20220704083949!Large.jpg does not exist

wiki: S3FileBackend: doPrepareInternal: S3 bucket wiki, dir=, params=dir
wiki: S3FileBackend: doPrepareInternal: S3 bucket wiki, dir=archive, params=dir
wiki: S3FileBackend: doGetFileStat(): obtaining information about Small.jpg in S3 bucket wiki
wiki: FileBackendStore::ingestFreshFileStats: File mwstore://AmazonS3/local-public/Small.jpg does not exist
wiki: S3FileBackend: doCreateInternal(): saving Small.jpg in S3 bucket wiki (sha1 of the original file: rfoghs981vexlme2b79qr3ynpp0m8k5, Content-Type: image/jpeg)
wiki: S3FileBackend: Performance: 0.013 second spent on: uploading Small.jpg to S3
wiki: S3FileBackend: doGetFileStat(): obtaining information about archive/20220704083950!Small.jpg in S3 bucket wiki
wiki: FileBackendStore::ingestFreshFileStats: File mwstore://AmazonS3/local-public/archive/20220704083950!Small.jpg does not exist

h0m3us3r commented 2 years ago

And both thumbnails and pages were generated correctly after increasing $wgMaxShellMemory and $wgMaxShellFileSize to 4G (from 1G; the "Large" images are about 400M each). Testing rebuildImages.php --missing now, but I suspect it will work fine.
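
For anyone repeating this fix, a small worked conversion may help. It assumes both settings are expressed in KiB, as the MediaWiki manual describes them (worth double-checking for your version). The gib_to_kib helper is hypothetical, and the printed lines are meant to be pasted into LocalSettings.php by hand.

```shell
#!/bin/sh
# Convert GiB to KiB: 1 GiB = 1024 * 1024 KiB (hypothetical helper).
gib_to_kib() {
  echo $(( $1 * 1024 * 1024 ))
}

limit=$(gib_to_kib 4)   # 4 GiB = 4194304 KiB

# Print the lines to add to LocalSettings.php (assumes KiB units).
printf '$wgMaxShellMemory = %s;\n' "$limit"
printf '$wgMaxShellFileSize = %s;\n' "$limit"
```

The limits apply to the external commands MediaWiki shells out to, which is presumably why only the roughly 400M images were affected here.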

h0m3us3r commented 2 years ago

Yes, it does work as expected now. Thank you for suggesting rebuildImages.php, and sorry for bothering you with what turned out to be an unrelated issue.