edwardspec / mediawiki-aws-s3

Extension:AWS allows MediaWiki to use Amazon S3 (instead of the local directory) to store images.
https://www.mediawiki.org/wiki/Extension:AWS
GNU General Public License v2.0

Is usage of wgForeignFileRepos's ForeignDBRepo::class supported? #70

Closed davidhaslip closed 9 months ago

davidhaslip commented 9 months ago

I have a central image repository for my projects and currently use $wgForeignFileRepos with ForeignAPIRepo::class, but the documentation states that access through the DB is faster than the API, so I've been trying to switch. It throws various errors depending on how I configure it. The closest I've gotten to working is "Error creating thumbnail: File missing", and the logs say something like the following (the correct bucket is bahaimedia, not bahaipedia):

[FileOperation] S3FileBackend: found backend with S3 buckets: bahaipedia, bahaipedia/thumb, bahaipedia/deleted, bahaipedia/temp.
[FileOperation] S3FileBackend: getFileHttpUrl(): obtaining presigned S3 URL of 3/36/Shrine_of_the_Bab.jpg in S3 bucket bahaipedia
[FileOperation] S3FileBackend: downloading presigned S3 URL https://bahaipedia.s3.amazonaws.com/3/36/Shrine_of_the_Bab.jpg?X-Amz-Content-............
[FileOperation] copy(https://bahaipedia.s3.amazonaws.com/3/36/Shrine_of_the_Bab.jpg?X-Amz-Con..........

[FileOperation] S3FileBackend: Performance: 0.502 second spent on: downloading https://bahaipedia.s3.amazonaws.com/3/36/Shrine_of_the_Bab.jpg?X-Amz-Conte........
[FileOperation] LocalCache: File /home/cache/AWScache/AmazonS3/local-public/3/36/Shrine_of_the_Bab.jpg.S3LocalCache.c26b60ea1221df9589f5a3dd7ae22b38.jpg is too small for cache: false is less than wgAWSLocalCa>
[FileOperation] S3FileBackend: doGetLocalCopyMulti: 3/36/Shrine_of_the_Bab.jpg from S3 bucket bahaipedia couldn't be copied to: [Null]

The wiki which is attempting to use ForeignDBRepo is called bahaipedia, with a matching bucket name, but it should be looking in the bahaimedia bucket for the file (e.g. the above file here).

These are the settings I'm using:

$wgForeignFileRepos[] = [
    'class' => 'ForeignDBRepo',
    'name' => 'local',
    'url' => "https://file.bahai.media",
    'directory' => '',
    'hashLevels' => 2, 
    'dbType' => $wgDBtype,
    'dbServer' => $wgDBserver,
    'dbUser' => "mediawikiuser",
    'dbPassword' => "mypassword",
    'dbFlags' => DBO_DEFAULT,
    'dbName' => 'bahaimedia',
    'tablePrefix' => '',
    'hasSharedCache' => true,
    'descBaseUrl' => 'https://bahai.media/File:',
    'fetchDescription' => false
];

After some more experimenting, it seems like "found backend with S3 buckets: bahaipedia..." is correct, but what's missing is the line [FileOperation] S3FileBackend: found backend with S3 buckets: bahaimedia, bahaimedia/thumb, bahaimedia/deleted, bahaimedia/temp, which does appear in the logs while I'm using the API.

edwardspec commented 9 months ago

You would need to define a separate FileBackend to use a second S3 bucket. Please try the following:

1)

$wgFileBackends['s3foreign'] = [
    'name' => 'AmazonS3Foreign',
    'class' => 'AmazonS3FileBackend',
    'lockManager' => 'nullLockManager'
];

... here "s3foreign" is an arbitrary name; it can be any other string.

2) Set $wgFileBackends['s3foreign']['containerPaths'] to the correct bucket/paths; see the file tests/travis/OldStyleAWSSettings.php for an example.

3) Add extra parameters to the array that you are adding to $wgForeignFileRepos: 3a) 'backend' => 'AmazonS3Foreign', 3b) a 'zones' parameter; see the file tests/travis/OldStyleAWSSettings.php for an example.
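For illustration, the three steps above might combine into something like the sketch below. This is a hypothetical example: the wiki ID "examplewiki", the bucket "example-bucket", and the zone URLs are placeholders, not values from this thread.

```php
// Hypothetical sketch combining steps 1-3; "examplewiki" and
// "example-bucket" are placeholder names.

// Step 1: a separate FileBackend for the foreign wiki's bucket.
$wgFileBackends['s3foreign'] = [
	'name' => 'AmazonS3Foreign',
	'class' => 'AmazonS3FileBackend',
	'lockManager' => 'nullLockManager',
	// Step 2: map the foreign wiki's containers to its bucket/paths.
	'containerPaths' => [
		'examplewiki-local-public'  => 'example-bucket',
		'examplewiki-local-thumb'   => 'example-bucket/thumb',
		'examplewiki-local-deleted' => 'example-bucket/deleted',
		'examplewiki-local-temp'    => 'example-bucket/temp'
	]
];

// Step 3: point the foreign repo at this backend (3a) and
// define the zone URLs (3b).
$wgForeignFileRepos[] = [
	'class' => 'ForeignDBRepo',
	'name' => 'local',
	'backend' => 'AmazonS3Foreign',
	'zones' => [
		'public'  => [ 'url' => 'https://example-bucket.s3.amazonaws.com' ],
		'thumb'   => [ 'url' => 'https://example-bucket.s3.amazonaws.com/thumb' ],
		'temp'    => [ 'url' => false ],
		'deleted' => [ 'url' => false ]
	]
	// ... plus the dbType/dbServer/etc. parameters from the earlier config.
];
```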

davidhaslip commented 9 months ago

Progress!

[FileOperation] S3FileBackend: found backend with S3 buckets: bahaipedia, bahaipedia/thumb, bahaipedia/deleted, bahaipedia/temp.
[FileOperation] S3FileBackend: found backend with S3 buckets: bahaimedia-sp, bahaimedia-sp/thumb, bahaimedia-sp/deleted, bahaimedia-sp/temp.

But it's still failing and later in the logging:

File::transform: Doing stat for mwstore://AmazonS3Foreign/local-thumb/6/6c/Shrine_of_the_Bab_directly_below.jpg/90px-Shrine_of_the_Bab_directly_below.jpg
[objectcache] RedisBagOStuff debug: get(WANCache:filebackend:AmazonS3Foreign:pediaru:file:2703010ee926dc5e2c92b58a53465b42c8c78d89|#|v) on 127.0.0.1:6379: success
[objectcache] RedisBagOStuff debug: get(pediaru:S3FileBackend:StatCache:mwstore%3A//AmazonS3Foreign/local-thumb/6/6c/Shrine_of_the_Bab_directly_below.jpg/90px-Shrine_of_the_Bab_directly_below.jpg) on 127.0.0.1:6379: success
[FileOperation] FileBackendStore::ingestFreshFileStats: Could not stat file mwstore://AmazonS3Foreign/local-thumb/6/6c/Shrine_of_the_Bab_directly_below.jpg/90px-Shrine_of_the_Bab_directly_below.jpg
TransformationalImageHandler::doTransform: creating 90x120 thumbnail at /tmp/transform_14025ae31109.jpg using scaler im
TransformationalImageHandler::doTransform: called wfMkdirParents(/tmp)
[thumbnail] Thumbnail failed on ip-172-32-1-84: could not get local copy of "Shrine_of_the_Bab_directly_below.jpg"

Perhaps I misconfigured something?

$wgFileBackends['s3foreign'] = [
        'name' => 'AmazonS3Foreign',
        'class' => 'AmazonS3FileBackend',
        'lockManager' => 'nullLockManager'
];

$wgFileBackends['s3foreign']['containerPaths'] = [
        "bahaimedia-local-public" => "bahaimedia-sp",
        "bahaimedia-local-thumb" => "bahaimedia-sp/thumb",
        "bahaimedia-local-deleted" => "bahaimedia-sp/deleted",
        "bahaimedia-local-temp" => "bahaimedia-sp/temp"
];

$wgForeignFileRepos[] = [
        'class' => 'ForeignDBRepo',
        'name' => 'local',
        'backend' => 'AmazonS3Foreign',
        'url' => "https://file.bahai.media",
        'directory' => '',
        'hashLevels' => 2,
        'zones' => [
                'public'  => [ 'url' => "https://bahaimedia-sp.s3.sa-east-1.amazonaws.com" ],
                'thumb'   => [ 'url' => "https://bahaimedia-sp.s3.sa-east-1.amazonaws.com/thumb" ],
                'temp'    => [ 'url' => false ],
                'deleted' => [ 'url' => false ]
        ],
        'dbType' => $wgDBtype,
        'dbServer' => $wgDBserver,
        'dbUser' => "mediawikiuser",
        'dbPassword' => "mypassword",
        'dbFlags' => DBO_DEFAULT,
        'dbName' => 'bahaimedia',
        'tablePrefix' => '',
        'hasSharedCache' => true,
        'descBaseUrl' => 'https://bahai.media/File:',
        'fetchDescription' => false
];
davidhaslip commented 9 months ago

It seems like "Could not stat file" indicates a problem writing the file to the web server? Does something like $wgAWSLocalCacheDirectory also need to be explicitly defined?

edwardspec commented 9 months ago

No, LocalCache works for any number of backends; no special configuration is required for it.

According to the log, the AmazonS3Foreign backend doesn't seem to be used at all (it only gets initialized, but is never called by this $wgForeignFileRepos setup). Theoretically it should be: ForeignDBRepo is a subclass of FileRepo, which recognizes/uses the 'backend' parameter...

davidhaslip commented 9 months ago

This has been resolved: $wgFileBackends needed 'wikiId' to be set as well (possibly because I use $wgConf?). After setting that, it still failed in my configuration because the presigned URL was passing the wrong AWS region: on the server I was testing, I use different regions for the content wiki and the shared repository wiki. The content wiki has a bucket in us-east-1 because it doesn't hold many files, but for media the region is set to sa-east-1 because that bucket is replicated to different regions of the world. After changing the default to sa-east-1, everything worked as expected.
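For reference, the wikiId fix described above amounts to one extra key on the backend definition (the 's3foreign' key and the 'bahaimedia' value are taken from the configs in this thread):

```php
// Setting the backend's wiki ID resolved the "Could not stat file"
// failures (possibly needed because $wgConf is in use).
$wgFileBackends['s3foreign']['wikiId'] = 'bahaimedia';
```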

I guess the question is: in this configuration, why does $wgForeignFileRepos use the content wiki's $wgAWSRegion and not the media wiki's? Or can $wgForeignFileRepos be told about the correct region as well?

edwardspec commented 9 months ago

Or can wgForeignFileRepos be told about the correct region also?

Yes, you can override the region with

$wgFileBackends['s3foreign']['awsRegion'] = 'sa-east-1';
davidhaslip commented 9 months ago

Awesome, I really appreciate your help, everything is working great!

davidhaslip commented 9 months ago

FYI, the above configuration's 'name' => 'local' causes fetchDescription to be ignored, so the correct settings for my wiki were:

        $wgFileBackends['s3foreign'] = [
                'name'           => 'AmazonS3Foreign',
                'class'          => 'AmazonS3FileBackend',
                'wikiId'         => 'bahaimedia',
                'lockManager'    => 'nullLockManager',
                'containerPaths' => [
                        "bahaimedia-medialocal-public" => "bahaimedia",
                        "bahaimedia-medialocal-thumb" => "bahaimedia/thumb",
                        "bahaimedia-medialocal-deleted" => "bahaimedia/deleted",
                        "bahaimedia-medialocal-temp" => "bahaimedia/temp",
                ]
        ];
        $wgForeignFileRepos[] = [
                'class' => ForeignDBRepo::class,
                'name' => 'medialocal',
                'backend' => 'AmazonS3Foreign',
                'url' => "https://file.bahai.media",
                'directory' => '',
                'hashLevels' => 2,
                'dbType' => $wgDBtype,
                'dbServer' => $wgDBserver,
                'dbUser' => "mediawiki",
                'dbPassword' => "mypassword",
                'dbFlags' => DBO_DEFAULT,
                'dbName' => 'bahaimedia',
                'tablePrefix' => '',
                'hasSharedCache' => true,
                'initialCapital' => true,
                'zones' => [
                        'public' =>  [ 'url' => "https://file.bahai.media" ],
                        'thumb' =>  [ 'url' => "https://file.bahai.media/thumb" ],
                        'temp' => [ 'url' => false ],
                        'deleted' => [ 'url' => false ],
                ],
                'descBaseUrl' => 'https://bahai.media/File:',
                'fetchDescription' => true
        ];

with 'awsRegion' => 'eu-central-1' added to $wgFileBackends['s3foreign'] for regions outside the U.S. Unfortunately, there was a problem with TimedMediaHandler where audio/video files failed with a "missing source" message, which I was never able to resolve.