gleanerio / gleaner

Gleaner: JSON-LD and structured data on the web harvesting
https://gleaner.io
Apache License 2.0

add back in region for AWS and GCP #216

Closed fils closed 1 year ago

fils commented 1 year ago

@valentinedwv I added back support for REGION, which I need for GCS and AWS. However, I noted the issue with Minio auth in the comments, so I wrapped the setting so that it is not set for Minio when no region is given. I was able to get this to work with Minio, Google, and AWS. The settings I used for the minio block follow.

Let me know what you think and if this works OK for you.

For local minio

minio:
  address: 192.168.202.114
  port: 49153
  accessKey: KET
  secretKey: SECRET
  ssl: false
  bucket: gleaner.oih

For Google

minio:
  address: storage.googleapis.com
  port:
  accessKey: KET
  secretKey: SECRET
  ssl: true
  bucket: gleaner-oih
  region: US-CENTRAL1

For AWS

minio:
  address: s3.amazonaws.com
  port:
  accessKey: KET
  secretKey: SECRET
  ssl: true
  bucket: gleaner.oih
  region: us-east-1
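
The "only set the region when one is configured" wrapping described above could look roughly like the sketch below. This is not Gleaner's actual code: `MinioConfig`, `ClientOptions`, and `BuildOptions` are hypothetical names used only for illustration, and the struct fields simply mirror the keys in the minio blocks above.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// MinioConfig mirrors the keys of the minio block above.
// Hypothetical type for illustration; not Gleaner's own struct.
type MinioConfig struct {
	Address   string
	Port      int // 0 means the port was left empty
	AccessKey string
	SecretKey string
	SSL       bool
	Bucket    string
	Region    string // "" means no region was configured
}

// ClientOptions is a hypothetical stand-in for what gets handed to the
// object store client. Region stays nil unless the config names one.
type ClientOptions struct {
	Endpoint string
	Secure   bool
	Region   *string
}

// BuildOptions joins address and port when a port is given, and only
// sets the region when the config provides one (the local Minio case
// leaves it unset).
func BuildOptions(cfg MinioConfig) ClientOptions {
	endpoint := cfg.Address
	if cfg.Port != 0 {
		endpoint = net.JoinHostPort(cfg.Address, strconv.Itoa(cfg.Port))
	}
	opts := ClientOptions{Endpoint: endpoint, Secure: cfg.SSL}
	if cfg.Region != "" {
		region := cfg.Region
		opts.Region = &region
	}
	return opts
}

func main() {
	local := BuildOptions(MinioConfig{Address: "192.168.202.114", Port: 49153, Bucket: "gleaner.oih"})
	aws := BuildOptions(MinioConfig{Address: "s3.amazonaws.com", SSL: true, Bucket: "gleaner.oih", Region: "us-east-1"})
	fmt.Println(local.Endpoint, "region set:", local.Region != nil)
	fmt.Println(aws.Endpoint, "region set:", aws.Region != nil)
}
```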

valentinedwv commented 1 year ago

Found that for AWS, where the auth is not in the same region as the bucket, there might be an issue.

Passing the fully qualified bucket endpoint works, aka aws.us-east-1....

This will need some testing. I expect the list-bucket check should probably be disabled...

fils commented 1 year ago

So I am not as familiar with AWS. Does "auth not in the same region" mean the credentials are set to one region (like in the .aws/config file) but the bucket being used is in another?

valentinedwv commented 1 year ago

Using this config:

minio:
    address: s3.amazonaws.com
    port: 443
    ssl: true
    accesskey:
    secretkey:
    bucket: ec-geocodes
    region: us-west-2

{"file":"/Users/valentin/development/dev_earthcube/gleanerio/gleaner/pkg/cli/gleaner.go:79","func":"github.com/gleanerio/gleaner/pkg/cli.initGleanerConfig","level":"fatal","msg":"cannot connect to minio: The authorization header is malformed; the region 'us-west-2' is wrong; expecting 'us-east-1' Ignore that. It's not the bucket. check config/minio: address, port, ssl. connection info: endpoint: https://s3.amazonaws.com:443 ","time":"2023-06-21T08:35:06-07:00"}

With no region set, auth works, but then the bucket fails:

{"file":"/Users/valentin/development/dev_earthcube/gleanerio/gleaner/internal/summoner/acquire/jsonutils.go:410","func":"github.com/gleanerio/gleaner/internal/summoner/acquire.Upload","level":"error","msg":"summoned/geocodes_demo_datasets/8c30552eca65b8807d7fa6bb5e7b8517b93bb207.jsonld: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.","time":"2023-06-21T08:38:25-07:00"}

With this, both work:

minio:
    address: s3.us-west-2.amazonaws.com
    port: 443
    ssl: true
    accesskey: 
    secretkey: 
    bucket: ec-geocodes

as does this:

minio:
    address: s3.us-west-2.amazonaws.com
    port: 443
    ssl: true
    accesskey:
    secretkey: 
    bucket: ec-geocodes
    region: us-west-2
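
As a rough illustration of the workaround above (addressing the bucket through its region-qualified host instead of the generic one), a small helper could derive the address from the region. `RegionalS3Endpoint` is a hypothetical name and not part of Gleaner; the sketch assumes the `s3.<region>.amazonaws.com` pattern seen in the working configs and leaves any other address untouched.

```go
package main

import "fmt"

// RegionalS3Endpoint is a hypothetical helper: when the generic AWS
// address is combined with a region, it returns the region-qualified
// host (e.g. s3.us-west-2.amazonaws.com). Any other address, or an
// empty region, is returned unchanged.
func RegionalS3Endpoint(address, region string) string {
	if address == "s3.amazonaws.com" && region != "" {
		return "s3." + region + ".amazonaws.com"
	}
	return address
}

func main() {
	// Generic AWS address plus a region: rewritten to the regional host.
	fmt.Println(RegionalS3Endpoint("s3.amazonaws.com", "us-west-2"))
	// Non-AWS address: passed through as-is.
	fmt.Println(RegionalS3Endpoint("storage.googleapis.com", "US-CENTRAL1"))
}
```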

fils commented 1 year ago

@valentinedwv I didn't even know you could do that trick of including the region in the URL. If that is the case, are you OK with this pull request if I note this point in the https://github.com/gleanerio/gleaner/blob/master/docs/GleanerConfig.md file?

fils commented 1 year ago

@valentinedwv updated the config doc and made a note about the issue.

If that looks good let me know...