aws-solutions / serverless-image-handler

A solution to dynamically handle images on the fly, utilizing SharpJS
Apache License 2.0

Semantic File URLs #184

Open bs-thomas opened 4 years ago

bs-thomas commented 4 years ago

After switching over from Thumbor to Sharp, the URLs have become hashed gibberish.

I was wondering if there is (or will be) an option to have a more semantic URL for these files?

The reason I'm asking is really for having meaningful filenames, and hopefully a little advantage on SEO as well.

https://webmasters.stackexchange.com/questions/102419/converting-default-normal-filenames-to-a-random-string-of-text-whats-better

Just for your reference, IMGIX makes use of query strings to achieve a similar effect. Something like this: https://www.example.com/some-file-name.jpg?w=1903&h=933&fit=crop

with the optional "signature" query string appended to the end like this to avoid spam abuse: https://www.example.com/some-file-name.jpg?w=1903&h=933&fit=crop&s=V5kYnb7bdDaCfgeB
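A minimal sketch of how such a signature parameter can work, in Python for illustration. It uses a generic HMAC-SHA256 over the path and query string; this is not IMGIX's actual algorithm, and `SECRET`, `sign_url`, and `verify_url` are hypothetical names.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit

SECRET = b"example-secret"  # hypothetical shared secret, known only to the server


def sign_url(path: str, params: list) -> str:
    """Append an HMAC-SHA256 signature computed over the path and query string."""
    query = urlencode(params)
    digest = hmac.new(SECRET, f"{path}?{query}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?{query}&s={digest}"


def verify_url(url: str) -> bool:
    """Recompute the signature from everything before the trailing s= parameter."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not params or params[-1][0] != "s":
        return False
    unsigned = urlencode(params[:-1])
    expected = hmac.new(
        SECRET, f"{parts.path}?{unsigned}".encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, params[-1][1])


signed = sign_url("/some-file-name.jpg", [("w", "1903"), ("h", "933"), ("fit", "crop")])
```

A request whose parameters were altered after signing no longer verifies, which is what stops arbitrary resize requests from being generated by third parties.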

Your help is appreciated!

Thank you very much!

beomseoklee commented 4 years ago

Thanks for your opinion, @bs-thomas. The solution still supports some Thumbor style URLs. You can refer to the implementation guide, Appendix D: List of Supported Thumbor Filters.

Is that enough to answer your question, or do you actually need a new feature?

bs-thomas commented 4 years ago

Thanks for your prompt response @beomseoklee .

I understand that it supports the legacy Thumbor style URLs, but from reading some previous issues they don't seem to be fully supported, and Thumbor's and Sharp's APIs and methods seem very different. In terms of long-term sustainability, I feel that this structure is going to face some limitations and other unexpected issues.

I noticed that this project has switched from Thumbor to Sharp, and I could imagine that when there is a better library in the future, the default would be switched over to that new library. That is a good move actually, and I fully support this, because packages come and go, and when there's a better vendor, we should use the better one.

However, what I thought is that we should keep the interface flexible, and consistent, regardless of which image resizing package we use behind. That will minimize breaking changes caused to users of this package. Does that sound reasonable to you?

My Recommendation

After giving it more thought, my recommendation is actually not to copy IMGIX's interface, because it does have some limitations.

For example, consider if I have two operations: (a) Rotate 45 degrees, (b) Flip horizontally, on Imgix, the query string would be "?flip=h&rot=45.0"

But note that depending on the sequence, there could be two different outputs:
(sequence 1) if you perform (a) first, then (b)
(sequence 2) if you perform (b) first, then (a)

So my recommendation is to use the following format (or any format with similar logic): "?m[]=flip(h)&m[]=rot(45)"
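A minimal sketch of how this format could be read into an ordered list of operations, in Python for illustration. The `name(argument)` grammar and the function name `parse_mutations` are assumptions based on the example above, not part of the solution.

```python
import re
from urllib.parse import parse_qsl

# Matches "name(argument)", for example "flip(h)" or "rot(45)".
OPERATION = re.compile(r"^(\w+)\((.*)\)$")


def parse_mutations(query: str) -> list:
    """Return (name, argument) pairs in the order they appear in the query."""
    operations = []
    for key, value in parse_qsl(query):  # parse_qsl preserves parameter order
        if key != "m[]":
            continue
        match = OPERATION.match(value)
        if match is None:
            raise ValueError(f"malformed mutation: {value!r}")
        operations.append((match.group(1), match.group(2)))
    return operations


flip_first = parse_mutations("m[]=flip(h)&m[]=rot(45)")
rotate_first = parse_mutations("m[]=rot(45)&m[]=flip(h)")
```

Because the result is a list rather than a mapping, the two sequences stay distinguishable.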

How does the above recommendation sound? And what do you think of this, in general?

=====

Also, I forgot to thank you and the team in my last comment for making such a beautiful system for resizing images on the fly. It's a real cost-saving approach. To be frank, I'm paying hundreds of dollars monthly for IMGIX right now, just for resizing images. It sounds ridiculous, and that's why I'm looking for alternatives like this one.

Thank you very much!!

ahardin commented 4 years ago

I would like to see this supported as well. Another reason is to allow CloudFront wildcard signed cookies.

In my scenario, all images are private. I would like to be able to generate a signed cookie for my users that grants access to the handler at a path like /companyABC/*. If I understand correctly, the only other 2 options besides signed cookies would be signed URLs (not practical when getting a list of thousands or even hundreds of images), or a Lambda@Edge authorizer. The Lambda@Edge would probably work well, but it's another moving part.
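A rough sketch of the wildcard policy such a signed cookie would carry, in Python for illustration. The policy layout follows CloudFront's custom policy format; the distribution domain and function names are hypothetical, and signing the policy with the CloudFront key pair is left out.

```python
import base64
import json
import time
from typing import Optional


def build_wildcard_policy(resource: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Build a custom policy that grants access to every URL matching `resource`."""
    issued = int(time.time()) if now is None else now
    policy = {
        "Statement": [
            {
                "Resource": resource,
                "Condition": {"DateLessThan": {"AWS:EpochTime": issued + ttl_seconds}},
            }
        ]
    }
    # Compact separators keep the cookie value free of whitespace.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode and swap the characters that are not safe in a cookie or URL."""
    encoded = base64.b64encode(data).decode()
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


policy = build_wildcard_policy(
    "https://d111111abcdef8.cloudfront.net/companyABC/*", 3600, now=1_700_000_000
)
cookie_value = cloudfront_b64(policy.encode())  # value for the CloudFront-Policy cookie
```

A wildcard resource like this only helps when the image path itself starts with the company prefix, which is why a single base64-encoded path segment does not work with it.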

I'll check out the legacy Thumbor URLs as a stop gap.

TomasBelskis commented 4 years ago

This is something that we would also be interested in, as we are currently forced to stay on version 3.1 because it still supports semantic file URLs for the serverless image handler function.

I can see that version 4.* supports some of the Thumbor style URLs, but does this guarantee their longevity? It would be just as much of a risk to migrate to 4.* and have the support removed in later versions of the image handler.

Can you guarantee that the Thumbor style URLs will be supported in future versions of the image handler? If you cannot, then can this be a feature, as the base64 encoded URLs just don't make sense for our use case?

ahardin commented 4 years ago

I just switched over to using the legacy Thumbor format and they work fine and play nicely with signed cookies. But, I only need to do resizing, so very limited use case. And it's also worrisome that those could (and probably should) be pulled out in a future version.

@bs-thomas's suggestion of abstracting the URL format away from the underlying processing library is spot on. That's certainly the "right" software design choice (if there is such a thing).

Furytron commented 4 years ago

The changing URL based on the underlying library and the hash in the URL is definitely an issue, especially for semantics and SEO. I would agree with @bs-thomas 's suggestion of abstracting the URL format from whatever library is being used. Keeping the Thumbor style URL from version 3.1 and allowing it in version 4.* could solve this problem. However, S3 bucket subfolders are not supported in version 4 from what I see in the docs. That would also need to be fixed if that style was used.

beomseoklee commented 4 years ago

@Furytron actually the latest version, v4.2, supports image files in subfolder as a part of PR #130 . The document needs to be updated. I will ask to update the document.

Furytron commented 4 years ago

Thanks @beomseoklee, that's good news. Are there plans for long-term support of Thumbor? Or at least the URLs?

beomseoklee commented 4 years ago

@Furytron we are going to keep supporting Thumbor style URLs, as many users actually use them.

But for the original topic, we will think about what would be the best way to support.

bs-thomas commented 4 years ago

I also wanted to mention that the current "hashed" URLs are creating issues for my JavaScript components.

My JavaScript component is a "file uploader". After uploading, it reads the "extension" of the URL and, depending on the extension, decides whether the uploaded DIV should display a thumbnail image (e.g. PNG, JPG, GIF) or not (e.g. PDF, DOCX, ZIP). Without the extension, it is unable to determine this.
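A minimal sketch of this kind of extension check, in Python for illustration. The extension list and the function name `should_show_thumbnail` are assumptions, and the base64-style URL is a made-up example.

```python
import posixpath
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}  # hypothetical list


def should_show_thumbnail(url: str) -> bool:
    """Decide from the URL path alone whether the file looks like an image."""
    path = urlsplit(url).path  # the query string is ignored
    extension = posixpath.splitext(path)[1].lower()
    return extension in IMAGE_EXTENSIONS


semantic = "https://example.com/some/path/to/s3/example-file.jpg?w=300&h=300"
encoded = "https://example.com/eyJidWNrZXQiOiJleGFtcGxlIn0="
```

With a semantic URL the check succeeds even when a query string is present; with an encoded path there is no extension left to inspect.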

I also have some advice regarding Thumbor URLs. While I think support for them should be continued (because many users are still using them), I would like to point out that they actually modify the "original URL" by changing the "path to the file".

I feel that mutations or variations of an image are still "the same image", just in a different variation, so I personally feel that the URL path should not be modified. Query strings are a very good tool for defining these variations, hence my original suggestion:

"https://example.com/some/path/to/s3/example-file.jpg?m[]=flip(h)&m[]=rot(45)"

Parsing the above URL requires close to zero effort. Any tool or programming language can extract the query string away with some native function (in PHP we have parse_url()), and there you go you immediately get the actual file URL location. No complicated logic required.

The query string parameter "m" (which stands for mutation) is an "array" and should be processed sequentially from first to last. This allows the flexibility of applying the "same function" twice (like how Sharp allows you to "chain methods").

Then we can legitimately perform the following operation: "https://example.com/some/path/to/s3/example-file.jpg?m[]=rot(90)&m[]=rot(90)&m[]=rot(-180)"

... and we get the original image back ;-)
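A small sketch that checks this arithmetic, in Python for illustration. It only sums `rot(...)` entries from the proposed `m[]` parameter, so it says nothing about other operations; `net_rotation` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qsl, urlsplit


def net_rotation(url: str) -> int:
    """Sum every rot(...) mutation in order and reduce the total modulo 360."""
    total = 0
    for key, value in parse_qsl(urlsplit(url).query):
        match = re.fullmatch(r"rot\((-?\d+)\)", value)
        if key == "m[]" and match:
            total += int(match.group(1))
    return total % 360


url = "https://example.com/some/path/to/s3/example-file.jpg?m[]=rot(90)&m[]=rot(90)&m[]=rot(-180)"
```

The repeated `rot(90)` entries both count, which is only possible because the parameter is treated as a sequence rather than a set of unique keys.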

beomseoklee commented 4 years ago

Thanks for your comments, everyone. I've created a beta version that accepts query strings in the URL.

The supported filters are the same as in the legacy Thumbor style, so you can refer to the Serverless Image Handler documentation.

Here is the example usage:

Since Amazon API Gateway supports multiple query string parameters that have the same name, every filter query string would be adapted. See the Amazon API Gateway documentation regarding multiple query string parameters on Amazon API Gateway.
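A minimal sketch of reading repeated parameters from an API Gateway proxy event, in Python for illustration. The `multiValueQueryStringParameters` field is part of the proxy event; the function name `filters_from_event` and the sample event are assumptions, and this is not the beta's actual code.

```python
def filters_from_event(event: dict) -> list:
    """Return every value of the repeated "filter" parameter, in request order."""
    # The field is null (None) when the request carries no query string at all.
    multi = event.get("multiValueQueryStringParameters") or {}
    return list(multi.get("filter", []))


sample_event = {
    # The single-value map cannot hold both values of a repeated key.
    "queryStringParameters": {"width": "300", "filter": "rotate(90)"},
    "multiValueQueryStringParameters": {
        "width": ["300"],
        "filter": ["grayscale()", "rotate(90)"],
    },
}
```

Reading from the multi-value map is what keeps both filters available to the handler.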

As this is an experimental version and not a production release, I recommend testing it first. If it works well for you, you may be able to use it in your production environment; if not, please don't use it there.

Please let us know if you have any feedback so that we can hopefully add this feature in the next update.

bs-thomas commented 4 years ago

@beomseoklee Thank you very much for your prompt implementation of a beta. Really appreciate it!

To be frank, I haven't had a chance to try it out yet. But I did notice one thing that it may not be able to address. (Or it could be that I have misunderstood something?)

Consider the following example: "https://example.com/some/path/to/s3/example-file.jpg?m[]=flip(h)&m[]=rot(45)"

As you see, if I do a "flip horizontally" first, then do "rotate 45 degrees", the end result is different than if I do "rotate 45 degrees" first, and then do "flip horizontally".

So the "sequence" of these transformations does matter.

How would the new implementation address this issue, and how does it prioritize the transformations?

Thank you very much!

beomseoklee commented 4 years ago

@bs-thomas I haven't tested the order of filters (Thumbor flipping seems to be available with a -0x-0 size, but we don't have this feature), but I'm sure that the keys of the edits JSON object will follow the order of the query strings, so what you are asking for should be fine.

bs-thomas commented 4 years ago

@beomseoklee Sorry maybe I wasn't so clear on my question.

The original topic in this issue was related to have some sort of "common interface" (regardless if it were Thumbor or Sharp) so it could be used as an abstracted interface for this image handler.

You have come up with the new beta implementation, which uses this format: https://example.com/some/path/to/s3/example-file.jpg?width=300&height=300&filter:grayscale()&filter:rotate(90)

But what I mean is, the above format does not present any sequence (which it could yield different results if the filters were different from the example you gave).

=====

Your Example, you have:

  1. width=300
  2. height=300
  3. filter:grayscale()
  4. filter:rotate(90)

If we flip them around, yields same result:

  1. width=300
  2. height=300
  3. filter:rotate(90)
  4. filter:grayscale()

=====

However, if we modify your Example slightly, so "filter:grayscale()" becomes "filter:flip(h)"

  1. width=300
  2. height=300
  3. filter:flip(h)
  4. filter:rotate(90)

If we flip them around, yields a different result:

  1. width=300
  2. height=300
  3. filter:rotate(90)
  4. filter:flip(h)
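The difference can be checked on a tiny 2x2 "image", sketched here in Python for illustration. The grid stands in for pixel data and the helper names are hypothetical; no image library is involved.

```python
def flip_horizontal(grid):
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def rotate_90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def apply_in_order(grid, operations):
    """Apply each operation to the result of the previous one."""
    for operation in operations:
        grid = operation(grid)
    return grid


image = [[1, 2],
         [3, 4]]

flip_then_rotate = apply_in_order(image, [flip_horizontal, rotate_90])
rotate_then_flip = apply_in_order(image, [rotate_90, flip_horizontal])
```

Here `flip_then_rotate` is `[[4, 2], [3, 1]]` while `rotate_then_flip` is `[[1, 3], [2, 4]]`, so the two orderings really do produce different images.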

=====

Another flawed example is the repetition of some filters. Try changing flip(h) to filter:rotate(90)

  1. width=300
  2. height=300
  3. filter:rotate(90)
  4. filter:rotate(90)

The result we want is the image rotated by 180 degrees.

However, the above example would yield the following URL, with two equivalent keys: https://example.com/some/path/to/s3/example-file.jpg?width=300&height=300&filter:rotate(90)&filter:rotate(90)

The URL above will only capture "filter:rotate(90)" once, which yields the resulting image to be rotated by 90 degrees only, and not 180 degrees.

=====

So, what I'm trying to say is, can the interface you have designed, be upgraded such that this flexibility could be put into consideration?

(Previously, I had a recommendation for you, by using query string arrays ?m[]=filterA&m[]=filterB....

I'm sure there are other ways, but just wanted this to be addressed, together along with the solution obeying the "semantic URL" rules).

Thank you very much!

beomseoklee commented 4 years ago

@bs-thomas Thanks for your feedback again. Currently, every filter will be added in edits, and if you provide same filter twice, it will overwrite the first one. For example, https://example.com/some/path/to/s3/example-file.jpg?width=300&height=300&filter:rotate(90)&filter:rotate(90) would return an image rotated by only 90 degrees.
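The overwrite behaviour described here can be modelled with a plain mapping, sketched in Python for illustration. The `(name, argument)` pairs and both helper functions are hypothetical and are not the solution's actual edits handling.

```python
def edits_as_mapping(filters):
    """Store filters keyed by name: a repeated filter replaces the earlier one."""
    edits = {}
    for name, argument in filters:
        edits[name] = argument
    return edits


def total_rotation(filters):
    """Treat filters as a sequence: every rotate entry contributes."""
    return sum(argument for name, argument in filters if name == "rotate") % 360


requested = [("rotate", 90), ("rotate", 90)]
```

Keyed storage leaves a single 90 degree rotation, while sequential processing of the same request gives 180 degrees.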

I will look at the Thumbor filters more closely so that our "technically supported" Thumbor URLs behave exactly the same as Thumbor's.

amcfarlane commented 4 years ago

I think even a way to 'escape' the hash so you can add your own filename would help here:

https://distributionName.cloudfront.net/base64encodedrequest >

https://distributionName.cloudfront.net/base64encodedrequest/fileName.jpg
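A minimal sketch of how a trailing file name could be ignored when decoding, in Python for illustration. It assumes the encoded request is base64-encoded JSON with `bucket`, `key`, and `edits` fields and that a file name is always appended; the helper names and sample values are hypothetical.

```python
import base64
import json


def encode_request(request: dict, file_name: str) -> str:
    """Build "/<base64 JSON>/<file name>"; the file name is purely cosmetic."""
    token = base64.b64encode(json.dumps(request).encode()).decode()
    return f"/{token}/{file_name}"


def decode_request(path: str) -> dict:
    """Drop the last path segment and decode everything before it."""
    # Standard base64 may itself contain "/", so split on the LAST slash only.
    token, _, _file_name = path.lstrip("/").rpartition("/")
    return json.loads(base64.b64decode(token))


request = {
    "bucket": "example-bucket",
    "key": "some/path/example-file.jpg",
    "edits": {"resize": {"width": 300, "height": 300}},
}
path = encode_request(request, "fileName.jpg")
```

The handler would never read the file name, so it can be anything the caller finds meaningful.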

bs-thomas commented 4 years ago

@beomseoklee

Once again, thank you very much for taking the time to respond and look into my issue. I've had a chance to try your new beta, but for some reason the "rotate" and "grayscale" filters don't seem to be working. Please see this URL:

https://d1spq9esmard59.cloudfront.net/clients/codebase.beam.style-local/public-default/some-relative-directory-to-public-default/test.jpg?width=300&height=300&filter:grayscale()&filter:rotate(90)

Not sure what's going wrong there. Would you have any ideas?

beomseoklee commented 4 years ago

@bs-thomas That was my mistake. It should be like this.

https://d1spq9esmard59.cloudfront.net/clients/codebase.beam.style-local/public-default/some-relative-directory-to-public-default/test.jpg?width=300&height=300&filter=grayscale()&filter=rotate(90)

bs-thomas commented 4 years ago

@beomseoklee Thank you very much! That seems to work.

Btw, is it really not possible to use "filter[0]=" and "filter[1]=" instead of "filter="?

(Actually, writing "filter[]=" should supposedly work as well.)

I use PHP, and because the query string is invalid (it contains the same key twice), calling the native PHP function http_build_query() to convert an array to a query string returns the wrong output:

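        // A PHP array cannot hold the same key twice:
        // the second 'filter' entry silently replaces the first.
        echo http_build_query([
            'width' => '300',
            'height' => '300',
            'filter' => 'grayscale()',
            'filter' => 'rotate(90)',
        ]);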

width=300&height=300&filter=rotate%2890%29

which is wrong, because it is missing "grayscale()"
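For comparison, a query string with a repeated key can be built from a list of pairs instead of a keyed array. A minimal sketch in Python for illustration (the PHP side would need an equivalent manual approach):

```python
from urllib.parse import parse_qsl, urlencode

# A list of pairs, unlike a keyed array, can carry the same key more than once.
params = [
    ("width", "300"),
    ("height", "300"),
    ("filter", "grayscale()"),
    ("filter", "rotate(90)"),
]

query = urlencode(params)
# query == "width=300&height=300&filter=grayscale%28%29&filter=rotate%2890%29"

round_trip = parse_qsl(query)  # both filters survive, in their original order
```

Both filters are present and their order is preserved, which is what the repeated "filter=" format relies on.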

Right now, I'm forced to use a manual method to convert the array to a query string, but that is not robust and is bound to cause unexpected problems later.

I would really appreciate it if you could make this change for me! Thank you!

jcn commented 4 years ago

@beomseoklee With this new style of URL, would it then be possible to support multiple buckets while building the filters in the URL? Perhaps by passing a bucket parameter, if it's not possible to parse the bucket name from the path itself?
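One way such a bucket parameter could be handled is sketched below, in Python for illustration. The allowlist, the bucket names, the fallback to the first bucket, and the function name `resolve_bucket` are all assumptions rather than existing behaviour.

```python
ALLOWED_BUCKETS = ("example-images-a", "example-images-b")  # hypothetical names


def resolve_bucket(query_params: dict, allowed=ALLOWED_BUCKETS) -> str:
    """Pick the source bucket from an optional "bucket" query parameter."""
    requested = query_params.get("bucket")
    if requested is None:
        return allowed[0]  # assumed default: the first configured bucket
    if requested not in allowed:
        # Never read from a bucket the deployment was not configured for.
        raise PermissionError(f"bucket not allowed: {requested}")
    return requested
```

Checking the value against a fixed list matters because the parameter comes straight from the URL and could otherwise name any bucket the function can read.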

github-actions[bot] commented 2 years ago

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 7 days since being marked as closing-soon.

haxiomic commented 2 years ago

Just reaching out to add my support for the design implemented in the beta by @beomseoklee; it's perfect for our use case.

Is v4.2.1-beta.2 the only version with this feature or is it available in later versions?

javierlinked commented 8 months ago

@beomseoklee I have the same question; it's a shame I cannot find this code anywhere. @haxiomic have you made any progress on this since 2022?

simonkrol commented 8 months ago

Hi @javierlinked, I've regenerated the template and uploaded it to the same link as in Beomseok's comment. I've made a couple changes to get the deployment functional (including a lambda runtime bump), but I haven't verified functionality. You'll likely need to download the Lambda package and fix any issues created by modifying the runtime.

In the meantime, I've reopened this issue, and we'll re-evaluate if this is something that makes sense to add to SIH.

javierlinked commented 8 months ago

Thanks @simonkrol, @beomseoklee. It would be great to have this feature as part of the core package instead of a customisation. Also, CloudFront needs to be configured to whitelist certain query string parameters, etc.

yahiya-ayoub commented 5 months ago

Hey @beomseoklee, I tried to pass the query string as you described, but it is not working for me. I installed the latest version of the template. Was this feature not merged into production?

yahiya-ayoub commented 5 months ago

I tracked the request through the Lambda logs and got the log below, which appears to show that the query string is being forwarded by CloudFront after I updated its origin request policy.

My request is like the following: https://.cloudfront.net/web/images/636447403654622319.jpg?width=600&height=600&fit-in

And here are the Lambda logs:

    2024-05-20T21:59:39.843Z 2f758119-8101-424b-979c-221e95803b34 INFO Received event:
    {
      "resource": "/{proxy+}",
      "path": "/web/images/636447403654622319.jpg",
      "httpMethod": "GET",
      "headers": {
        "Accept": "/",
        "Host": "flrin2a29i.execute-api.eu-west-2.amazonaws.com",
        "User-Agent": "Amazon CloudFront",
        "Via": "1.1 13b0de485c7b13f6889ba5a1aa346de0.cloudfront.net (CloudFront)",
        "X-Amz-Cf-Id": "d66OMmsAQtmvvMVWjtnayEpdxleZ2KFgy-LN2xzsUo1LkOTNCGjdTA==",
        "X-Amzn-Trace-Id": "Root=1-664bc7cb-3d30cbea4b58d09e294f159f",
        "X-Forwarded-For": "2.49.46.42, 64.252.86.31",
        "X-Forwarded-Port": "443",
        "X-Forwarded-Proto": "https"
      },
      "multiValueHeaders": {
        "Accept": [ "/" ],
        "Host": [ "flrin2a29i.execute-api.eu-west-2.amazonaws.com" ],
        "User-Agent": [ "Amazon CloudFront" ],
        "Via": [ "1.1 13b0de485c7b13f6889ba5a1aa346de0.cloudfront.net (CloudFront)" ],
        "X-Amz-Cf-Id": [ "d66OMmsAQtmvvMVWjtnayEpdxleZ2KFgy-LN2xzsUo1LkOTNCGjdTA==" ],
        "X-Amzn-Trace-Id": [ "Root=1-664bc7cb-3d30cbea4b58d09e294f159f" ],
        "X-Forwarded-For": [ "2.49.46.42, 64.252.86.31" ],
        "X-Forwarded-Port": [ "443" ],
        "X-Forwarded-Proto": [ "https" ]
      },
      "queryStringParameters": { "fit-in": "", "height": "600", "width": "600" },
      "multiValueQueryStringParameters": { "fit-in": [ "" ], "height": [ "600" ], "width": [ "600" ] },
      "pathParameters": { "proxy": "web/images/636447403654622319.jpg" },
      "stageVariables": null,
      "requestContext": {
        "resourceId": "v0k6br",
        "resourcePath": "/{proxy+}",
        "httpMethod": "GET",
        "extendedRequestId": "YFwn5F6hLPEEGkg=",
        "requestTime": "20/May/2024:21:59:39 +0000",
        "path": "/image/web/images/636447403654622319.jpg",
        "accountId": "378659857730",
        "protocol": "HTTP/1.1",
        "stage": "image",
        "domainPrefix": "flrin2a29i",
        "requestTimeEpoch": 1716242379804,
        "requestId": "1ab41033-62a8-4f6b-85cc-3f226e2912ff",
        "identity": {
          "cognitoIdentityPoolId": null,
          "accountId": null,
          "cognitoIdentityId": null,
          "caller": null,
          "sourceIp": "2.49.46.42",
          "principalOrgId": null,
          "accessKey": null,
          "cognitoAuthenticationType": null,
          "cognitoAuthenticationProvider": null,
          "userArn": null,
          "userAgent": "Amazon CloudFront",
          "user": null
        },
        "domainName": "flrin2a29i.execute-api.eu-west-2.amazonaws.com",
        "deploymentId": "cdx9tt",
        "apiId": "flrin2a29i"
      },
      "body": null,
      "isBase64Encoded": false

It seems to handle the width and height parameters, but I think fit-in is parsed incorrectly. Can anyone please help me here?
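In the logged event, fit-in arrives with an empty string as its value. A handler that tests the value for truthiness would treat it as absent; whether the beta does this is not known from the thread. A minimal sketch of checking for the key instead, in Python for illustration, with `parse_resize` as a hypothetical helper:

```python
def parse_resize(event: dict) -> dict:
    """Read width, height, and the valueless fit-in flag from a proxy event."""
    params = event.get("queryStringParameters") or {}
    width = params.get("width")
    height = params.get("height")
    return {
        "width": int(width) if width else None,
        "height": int(height) if height else None,
        # Presence of the key is what matters: its value is "" in the log above.
        "fit_in": "fit-in" in params,
    }


logged = {"queryStringParameters": {"fit-in": "", "height": "600", "width": "600"}}
resize = parse_resize(logged)
```

The same request could also be sent as `fit-in=true` if a handler only accepts non-empty values, though that depends on how the deployed code reads the parameter.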

simonkrol commented 5 months ago

Hi @yahiya-ayoub, can you confirm that you used the template from Beomseok's comment? While we're still evaluating this change, it isn't something we've decided to add yet, so it wouldn't be present in the latest template. As the generated template is a beta version that we don't currently plan to implement, we may not be able to provide support for issues that arise.

Thank you, Simon

yahiya-ayoub commented 5 months ago

Hey @simonkrol, thank you for replying. Yes, I tried to deploy the template, but I ran into this error during the CloudFormation creation task: Resource handler returned message: "Your access has been denied by S3, please make sure your request credentials have permission to GetObject for solutions-features-eu-west-2/serverless-image-handler/v4.2.1-beta.2/image-handler.zip. S3 Error Code: AccessDenied. S3 Error Message: Access Denied (Service: Lambda, Status Code: 403,