Simon-Initiative / course-digest

Tool to produce a summary or digest of OLI course package contents
MIT License

[BUGFIX] upload w/non-Url-encoded S3 key to avoid url mismatches [MER-2514] #192

Closed: andersweinstein closed this 1 year ago

andersweinstein commented 1 year ago

If a source filename contains characters that require URL-encoding, such as the combining tilde used in Spanish, the migration tool generates and records a suitable URL-encoded URL for it. For example, señora.jpg (written as a plain n followed by the combining tilde) becomes sen%CC%83ora.jpg, where CC 83 is the UTF-8 encoding of the combining tilde (U+0303).
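The encoding step can be reproduced with the standard `encodeURIComponent` function. This is only an illustration of the byte sequence involved, not the migration tool's own code; the precomposed spelling is included for contrast and is an addition not taken from the PR.

```javascript
// "señora.jpg" written with a decomposed ñ: a plain "n" followed by
// U+0303 COMBINING TILDE, whose UTF-8 bytes are CC 83.
const decomposed = "sen\u0303ora.jpg";

// The same name using the precomposed ñ (U+00F1, UTF-8 bytes C3 B1), for contrast.
const precomposed = "se\u00f1ora.jpg";

// Percent-encode each name the way a URL path segment would be encoded.
const encodedDecomposed = encodeURIComponent(decomposed);
const encodedPrecomposed = encodeURIComponent(precomposed);

console.log(encodedDecomposed);  // sen%CC%83ora.jpg
console.log(encodedPrecomposed); // se%C3%B1ora.jpg
```

Both spellings look identical on screen but produce different URLs, which is why the exact bytes of the filename matter here.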

However, the Amazon S3 API apparently expects a key that is not URL-encoded; a key may be any UTF-8 string. (Special characters in a key may cause trouble for client applications, but they are not forbidden.)

The upload tool was generating the key from the path part of the encoded URL. In the example above, the key therefore contained literal % characters, and the URL needed to access the uploaded file was actually sen%25CC%2583ora.jpg, with each %25 encoding one of the literal percent signs in the key. This was indeed the value the AWS API returned as data.location. As a result, whenever a filename contains characters that require URL-encoding, the URL recorded in the content does not match the location where the file was actually uploaded.
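The mismatch can be sketched in a few lines. The bucket host and the `media/` prefix below are hypothetical, and `urlPathForKey` is a hypothetical helper that approximates how a URL path is formed from an S3 key by percent-encoding each segment; S3's own formatting of `data.location` may differ in details.

```javascript
// Hypothetical URL as recorded by the migration tool (bucket and prefix are made up).
const recordedUrl = "https://example-bucket.s3.amazonaws.com/media/sen%CC%83ora.jpg";

// Hypothetical helper: build a URL path from a key by percent-encoding each segment.
function urlPathForKey(key) {
  return key.split("/").map(encodeURIComponent).join("/");
}

// Buggy approach: take the key straight from the still-encoded URL path.
// URL.pathname preserves existing percent-escapes, so the key ends up
// containing literal "%" characters.
const buggyKey = new URL(recordedUrl).pathname.slice(1);

// Encoding that key for a URL turns each "%" into "%25".
const buggyPath = urlPathForKey(buggyKey);

console.log(buggyKey);  // media/sen%CC%83ora.jpg
console.log(buggyPath); // media/sen%25CC%2583ora.jpg
```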

In short, the JavaScript API does not expect key values to be URL-encoded. This change undoes the URL encoding when forming the S3 key for upload from the URL.
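A minimal sketch of the fix, assuming the key is derived from the URL path as above: decode the path with `decodeURIComponent` before using it as the key. The bucket name, prefix, and `urlPathForKey` helper are hypothetical, and the PR's actual code may differ. `Bucket` and `Key` are the parameter names the AWS SDK for JavaScript uses for uploads; no SDK call is made here.

```javascript
// Hypothetical URL as recorded by the migration tool (bucket and prefix are made up).
const recordedUrl = "https://example-bucket.s3.amazonaws.com/media/sen%CC%83ora.jpg";

// Hypothetical helper: build a URL path from a key by percent-encoding each segment.
function urlPathForKey(key) {
  return key.split("/").map(encodeURIComponent).join("/");
}

// Fixed approach: undo the URL encoding so the key is the raw UTF-8 filename.
const recordedPath = new URL(recordedUrl).pathname.slice(1);
const key = decodeURIComponent(recordedPath);

// Hypothetical upload parameters; the decoded key is what would be passed as Key.
const params = { Bucket: "example-bucket", Key: key };

// Re-encoding the decoded key reproduces the recorded URL path, so they now agree.
console.log(params.Key === "media/sen\u0303ora.jpg"); // true
console.log(urlPathForKey(params.Key) === recordedPath); // true
```

Note that `decodeURIComponent` throws a `URIError` on malformed escape sequences, so a real implementation may want to handle that case.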