Closed: elopatin-uc3 closed this issue 3 years ago
Tracing the request through Apache, the UI and the DB query shows that the "+" sign is still present in the request seen by the UI but is missing from the DB query. This results in a 404 response. The UI seems to convert the "+" sign to whitespace before querying the DB.
Apache
------
/api/presign-file/ark%3A%2F13030%2Fm5bs499p/6/producer%2FBaltic%2B6Regions_genera_sizes.xlsx?no_redirect=true
UI
--
/api/presign-file/ark:%2F13030%2Fm5bs499p/6/producer%2FBaltic+6Regions_genera_sizes.xlsx?no_redirect=true
DB Query
--------
SELECT `inv_files`.* FROM `inv_files` INNER JOIN `inv_versions` ON `inv_versions`.`id` = `inv_files`.`inv_version_id` INNER JOIN `inv_objects` ON `inv_objects`.`id` = `inv_files`.`inv_object_id` WHERE (inv_objects.ark = 'ark:/13030/m5bs499p') AND (inv_versions.number = '6') AND (inv_files.pathname = 'producer/Baltic 6Regions_genera_sizes.xlsx') ORDER BY `inv_files`.`id` ASC LIMIT 1;
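The trace above can be reproduced outside Merritt with a minimal Python sketch. It assumes (this is not confirmed from the Merritt code) that the path segment is percent-decoded once and that the result is then decoded a second time with form-style rules, where "+" means a space. The variable names are illustrative only.

```python
from urllib.parse import unquote, unquote_plus

# Path segment as logged by Apache (copied from the trace above).
apache_segment = "producer%2FBaltic%2B6Regions_genera_sizes.xlsx"

# One round of RFC 3986 percent-decoding yields the real pathname,
# with the literal "+" intact.
pathname = unquote(apache_segment)

# ASSUMPTION: a later layer applies form-style decoding to the already
# decoded value. That turns "+" into a space, which matches the pathname
# that appears in the DB query.
db_value = unquote_plus(pathname)

print(pathname)
print(db_value)
```

If this assumption holds, the DB lookup is made with a pathname that was never ingested, which would explain the 404.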
bash-4.2$ docker-compose run --rm -e INTEG_TEST_ENV=stage -e INGEST_FILES=plus mrt-integ-tests
Creating mrt-integ-tests_mrt-integ-tests_run ... done
stage: 2021_02_19_1706
basic_merrit_ui_tests
View home page - Merritt Landing Page
Get version from footer
==> v1.0.7dev2
Print footer
Check storage service state
...
Check for valid storage nodes
Unauthenticated Access
Perform Merritt Guest Login
Open guest collections
Browse to first object
Browse to first version
Browse to first file
Browse to system text file and validate presigned url
Guest collections - no collection access
Authenticated access
Authenticated - file presigned download
ingest files
Ingest zip file with encoding use cases
ingest file with key space
Ingest README 1.md
ingest file with key plus
Ingest README+2.md
--> sleep 80 (to allow ingests to complete)
browse objects/files
--> sleep 30 (to allow assembly to complete)
--> sleep 15 (to allow download to complete)
Test object download
search for file on version page: README 1.md
Test file link from version page: README 1.md
search for file on version page: README+2.md
Test file link from version page: README+2.md
search for object with 2021_02_19_1706_space
Search for recently ingested object's local id: 2021_02_19_1706_space
Search for test file on object page: README 1.md
Search for test file on object version page: README 1.md
--> sleep 30 (to allow assembly to complete)
--> sleep 15 (to allow download to complete)
Start download object for recently ingested object: space
search for object with 2021_02_19_1706_plus
Search for recently ingested object's local id: 2021_02_19_1706_plus
Search for test file on object page: README+2.md
Search for test file on object version page: README+2.md
--> sleep 30 (to allow assembly to complete)
--> sleep 15 (to allow download to complete)
Start download object for recently ingested object: plus
Finished in 6 minutes 25 seconds (files took 4.14 seconds to load)
25 examples, 0 failures
bash-4.2$ docker-compose run --rm -e INTEG_TEST_ENV=production -e INGEST_FILES=plus mrt-integ-tests
Creating mrt-integ-tests_mrt-integ-tests_run ... done
production: 2021_02_19_1719
basic_merrit_ui_tests
View home page - Merritt Landing Page
Get version from footer
==> v1.0.5
Print footer
Check storage service state
...
Check for valid storage nodes
Unauthenticated Access
Perform Merritt Guest Login
Open guest collections
Browse to first object
Browse to first version
Browse to first file
Browse to system text file and validate presigned url
Guest collections - no collection access
Authenticated access
Authenticated - file presigned download
ingest files
Ingest zip file with encoding use cases
ingest file with key space
Ingest README 1.md
ingest file with key plus
Ingest README+2.md
--> sleep 80 (to allow ingests to complete)
browse objects/files
--> sleep 30 (to allow assembly to complete)
--> sleep 15 (to allow download to complete)
Test object download
search for file on version page: README 1.md
Test file link from version page: README 1.md
search for file on version page: README+2.md
The page you were looking for doesn't exist.
Test file link from version page: README+2.md (FAILED - 1)
search for object with 2021_02_19_1719_space
Search for recently ingested object's local id: 2021_02_19_1719_space
Search for test file on object page: README 1.md
Search for test file on object version page: README 1.md
--> sleep 30 (to allow assembly to complete)
--> sleep 15 (to allow download to complete)
Start download object for recently ingested object: space
search for object with 2021_02_19_1719_plus
Search for recently ingested object's local id: 2021_02_19_1719_plus
The page you were looking for doesn't exist.
Search for test file on object page: README+2.md (FAILED - 2)
The page you were looking for doesn't exist.
Search for test file on object version page: README+2.md (FAILED - 3)
--> sleep 30 (to allow assembly to complete)
--> sleep 15 (to allow download to complete)
Start download object for recently ingested object: plus
...
Finished in 7 minutes 17 seconds (files took 3.87 seconds to load)
25 examples, 3 failures
This is a sister ticket to Dryad #1116.
Regarding presigned file URL: http://merritt.cdlib.org/api/presign-file/ark%3A%2F13030%2Fm5kq31dt/1/producer%2FTDR%20Acc%2BH2O.zip?no_redirect=true
Per @sfisher: There are a number of presigned URL problems from Merritt. It seems like most of these contain the character + which is correctly encoded as %2B in our request URLs like above. I believe Merritt is prematurely decoding %2B as a plus sign (which is a space in some query strings). However, %2B should be passed through as a character to the script that operates on the filenames, not decoded in an earlier layer and interpreted as a space. See for example https://www.w3schools.com/tags/ref_urlencode.ASP or https://tools.ietf.org/html/rfc3986 or many encoding/decoding libraries in languages such as Java or Ruby.
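As a rough illustration of the encoding argument, the following Python sketch encodes the filename from the URL above the way a client would, then compares a single decode with a double decode. The `producer/` prefix and filename come from the URL; the variable names and the double-decode scenario are assumptions for illustration, not taken from the Merritt code.

```python
from urllib.parse import quote, unquote, unquote_plus

# Pathname of the file from the presigned URL above.
original = "producer/TDR Acc+H2O.zip"

# Percent-encode every reserved character, including "/", " " and "+".
encoded = quote(original, safe="")

# Decoding exactly once round-trips the name, whichever decoder is used,
# because the encoded form contains no literal "+".
decoded_once = unquote(encoded)
decoded_once_form = unquote_plus(encoded)

# ASSUMPTION: decoding a second time with form-style rules is what loses
# the "+", since the now-literal "+" is read as a space.
decoded_twice = unquote_plus(decoded_once)

print(encoded)
print(decoded_once)
print(decoded_twice)
```

The point of the comparison is that the client-side encoding is not at fault: the name only breaks once an already-decoded value is decoded again.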
The above issue is believed to be a new one that surfaced after the recent Rails 5 upgrade.
PS. It also seems to be present in the Merritt UI. Go to https://merritt.cdlib.org/m/ark%253A%252F13030%252Fm5kq31dt/1 and try to download TDR Acc+H2O.zip