CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Presigned URL issue associated with handling of `+` character in filename #589

Closed elopatin-uc3 closed 3 years ago

elopatin-uc3 commented 3 years ago

This is a sister ticket to Dryad #1116.

Regarding this presigned file URL: http://merritt.cdlib.org/api/presign-file/ark%3A%2F13030%2Fm5kq31dt/1/producer%2FTDR%20Acc%2BH2O.zip?no_redirect=true

Per @sfisher: A number of presigned URL requests to Merritt are failing. Most of them contain the character `+`, which is correctly encoded as `%2B` in our request URLs, as in the example above. I believe Merritt is prematurely decoding `%2B` to a plus sign, which is then interpreted as a space (as it would be in some query strings). Instead, `%2B` should reach the script that operates on the filenames as a literal `+` character, not be decoded in an earlier layer and then interpreted as a space.

See for example https://www.w3schools.com/tags/ref_urlencode.ASP or https://tools.ietf.org/html/rfc3986 or many encoding/decoding libraries in languages such as Java or Ruby.
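The distinction can be sketched in a few lines. This is only an illustration in Python using the standard `urllib.parse` module, not Merritt's or Dryad's actual code (the Merritt UI is a Rails application): path-style decoding leaves a literal `+` alone, while form-style decoding turns it into a space.

```python
from urllib.parse import quote, unquote, unquote_plus

filename = "producer/TDR Acc+H2O.zip"

# Percent-encode the whole name as one path segment (no characters exempted).
encoded = quote(filename, safe="")
print(encoded)  # producer%2FTDR%20Acc%2BH2O.zip

# Decoding once, path-style, restores the original name.
print(unquote(encoded))  # producer/TDR Acc+H2O.zip

# A segment in which %2B has already been decoded to a literal "+".
half_decoded = "TDR%20Acc+H2O.zip"

# Path-style decoding keeps the "+".
print(unquote(half_decoded))  # TDR Acc+H2O.zip

# Form-style decoding treats "+" as a space, which corrupts the filename.
print(unquote_plus(half_decoded))  # TDR Acc H2O.zip
```

So a correctly encoded `%2B` only turns into a space if some layer decodes it to `+` first and a later layer then applies form-style decoding.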

This issue is believed to be new, having surfaced after the recent Rails 5 upgrade.

PS. The issue also seems to be present in the Merritt UI. Go to https://merritt.cdlib.org/m/ark%253A%252F13030%252Fm5kq31dt/1 and try to download TDR Acc+H2O.zip.

mreyescdl commented 3 years ago

Tracing the request through Apache, the UI, and the DB query shows that the "+" sign is present in the request as seen by the UI but is missing from the DB query. The UI seems to convert the "+" sign to whitespace before querying the DB, which results in a 404 response.

Apache
------
/api/presign-file/ark%3A%2F13030%2Fm5bs499p/6/producer%2FBaltic%2B6Regions_genera_sizes.xlsx?no_redirect=true

UI
--
/api/presign-file/ark:%2F13030%2Fm5bs499p/6/producer%2FBaltic+6Regions_genera_sizes.xlsx?no_redirect=true

DB Query
--------
SELECT  `inv_files`.* FROM `inv_files` INNER JOIN `inv_versions` ON `inv_versions`.`id` = `inv_files`.`inv_version_id` INNER JOIN `inv_objects` ON `inv_objects`.`id` = `inv_files`.`inv_object_id` WHERE (inv_objects.ark = 'ark:/13030/m5bs499p') AND (inv_versions.number = '6') AND (inv_files.pathname = 'producer/Baltic 6Regions_genera_sizes.xlsx') ORDER BY `inv_files`.`id` ASC LIMIT 1;
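
The trace above is consistent with the path segment being decoded twice, the second time form-style. The following Python sketch reproduces that behaviour under that assumption; `STORED_PATHNAMES`, `double_decoded`, `single_decoded`, and `status_for` are hypothetical names used for illustration and do not come from the Merritt code base.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical stand-in for the inv_files table: pathnames as stored at ingest.
STORED_PATHNAMES = {"producer/Baltic+6Regions_genera_sizes.xlsx"}


def double_decoded(segment: str) -> str:
    """Decode percent-escapes, then apply form-style decoding to the result."""
    return unquote_plus(unquote(segment))


def single_decoded(segment: str) -> str:
    """Decode percent-escapes exactly once, leaving a literal '+' alone."""
    return unquote(segment)


def status_for(pathname: str) -> int:
    """Return 200 if the pathname is known, else 404."""
    return 200 if pathname in STORED_PATHNAMES else 404


# File segment as logged by Apache.
segment = "producer%2FBaltic%2B6Regions_genera_sizes.xlsx"

buggy = double_decoded(segment)
fixed = single_decoded(segment)

print(buggy, status_for(buggy))  # producer/Baltic 6Regions_genera_sizes.xlsx 404
print(fixed, status_for(fixed))  # producer/Baltic+6Regions_genera_sizes.xlsx 200
```

The double-decoded value matches the pathname in the DB query above, which is why the lookup misses.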

terrywbrady commented 3 years ago

Stage

bash-4.2$ docker-compose run --rm -e INTEG_TEST_ENV=stage -e INGEST_FILES=plus mrt-integ-tests
Creating mrt-integ-tests_mrt-integ-tests_run ... done
stage: 2021_02_19_1706

basic_merrit_ui_tests
  View home page - Merritt Landing Page
  Get version from footer
        ==> v1.0.7dev2
    Print footer
  Check storage service state
...
  Check for valid storage nodes
  Unauthenticated Access
    Perform Merritt Guest Login
    Open guest collections
    Browse to first object
    Browse to first version
    Browse to first file
    Browse to system text file and validate presigned url
    Guest collections - no collection access
  Authenticated access
    Authenticated - file presigned download
    ingest files
      Ingest zip file with encoding use cases
      ingest file with key space
        Ingest README 1.md
      ingest file with key plus
        Ingest README+2.md
   --> sleep 80 (to allow ingests to complete)
    browse objects/files
   --> sleep 30 (to allow assembly to complete)
   --> sleep 15 (to allow download to complete)
      Test object download
      search for file on version page: README 1.md
        Test file link from version page: README 1.md
      search for file on version page: README+2.md
        Test file link from version page: README+2.md
      search for object with 2021_02_19_1706_space
        Search for recently ingested object's local id: 2021_02_19_1706_space
        Search for test file on object page: README 1.md
        Search for test file on object version page: README 1.md
   --> sleep 30 (to allow assembly to complete)
   --> sleep 15 (to allow download to complete)
        Start download object for recently ingested object: space
      search for object with 2021_02_19_1706_plus
        Search for recently ingested object's local id: 2021_02_19_1706_plus
        Search for test file on object page: README+2.md
        Search for test file on object version page: README+2.md
   --> sleep 30 (to allow assembly to complete)
   --> sleep 15 (to allow download to complete)
        Start download object for recently ingested object: plus

Finished in 6 minutes 25 seconds (files took 4.14 seconds to load)
25 examples, 0 failures

Prod

bash-4.2$ docker-compose run --rm -e INTEG_TEST_ENV=production -e INGEST_FILES=plus mrt-integ-tests
Creating mrt-integ-tests_mrt-integ-tests_run ... done
production: 2021_02_19_1719

basic_merrit_ui_tests
  View home page - Merritt Landing Page
  Get version from footer
        ==> v1.0.5
    Print footer
  Check storage service state
...
  Check for valid storage nodes
  Unauthenticated Access
    Perform Merritt Guest Login
    Open guest collections
    Browse to first object
    Browse to first version
    Browse to first file
    Browse to system text file and validate presigned url
    Guest collections - no collection access
  Authenticated access
    Authenticated - file presigned download
    ingest files
      Ingest zip file with encoding use cases
      ingest file with key space
        Ingest README 1.md
      ingest file with key plus
        Ingest README+2.md
   --> sleep 80 (to allow ingests to complete)
    browse objects/files
   --> sleep 30 (to allow assembly to complete)
   --> sleep 15 (to allow download to complete)
      Test object download
      search for file on version page: README 1.md
        Test file link from version page: README 1.md
      search for file on version page: README+2.md
The page you were looking for doesn't exist.
        Test file link from version page: README+2.md (FAILED - 1)
      search for object with 2021_02_19_1719_space
        Search for recently ingested object's local id: 2021_02_19_1719_space
        Search for test file on object page: README 1.md
        Search for test file on object version page: README 1.md
   --> sleep 30 (to allow assembly to complete)
   --> sleep 15 (to allow download to complete)
        Start download object for recently ingested object: space
      search for object with 2021_02_19_1719_plus
        Search for recently ingested object's local id: 2021_02_19_1719_plus
The page you were looking for doesn't exist.
        Search for test file on object page: README+2.md (FAILED - 2)
The page you were looking for doesn't exist.
        Search for test file on object version page: README+2.md (FAILED - 3)
   --> sleep 30 (to allow assembly to complete)
   --> sleep 15 (to allow download to complete)
        Start download object for recently ingested object: plus

...

Finished in 7 minutes 17 seconds (files took 3.87 seconds to load)
25 examples, 3 failures