CDRH / api

Codenamed "Apium": An API to access all public Center for Digital Research in the Humanities resources
https://cdrhdev1.unl.edu/api_frontend
MIT License

Multi-line values as form params aren't handled properly #81

Closed: techgique closed this issue 6 years ago

techgique commented 6 years ago

Indexed values that contain newline characters don't match when they are passed as form parameters, such as when they are stored in hidden inputs after being selected for faceting. They only match when searched via URI-encoded URLs, such as the facet links.

Orchid's query string and date filter forms have hidden inputs for the other params currently in use, and a multi-line value (e.g. the Jaffrey placeName) produces HTML like the following:

                <!-- include existing parameters-->
   <input type="hidden" name="f[]" id="f_" value="places_written_k|Jaffrey, New Hampshire, United
                           States" />

The parameter is then passed as &f[]=places_written_k|Jaffrey%2C+New+Hampshire%2C+United%0D%0A++++++++++++++++++++++++++++States, with the line break encoded as %0D%0A. When selected as a facet, the value has all of those characters except %0D, which is a CR (carriage return) character. For the search sent to the API to match, we want only the %0A.
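To make the mismatch concrete, here is a minimal Python sketch (not Orchid or API code) that decodes both forms of the parameter and compares them. The facet-link string is an assumption built from the description above: the same value without %0D. The extra CR is likely there because browsers normalize line breaks in submitted form values to CRLF.

```python
from urllib.parse import unquote_plus

# Value as submitted from the hidden input (copied from the report above)
from_form = (
    "places_written_k|Jaffrey%2C+New+Hampshire%2C+United"
    "%0D%0A++++++++++++++++++++++++++++States"
)

# Assumption: the facet link carries the same characters minus the %0D
from_facet_link = from_form.replace("%0D", "")

form_value = unquote_plus(from_form)          # contains "\r\n"
facet_value = unquote_plus(from_facet_link)   # contains "\n" only

# The two decoded strings differ only by the carriage return
print(repr(form_value))
print(form_value == facet_value)
print(form_value.replace("\r", "") == facet_value)
```

An exact-match filter on the indexed value (which holds only the LF) would therefore succeed for the facet link and fail for the form submission.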

I tried URI-escaping the value, which doesn't work because it double-encodes everything in the params. I tried removing newlines (s/\n//g), but that removes both the %0D and the %0A, so the value doesn't match when the search is sent to the API. I tried removing CRs (\r), form feeds (\f), and vertical space (\v); these don't remove any characters, so %0D is still sent to the API and the value doesn't match. Lastly, s/[^a-zA-Z0-9_| \n]//g didn't get rid of the CR either.
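One way to drop only the CR while keeping the LF is to normalize line endings on the already-decoded parameter values before they are sent to the API. The sketch below is in Python for illustration only; `normalize_newlines` and `normalize_filters` are hypothetical names, and where such a step would hook into Orchid is an assumption.

```python
import re


def normalize_newlines(value: str) -> str:
    """Convert CRLF pairs and lone CRs to a single LF; leave all else as-is."""
    return re.sub(r"\r\n?", "\n", value)


def normalize_filters(filters: list[str]) -> list[str]:
    """Apply newline normalization to each decoded f[] value (hypothetical helper)."""
    return [normalize_newlines(f) for f in filters]


filters = ["places_written_k|Jaffrey, New Hampshire, United\r\n    States"]
cleaned = normalize_filters(filters)
print(repr(cleaned[0]))
```

Working on the decoded value rather than the encoded query string avoids the double-encoding problem described above, since nothing is re-escaped by hand.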

jduss4 commented 6 years ago

Am I correct in understanding that the facets are being stored in the API with multiple lines? Are there ever cases when we want that to happen? I think it's a good idea to handle it on this side of things, too, but I can't think of many cases where facets should span multiple lines, so we should probably alter the import from the data repository as well...

techgique commented 6 years ago

Totally agree they should be handled in the imports too. I can't think of any cases where we'd want the vertical space to be retained, but it's also hard to predict every field (especially project-specific ones) that will be used as facets to know when to strip out vertical space. I would lean towards always stripping out vertical space and if we run into an edge case later, address it then. If it does occur, it hopefully won't require any monumental upheaval of existing API contents etc.
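As a rough illustration of "always stripping out vertical space" at import time, the following Python sketch collapses every run of whitespace, including line breaks and the indentation that follows them, into a single space. `collapse_whitespace` is a hypothetical name, and the sketch does not reflect how the API's import scripts are actually written.

```python
def collapse_whitespace(value: str) -> str:
    """Replace every run of whitespace (spaces, tabs, CR, LF, etc.) with one space."""
    # str.split() with no arguments splits on any whitespace run and
    # discards leading and trailing whitespace.
    return " ".join(value.split())


raw = "Jaffrey, New Hampshire, United\n                            States"
print(collapse_whitespace(raw))
```

If values were cleaned this way on import, previously indexed documents would presumably need to be re-imported so that stored facet values and incoming filters agree.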