CDRH / api

Codenamed "Apium": An API to access all public Center for Digital Research in the Humanities resources
https://cdrhdev1.unl.edu/api_frontend
MIT License
3 stars 1 forks source link

Fix searching and faceting on multiline values #82

Closed techgique closed 6 years ago

techgique commented 6 years ago

Remove the %0D in the API requests for facets and filters. Fixes #81 However, I'm concerned that this may potentially break other queries if we want to allow searches trying to match values with CRs, perhaps via <textarea> inputs for API search. We aren't using any currently that I'm aware of. None of the current facet URLs in Cather Letters have %0D in them, so this approach seems safe at the moment. @jduss4 is more familiar with this repository and may have other ideas how to implement this, perhaps making its behavior with Orchid less complicated. I realize now that I didn't add comments, so we'll want to add some information in the code too.

jduss4 commented 6 years ago

I'm tempted to say that we ought to change the data repository to automatically normalize all the spaces in fields going to text fields, and probably the keywords, too. I agree that this could potentially break other queries for searches... Hmm. What do you think?

techgique commented 6 years ago

See comment on the related issue: https://github.com/CDRH/api/issues/81#issuecomment-401429221

The normalize space method is different than removing vertical space though, right? We're can't globally remove vertical space.

jduss4 commented 6 years ago

@techgique the original normalize-space from XSLT does remove vertical space, so it depends, I guess, if we want to imitate that entirely.

http://www.xsltfunctions.com/xsl/fn_normalize-space.html

jduss4 commented 6 years ago

Looking at these changes, I think I feel kinda confused. Like, if the elasticsearch index has a field with a return line in it, then how would the searches here which remove the return line actually match against it? Would you be up for writing a couple tests? I don't think we have any integration tests, unfortunately, but I would feel a little better about having the expected behavior lined out in them with these changes.

techgique commented 6 years ago

They match after this change because the indexed values in Elasticsearch only have LF characters for new lines. But when Orchid renders the values in hidden <input>s to pass along with subsequent changes to one's searching, it adds an additional CR character that causes the API request not to match what's in Elasticsearch.

I can work on adding comments and tests for this if you want to handle the data repo changes for keyword fields?