Indicia-Team / warehouse

GNU General Public License v3.0
5 stars 3 forks source link

Multi-Value custom attributes in reports need escaping #197

Open dve opened 8 years ago

dve commented 8 years ago

Given a custom survey text attribute with id 1 which allows multiple usage. If a survey as an attribute value of Hello, World! and Hello the report api returns this xml element:

<attr_survey_2>Hello, Hello, World!</attr_survey_2>

It's not clear if the attribute values should be Hello, Hello and World! or Hello, World! and Hello

This needs to be escaped some how, or an extra xml element must be used per attribute value.

johnvanbreda commented 8 years ago

@dve I've committed a proposed fix for this which converts the output to properly formatted CSV.

@JimBacon can I ask you nicely to take a look at the attached pull request please? The solution converts the output of the example above to <attr_survey_2>"Hello","Hello, World!"</attr_survey_2>. It obviously would affect report outputs that include dynamically generated multi-valued attributes so I'd appreciate a second opinion on whether this is a good idea before pulling into develop.

JimBacon commented 8 years ago

First, I note from code inspection that values are only quoted if they contain a comma so the example would output <attr_survey_2>Hello, "Hello, World!"</attr_survey_2> (i.e. no quotes on first Hello). Further, where an attribute is being quoted, any quotes present in the attribute are doubled. If there were also values of "Hello" and John said "Hello, World" we would get <attr_survey_2>Hello,"Hello, World!", "Hello", "John said ""Hello, World"""</attr_survey_2>.

That is perfectly logical for a machine. Shouldn't there be a corresponding bit of code in the client_helpers so that, when viewing a report, this is decoded? That would require something to indicate that a multivalue rather than a single value attribute is being handled.

Also wondering if attr like '%,%' might be less efficient than position(',' in attr) > 0 as a way of determining if an attribute contains a comma.

dve commented 8 years ago

@johnvanbreda when can we expect this to be merged into the main branch? The client_helpers part is not important for us.

johnvanbreda commented 8 years ago

@JimBacon @dve Given that there might be existing applications relying on the current method of output I'm slightly nervous of this change. Updating the client helpers to decode needs a bit of work as well because it doesn't recognise which column is multi-value, and downloads would not be processed anyway. Given this I am wondering if the best approach is to abandon the CSV format output and instead, only for XML format outputs convert this to a set of XML elements. So you could have something like:

<attr_survey_2>
  <value>Hello</value>
  <value>Hello, World!</value>
  <value>"Hello"</value>
  <value>John said "Hello, World"</value>
</attr_survey_2>

This would only apply for multi-value attributes when the output format is XML.

dve commented 8 years ago

I would prefer:

<attr_survey_2>
  <value value_id=23>Hello</value>
  <value value_id=24>Hello, World!</value>
  <value value_id=25>"Hello"</value>
  <value value_id=26>John said "Hello, World"</value>
</attr_survey_2>

And it should also be like this, if the attribute allows multiple values, but only one is used:

<attr_survey_2>
  <value value_id=23>Hello</value>
</attr_survey_2>