Tangerine-Community / Tangerine

Digitize your offline data collection. Create your Forms online with Tangerine Editor, conduct them offline with the Tangerine Android App. All results you collect can be exported as a CSV file that is easy to process in a spreadsheet. Tangerine has been used in over 1 million assessments and surveys in over 60 countries and in 100 languages.
http://www.tangerinecentral.org/
GNU General Public License v3.0

CSV outputs show separate/duplicate columns if a variable moves from one section to another due to a form change #3637

Open esurface opened 11 months ago

esurface commented 11 months ago

Issue

If a change is made to a form that moves a variable to a different section, then the CSV output will show that variable in two columns. This is not necessarily a bug, but Tangerine should handle this scenario.

CSV outputs are designed to show variables by section so they follow the data dictionary. Changes to the structure of the sections and variables in different versions of the form will change the order of the headers in the CSV outputs.

Example

In version one of the form, the variable held is in the section Sought:

{
  "_id": "abc",
  "formId": "form-1",
  "formVersion": "v1",
  "form-06b414cc-4971-46da-b121-fd3362e8d1f6.item_Sought.held": "0"
}

In version two of the form, the variable held is in the section Crime:

{
  "_id": "def",
  "formId": "form-1",
  "formVersion": "v1",
  "form-06b414cc-4971-46da-b121-fd3362e8d1f6.item_Crime.held": "0"
}

The CSV output for this form will be:

_id  formId  formVersion  item_Sought_disabled  held       item_Crime_disabled  held
abc  form-1  v1           FALSE                 0          FALSE                UNDEFINED
def  form-1  v2           FALSE                 UNDEFINED  FALSE                0
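To make the mechanism concrete, here is a minimal Python sketch that reproduces the duplicate held columns. It assumes, hypothetically, that the export builds one column per (section, variable) pair parsed from the flattened keys shown above, in first-seen order, and labels each column with the variable name only. Tangerine's actual CSV code is not shown in this issue and may work differently; the _disabled columns are omitted, and the second document is given "formVersion": "v2" to match the CSV output.

```python
import csv
import sys

# Hypothetical model of a section-ordered CSV export. The key format
# "form-<uuid>.item_<Section>.<variable>" comes from the example documents;
# the column-building logic is an assumption, not Tangerine's code.
META = ("_id", "formId", "formVersion")


def section_and_variable(key):
    """Split 'form-<uuid>.item_<Section>.<variable>' into (section, variable)."""
    _form, section, variable = key.split(".", 2)
    return section, variable


def build_rows(docs):
    # Collect (section, variable) pairs in the order they are first seen.
    columns = []
    for doc in docs:
        for key in doc:
            if key not in META:
                pair = section_and_variable(key)
                if pair not in columns:
                    columns.append(pair)
    # The header keeps only the variable name, so "held" appears twice.
    header = list(META) + [variable for _section, variable in columns]
    rows = []
    for doc in docs:
        values = {section_and_variable(k): v for k, v in doc.items() if k not in META}
        rows.append([doc[m] for m in META] + [values.get(c, "UNDEFINED") for c in columns])
    return header, rows


docs = [
    {"_id": "abc", "formId": "form-1", "formVersion": "v1",
     "form-06b414cc-4971-46da-b121-fd3362e8d1f6.item_Sought.held": "0"},
    {"_id": "def", "formId": "form-1", "formVersion": "v2",
     "form-06b414cc-4971-46da-b121-fd3362e8d1f6.item_Crime.held": "0"},
]

header, rows = build_rows(docs)
writer = csv.writer(sys.stdout)
writer.writerow(header)
writer.writerows(rows)
```

Because the pair ("item_Sought", "held") differs from ("item_Crime", "held"), the header comes out as _id,formId,formVersion,held,held, and each row has UNDEFINED in the column for the section it did not use.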

Considerations

  1. Solutions to the issue will need to consider how to infer a form version implicitly from the CSV reporting metadata

    • The form versioning feature of Tangerine is usually not used, since there is no UI for it. A solution
    • The form version could be inferred from the git history of the form file
  2. Solutions will also need to consider the impact on the ordering of sections and variables in the outputs

    • Simply combining the variable into one column breaks the current order of the variables in the CSVs
  3. MySQL outputs do not have this issue, since duplicate variables are not allowed
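For consideration 1, one way to guess a version from git history is to map each result to the last commit of the form file made at or before the result was started. The Python sketch below assumes a local clone of the content repository and a hypothetical result start time in Unix seconds (convert if results store milliseconds). It also assumes devices pick up form changes promptly; results collected on a device still running an older form would be misattributed, so this is an approximation at best.

```python
import bisect
import subprocess


def form_history(repo_dir, form_path):
    """Return [(commit_unix_time, commit_sha), ...] oldest first for one form file."""
    out = subprocess.run(
        # %ct = committer date as a Unix timestamp, %H = full commit hash
        ["git", "log", "--format=%ct %H", "--", form_path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    history = []
    for line in out.splitlines():
        ts, sha = line.split()
        history.append((int(ts), sha))
    # Sort by time rather than trusting git's output order (rebases can reorder dates).
    history.sort()
    return history


def infer_version(history, result_unix_time):
    """Pick the last commit made at or before the result start time, or None."""
    times = [ts for ts, _sha in history]
    i = bisect.bisect_right(times, result_unix_time)
    return history[i - 1][1] if i else None


# Made-up history for illustration; real values would come from form_history().
history = [(1600000000, "aaa111"), (1610000000, "bbb222")]
print(infer_version(history, 1605000000))
```

The commit hash then stands in for a form version label, which could feed either of the possible solutions below.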

Possible Solutions

  1. Add a UI option to output CSVs by version: one CSV file per form version
  2. Add a UI option to output CSVs as a distinct set of variables (instead of in data dictionary order)
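The two possible solutions could look roughly like this in Python. Both functions are hypothetical sketches over the flattened result documents shown in the example, not existing Tangerine code. The second one keys columns on the variable name alone (the text after the last dot), which relies on variable names being unique within a form, the same constraint the MySQL outputs rely on.

```python
from collections import defaultdict

META = ("_id", "formId", "formVersion")


def split_by_version(docs):
    """Solution 1: group result docs so each form version gets its own CSV."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.get("formVersion", "unversioned")].append(doc)
    return dict(groups)


def distinct_variable_columns(docs):
    """Solution 2: one column per variable name, ignoring the section prefix."""
    names = []
    for doc in docs:
        for key in doc:
            if key not in META:
                name = key.rsplit(".", 1)[-1]
                if name not in names:
                    names.append(name)
    header = list(META) + names
    rows = []
    for doc in docs:
        values = {k.rsplit(".", 1)[-1]: v for k, v in doc.items() if k not in META}
        rows.append([doc.get(m, "") for m in META] + [values.get(n, "UNDEFINED") for n in names])
    return header, rows
```

Applied to the two example documents, split_by_version yields one group per formVersion, while distinct_variable_columns produces a single held column. The trade-off is the one raised in consideration 2: the distinct-variable layout follows first-seen order rather than data dictionary order.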
TSSlade commented 11 months ago

@esurface - are you certain this is in fact the current behavior? I don't think my experience has reflected this. (The same varname showing up in multiple columns.)

TSSlade commented 11 months ago

Also wanted to confirm - the illustrative JSON for the second block still says "formVersion": "v1", while the illustrative CSV output says formVersion is v2. Is that a typo, or are you suggesting that there be some manner of auto-incrementing happening? (I'm assuming the former - I don't think an 'automagical' auto-increment would be the ideal way to go.)

In re: "breaking the order of variables into CSVs" - this is already somewhat broken, in that late-added variables get appended to the end of the CSV column list rather than actually being inserted alongside their neighbors in the instrument proper.

For instance, if I've generated data for an instrument having SectionA.item1-SectionA.item10, SectionB.item1-SectionB.item10, and SectionC.item1-SectionC.item10 in that order, and then I add the variable SectionA.item11, that new variable will wind up as the 31st item in the column list rather than the 11th. (Ignoring all the metadata columns for the purposes of this example.)

If you want to fix that, that would be cool. But the current reality doesn't seem to match what you're describing under bullet 2.
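The ordering problem described in this comment could be handled by inserting a late-added column after the last existing column from the same section instead of appending it. Below is a minimal Python sketch of that idea, using the SectionA/SectionB/SectionC naming from the example; the function name is hypothetical. It places a new item at the end of its section, so an item added mid-section in the form would still need the form definition's own order to land exactly right.

```python
def insert_column(columns, new_column):
    """Insert 'Section.item' after the last existing column of the same section.

    Falls back to appending when the section has not been seen before.
    """
    section = new_column.split(".", 1)[0]
    last = None
    for i, col in enumerate(columns):
        if col.split(".", 1)[0] == section:
            last = i
    if last is None:
        return columns + [new_column]
    return columns[: last + 1] + [new_column] + columns[last + 1 :]


# SectionA.item1..10, SectionB.item1..10, SectionC.item1..10 (metadata columns ignored)
columns = [f"Section{s}.item{n}" for s in "ABC" for n in range(1, 11)]
updated = insert_column(columns, "SectionA.item11")
print(updated.index("SectionA.item11") + 1)  # 1-based position of the new column
```

With this approach SectionA.item11 becomes the 11th column rather than the 31st, and SectionB and SectionC shift right by one.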