Tangerine-Community / Tangerine

Digitize your offline data collection. Create your Forms online with Tangerine Editor, conduct them offline with the Tangerine Android App. All results you collect can be exported as a CSV file, easy for processing in a spreadsheet. Tangerine has been used in over 1 million assessments and surveys in over 60 countries and in 100 languages.
http://www.tangerinecentral.org/
GNU General Public License v3.0

Data Manager downloads archived CSVs from past versions of Tangerine #1855

Open rjcorwin opened 4 years ago

TSSlade commented 4 years ago

As the data manager, I expect analyses to be reproducible. The code I write to process our data and generate our analytic results needs to work the first time I click 'go', and the second time, and the third, and each time thereafter.

Constantly-changing CSV output precludes reproducible research. Changes such as the following are breaking changes, and should be avoided unless I am permitted an option to configure them.

  1. Changing delimiters between variable stem and answer suffix e.g., vocab_detail.1 --> vocab_detail_1
  2. Changing the structure of the variable output e.g., boolean variables with the names vocab.learnersSupportedToMakeSentences.0, vocab.learnersSupportedToMakeSentences.1, vocab.learnersSupportedToMakeSentences.2, and vocab.learnersSupportedToMakeSentences.888 --> categorical variable with the name vocab.learnersSupportedToMakeSentences
  3. Changing the type of boolean flags e.g., variables whose status is flagged by 'on' --> 1 or true
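As a stopgap for item 3, analysis code can normalize flag values before use. This is a minimal sketch, not part of Tangerine: `normalize_flag` is a hypothetical helper, and the set of truthy spellings is an assumption drawn only from the examples above ('on', 1, true).

```python
# Hypothetical helper, not a Tangerine API. The truthy spellings are an
# assumption based on the examples in this thread: 'on', 1, true.
TRUTHY = {"on", "1", "true"}


def normalize_flag(raw):
    """Return True if raw is one of the known truthy spellings, else False."""
    return str(raw).strip().lower() in TRUTHY
```

Anything outside the truthy set, including an empty cell, is treated as unset here. A stricter version could raise on unknown values instead.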

When I redownload the same data three months after the original download, and changes such as those above have been made without any indication to the user, my analytic code fails without explanation.
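One way to turn a silent failure into an explained one is to compare the header of a fresh download against the header saved from the original download. The sketch below uses only the Python standard library. The sample CSV text and the helper names `read_header` and `diff_headers` are made up for illustration.

```python
import csv
import io


def read_header(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def diff_headers(expected, actual):
    """Return (missing, unexpected) column names, preserving their order."""
    missing = [col for col in expected if col not in actual]
    unexpected = [col for col in actual if col not in expected]
    return missing, unexpected


# Illustrative data only: the same form exported before and after a
# delimiter change like the one described in item 1.
old_export = "id,vocab_detail.1,vocab_detail.2\nr1,on,\n"
new_export = "id,vocab_detail_1,vocab_detail_2\nr1,1,0\n"

missing, unexpected = diff_headers(read_header(old_export), read_header(new_export))
```

Running this check at the top of an analysis script lets it stop with a list of renamed or dropped columns rather than failing somewhere downstream.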

rjcorwin commented 4 years ago

@TSSlade Thanks for bringing these up. When we write a change log entry for a Tangerine release, it could include a section on data output changes and features.

The first item was a "bug fix" in one of the releases, but the change log doesn't make clear when that happened. That delimiter seems like something projects might want to configure depending on how they name their variables, aye? Using underscores as a delimiter when your project also uses underscores in variable names makes it hard to tell where the variable name ends and the answer suffix begins.
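To illustrate the ambiguity, here is a small sketch that splits a column name into stem and suffix at the last occurrence of a delimiter. `split_column` is a hypothetical helper, not Tangerine code, and splitting at the last delimiter is itself an assumption about how the names are built.

```python
def split_column(name, delimiter):
    """Split a column name into (stem, suffix) at the last delimiter.

    Returns (name, None) when the delimiter does not occur in the name.
    """
    stem, sep, suffix = name.rpartition(delimiter)
    if not sep:
        return name, None
    return stem, suffix


dotted = split_column("vocab_detail.1", ".")      # unambiguous
underscored = split_column("vocab_detail_1", "_")  # same result, by luck
plain_dot = split_column("vocab_detail", ".")      # no suffix found
plain_underscore = split_column("vocab_detail", "_")  # misread as stem + suffix
```

With `.` as the delimiter, a plain variable named `vocab_detail` is left alone. With `_`, the same variable is wrongly split into `vocab` and `detail`, which is the problem a configurable delimiter would avoid.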

On the second item, let's chat more about that one.

It looks like your third item changed in v3.3.0 per the Tangerine "Change Log" as seen here.

TSSlade commented 4 years ago

@rjsteinert - thanks for pointing me to the change log - I hadn't been aware of where it was living. Very helpful! I think, yes, in an ideal world it should be up to the data manager to configure the stem|suffix delimiter at the point of export, for precisely the reason you've highlighted. ...Although if the categorical approach - stem only, suffixes are the values captured within a given variable - is the wave of the future, such delimiters become largely redundant, right? (In a REALLY ideal world, I as a data manager could configure that part - stem+suffix vs. stem with suffixes as values - as part of my export process. But I can imagine that might be a bear to maintain.)
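For what it's worth, the two layouts can be converted into each other on the analysis side. The sketch below is only an illustration under stated assumptions: `collapse` and `expand` are hypothetical helpers, a selected option is assumed to be marked 'on' (or 1/true) and an unselected one to be an empty cell, and the option list is assumed to be known in advance. None of this is confirmed Tangerine behavior.

```python
# Assumed truthy spellings, based on the examples earlier in this thread.
TRUTHY = {"on", "1", "true"}


def collapse(row, stem, delimiter="."):
    """Stem+suffix booleans -> list of selected suffixes for one variable."""
    prefix = stem + delimiter
    return [
        key[len(prefix):]
        for key, value in row.items()
        if key.startswith(prefix) and str(value).strip().lower() in TRUTHY
    ]


def expand(selected, stem, options, delimiter="."):
    """List of selected suffixes -> one boolean-style column per option."""
    return {
        f"{stem}{delimiter}{opt}": ("on" if opt in selected else "")
        for opt in options
    }


stem = "vocab.learnersSupportedToMakeSentences"
wide = {f"{stem}.0": "", f"{stem}.1": "on", f"{stem}.2": "", f"{stem}.888": ""}

selected = collapse(wide, stem)
round_trip = expand(selected, stem, ["0", "1", "2", "888"])
```

Note that `expand` needs the full option list, which the categorical layout alone does not carry. That is one reason an export-time switch would be easier for the data manager than reconstructing the columns afterwards.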