Create new question type for Caddisfly results

mtwestra commented 8 years ago

At the moment, Caddisfly strip test results are send through the FREE_TEXT question type. This is a hacky way of doing it, and necessitates checks on the value level how content should be interpreted (ie, if it is caddisfly JSON).

Instead, the proposal is to give Caddisfly it's own question type. What is needed to get this done is:

1) FLOW Dashboard -> FLOW App

Introduce a Caddisfly question type
FLOW users need a way to select which test type will be done by Caddisfly. The idea is to have a two-step presentation of the information, either through a double drop down, or by a more elaborate user interface. The top level ordering is the type of test: Air quality, Water quality, Soil, Food. The next level is the actual tests, with information on the full name, producer, the range, etc. This information should be available in multiple languages.
To communicate the test type to Caddisfly, we need a naming convention for the tests. For example something like “STRIP-HACH-CHLORINE-27450” and “SENS-SODAQ-EC”, so a combination of the type of test (STRIP for strip, SENS for sensor), the producer of the sensor (HACH, SODAQ), etc.
The name of the test is embedded in the survey definition XML file as a property, for example “type=“STRIP-HACH-CHLORINE-27450””.
In order to store the type definitions, we need a way to store metadata on the different tests. This could for example be a Java class holding the definitions, or some time of JSON-based property file.
In order to be able to process the incoming data later, which will be in the shape of a json map with property/value pairs, we need to know for each test which properties we expect to get back - we need the schema of the expected answer. This could perhaps be done as an array of strings and types, as part of the definitions file mentioned above.

2) FLOW App - Caddisfly app On the FLOW app, the user fills in a survey. On the caddisfly question, a button ‘perform test’ can be pressed. After this, the FLOW app sends an intent to the Caddisfly app, containing the test type mentioned above. The Caddisfly app selects the correct test on the basis of the type passed in.

3) Caddisfly app -> FLOW app

On the Caddisfly app, the test is performed. This leads to a set of data that contain multiple properties. The data is encoded in a JSON response to the FLOW app.
After discussions with @muloem, questions will be dealt with as separate images. The response would then only contain a uuid for the image name, and the Caddisfly app would need to put the images in the FLOW external directory, where they can be picked up. For example: {type:”STRIP-HACH-CHLORINE-27450”, ... , image:E003F608-9F37-11E5-98D1-197109FBD261}.
Only a single image will be sent back as part of a caddisfly response. If there are multiple patches in a strip test, the images will be combined into a single image.

4) FLOW app -> FLOW dashboard The FLOW app treats the Caddisfly answer as any other response. The type will be 'CADDISFLY'. The FLOW app needs to pick up the image from an agreed folder, and put it in its own image folder structure

5) FLOW Dashboard On the FLOW dashboard, caddisfly result data will be treated as follows:

On the inspect data tab and monitoring tab, the result data is shown in a formatted way: Title of the test done, names, values, and unit per patch
When data is exported, the data is spread over different columns
When data is imported, it is ignored, so data cleaning in Excel is disabled.

mtwestra commented 8 years ago

The columns in a data export could look like this (columns vertically):

Strip type: Chlorine and Free Chlorine Image: http://s3/akvoflow-33/88d7g8d7.png patch_Chlorine (ppm): 12 patch_Free_chlorine (ppm): 3.5

mtwestra commented 8 years ago

We can use a common schema for Caddisfly results.

Schema: { "type":"caddisfly", "uuid":TEST_UUID, "image":IMAGE_URL, "name": DISPLAY_NAME, "result":[RESULT_OBJECT,RESULT_OBJECT, ...] }

where:

TEST_UUID is the uuid for the test (required).
IMAGE_URL is the result image URL (optional)

*DISPLAY_NAME is the name that is displayed in reports and in the dashboard, such as "Chlorine and Free chlorine (HACH 27450)", "E. Coli", "Electrical Conductivity", etc. (required)

RESULT_OBJECT is a JSON object with the schema: { "name":NAME "id": ID "unit":UNIT "value":VALUE }

where:

NAME is the display name of the variable. Examples: "Chloride", "Free Chloride", "Count", "Ecoli present", etc
ID is the unique id of the particular measurement
UNIT is the unit of measurement. Examples: "ppm", "number" (in the case of a count for example), "deg Celcius", etc
VALUE is the value of the measurement. Examples are "3.5", "10", "yes"

There can be multiple result objects inside a single result

mtwestra commented 8 years ago

As @nhternup remarked, we need to support multiple languages. The question is where: 1) in the FLOW dashboard, where the colorimetric and strip tests should be presented in the right language, depending on the language setting of the dashboard 2) in the Caddisfly app - depending on the language setting of the app 3) in the inspect data tab / on the dashboard - depending on the language setting of the dashboard 4) in exported reports - depending on the language setting of the dashboard

When data is shown in the dashboard, we don't want to do any lookup in the backend, which means the data needs to be complete with a display name.

At the moment we don't have a good policy yet to handle export data in different languages, but that could change.

My preference would be to have the result data always contain English, and handling translations only on the app and the dashboard, through the IDs.

So my proposal would be that the ’name’ keys that are returned in the json response are always in English. To make sure we know which patch we are talking about, we will also include an ‘id’ field for each patch, on which they can be linked in the backend, because when we export, we need to know the description in each patch. This patch code could just be the strip code with an increasing integer attached:

RESULT_OBJECT is a JSON object with the schema: { "name":NAME "unit":UNIT "value":VALUE "id": PATCH_ID }

With the PATCH_ID = TEST_ID + "-" + i, with i starting with 1 and increasing.

So the whole flow would be: 1) user selects a test in the FLOW dashboard. 2) This selection is communicated to the FLOW app by the testID. 3) The FLOW app passes the testID to the Caddisfly app. 4) The Caddisfly app renders the test using the language setting of the app. 5) The result JSON contains the names in English only. 6) based on user preferences, the results can be exported using different languages.

mtwestra commented 8 years ago

One question is how to deal with the Caddisfly test information in the backend. For each test, we need to store something like:

type: The type of test, for example 'caddisfly-strip', 'caddisfly-sensor', etc
tags: A list of tags that apply to this test, such as "water quality", "agriculture", "medical"
id: A unique id of the test, for example "caddisfly-sensor-ec-SODAQ-v1"
name: The display name of the test
translations: Translations of the display name
number of results: How many results are returned by the test.

And for each result:

name: Display name of the result
translations: translations of the display name
id: A unique id for the individual test, derived from the testId, by adding a number
unit: the unit of measurement

This information needs to be present in the FLOW backend for the following reasons: 1) to let the user specify the test that she wants to use in a Caddisfly question type 2) to export data for a particular test: the column names and the number of columns are derived from this information.

Options that I can think of are: 1) Some kind of JSON or XML Java property file, which is read by the backend and used to populate a list of CaddisflyTest objects 2) Do everything within the Java class definition: create the necessary tests within the constructor of an CaddisflyTestList

iperdomo commented 8 years ago

It seems to me that we want to fix too many things in one feature. I would encourage to pick the essential ones and iterate. My first comments:

Instead, the proposal is to give Caddisfly it's own question type.

What's the migration path for instances using "Free text" for Caddisfly results ?

In order to be able to process the incoming data later, which will be in the shape of a json map with property/value pairs, we need to know for each test which properties we expect to get back - we need the schema of the expected answer.

In order to store the type definitions, we need a way to store metadata on the different tests. This could for example be a Java class holding the definitions, or some time of JSON-based property file.

If we need translation support, we need a physical file that can be uploaded to transifex and then downloaded (similar to the UI translations).

Defining a scheme is not a trivial task. Until now all our schema related validations are done implicitly in code (and not explicit in a file). Notice that if you want to describe the schema using JSON Schema, we can't use the current validation library for Java in GAE (because of the usage of Threads)

This could perhaps be done as an array of strings and types, as part of the definitions file mentioned above.

This proposal feels like inventing our own schema definition.

On the FLOW app, the user fills in a survey. On the caddisfly question, a button ‘perform test’ can be pressed. After this, the FLOW app sends an intent to the Caddisfly app, containing the test type mentioned above. The Caddisfly app selects the correct test on the basis of the type passed in.

I think that instead of inventing some coding for the type of test, we could use just a JSON string (safer to parse)? How many "dashes" we'll we have now (and in the future) ?

{"category": "STRIP", "producer": "HACH", "type": "CHLORINE 27450"} instead of STRIP-HACH-CHLORINE-27450

1) in the FLOW dashboard, where the colorimetric and strip tests should be presented in the right language, depending on the language setting of the dashboard 2) in the Caddisfly app - depending on the language setting of the app 3) in the inspect data tab / on the dashboard - depending on the language setting of the dashboard 4) in exported reports - depending on the language setting of the dashboard

AFAIK the Dashboard only support English, French, Spanish, we can't show the results in a different language than those, right?

When data is shown in the dashboard, we don't want to do any lookup in the backend, which means the data needs to be complete with a display name.

Why is that? In some cases we want to follow a relational model, in others we want a denormalized one?

mtwestra commented 8 years ago

@iperdomo comments below!

1) migration path - I don't think there is a need for it. The current situation is that either the caddisfly app returns a simple number (in the case of the fluoride measurement, or EC). As those are done as a free text question, they can be exported and analysed without problem. In the case of the strip test, there is at the moment only a single instance collecting data with it, and it returns JSON-formatted data. This data will be analysed by Josje and after that could be either brought in line with a new approach, or left as is in free text.

2) on translations: so your proposal is to use something like the locale files in GAE? I guess that makes a lot of sense, as the files could be shared between FLOW and Caddisfly, and handled by transifex for translations.

3) Yes, we are trying to invent a schema. You mention that we can't use the JSON Schema, so what is the best way to go about that? On the other hand, the strings will be created in the Android app, and there seems to be a JSON Schema validator (https://github.com/fge/json-schema-validator) that we could use in the app. Perhaps in this case it is enough to enforce that the creation of the JSON string happens in accordance with the JSON Schema?

4) On using the JSON string to identify tests: Isn't this risky? The idea is that the FLOW app, in the survey definition, passes the id of the test that needs to be done. So I guess there we can't use a JSON string, as it might/will brake the XML. In addition, using a json string with unordered keys might not be unique: to find a test we would need to compare on three things instead of a single thing.

5) The dashboard only supports English, French, and Spanish at the moment. I also think that this will be enough for the strip tests. However, on the app the tests potentially need to be shown in additional languages, as in the app we have more choice in languages. As this might be complex to achieve however, we could also say that the names of the tests will only be available in the dashboard languages, and that other translations should be done in the text of the individual question.

6) lookup / denormalised: I thought that we didn't have examples where we have lookup on the data when we export it, but I might be wrong? It seems to me that raw data needs to be self-sufficient, especially if it goes to external systems such as report server / cartodb.

iperdomo commented 8 years ago

4) On using the JSON string to identify tests: Isn't this risky? The idea is that the FLOW app, in the survey definition, passes the id of the test that needs to be done. So I guess there we can't use a JSON string, as it might/will brake the XML. In addition, using a json string with unordered keys might not be unique: to find a test we would need to compare on three things instead of a single thing.

I don't think is less risky than a single dashed string.

The idea is that the FLOW app, in the survey definition, passes the id of the test that needs to be done

Perhaps I misunderstood the initial proposal. If the dashed string is a unique identifier for the test, why we do we need a human readable string? If both apps share the test definitions (e.g. because of test name translations) a UUID or a number should serve the same purpose, right?

iperdomo commented 8 years ago

3) Yes, we are trying to invent a schema. You mention that we can't use the JSON Schema, so what is the best way to go about that? On the other hand, the strings will be created in the Android app, and there seems to be a JSON Schema validator (https://github.com/fge/json-schema-validator) that we could use in the app. Perhaps in this case it is enough to enforce that the creation of the JSON string happens in accordance with the JSON Schema?

I think that project (the one we're using for unilog schema) is too heavy for an android app.

ichinaski commented 8 years ago

Apologies for the intermission, but for JSON data serialization, the Jackson-based approach we are currently using (with Java domain classes in both GAE and the app) seems to work fine. Why wouldn't it suffice for this extension, if this is something we already have in place?

mtwestra commented 8 years ago

@iperdomo,

On the unique identifier: yes, we could also use a uuid or number. That might be fine, as in the definition there is enough human readable information to identify the test

On the JSON schema: @ichinaski, which particular part of the code do you mean?

mtwestra commented 8 years ago

@iperdomo, there is also still the issue of how to actually store the test information - in a properties file, java class, ...

ichinaski commented 8 years ago

On the JSON schema: @ichinaski, which particular part of the code do you mean?

App: Domain class: https://github.com/akvo/akvo-flow-mobile/blob/master/app/src/main/java/org/akvo/flow/domain/response/FormInstance.java Serialization: https://github.com/akvo/akvo-flow-mobile/blob/master/app/src/main/java/org/akvo/flow/service/DataSyncService.java#L233-L234

GAE: Domain class: https://github.com/akvo/akvo-flow/blob/develop/GAE/src/org/waterforpeople/mapping/domain/response/FormInstance.java Deserialization: https://github.com/akvo/akvo-flow/blob/develop/GAE/src/org/waterforpeople/mapping/serialization/SurveyInstanceHandler.java#L50-L57

mtwestra commented 8 years ago

@ichinaski thanks! that looks useful. @iperdomo, what do you think? We could have a serializer in the Caddisfly app, the FLOW app would just pass on the JSON string produced by the Caddisfly app, and in the backed, we can deserialize it if needed (for example, when splitting the results to different columns)

iperdomo commented 8 years ago

@ichinaski @mtwestra that's an enforced schema, yes. The only tricky bit is to keep those files in-sync. If we're already doing it and it's not a painful process, i'm in favor of this approach.

mtwestra commented 8 years ago

@ichinaski @iperdomo as the schema is quite simple and not expected to evolve much, I think we can do it this way. I'll make a more detailed technical proposal.

Only thing that remains is how to store the actual test metadata - in a class file, properties file, etc.

mtwestra commented 8 years ago

decision, after talking to @iperdomo and @muloem: We'll use a shared (or at least identical) JSON configuration file on the FLOW backend and the Caddisfly app with the required per-test meta data

mtwestra commented 8 years ago

First to implement is a shared JSON configuration file, containing the definitions of the caddisfly tests. Structure:

{tests:[ { "subtype":SUBTYPE "tags": TAGS "uuid":UUIID, "name": DISPLAY_NAME, "brand": BRAND "description":DESCRIPTION, "numResults":NUM_RESULTS, "results":[RESULT_OBJECT,RESULT_OBJECT, ...] }, {...}]}

with:

SUBTYPE is the type of caddisfly test: colour, sensor, striptest
TAGS is an array of tags given to this test, for example 'water, agriculture, sensor', etc. In the dashboard, we could allow filtering on the available tags
UUID is a uuid for the test (required).
DISPLAY_NAME is the name that is displayed in reports and in the dashboard, such as "Chlorine and Free chlorine", "E. Coli", "Electrical Conductivity", etc. (required)
BRAND is the brand info, such as " HACH 27450, 25 pcs"
DESCRIPTION is a description of the test
NUM_RESULTS is the number of results expected back
'results' is an array with result descriptions

RESULT_OBJECT is a JSON object with the schema: { "name":DISPLAY_NAME "id":RESULT_ID "unit":UNIT }

where:

DISPLAY_NAME is the display name of the variable. Examples: "Chloride", "Free Chloride", "Count", "Ecoli present", etc
RESULT_ID is the unique id of the particular measurement
UNIT is the unit of measurement. Examples: "ppm", "number" (in the case of a count for example), "deg Celcius", etc

janagombitova commented 8 years ago

Test plan

Test plan taken from here: https://github.com/akvo/akvo-flow/pull/1723 Cross check that issues from 1st test round are resolved. Comments from 1st test round are in the link above

Test

Create a new survey and a new question
Choose the question type 'caddisfly'
Add more questions to your survey (cover all question types)
Add a repeated question group to the survey and add some Caddsifly tests in there as well
Publish the survey and assign it to a Flow phone
Download and install the caddisfly app
Open the survey on the Flow phone and start a new data point
The caddisfly question should be displayed correctly and when pressed, opens the caddisfly app.
Run the test
Once test is completed Caddisfly app should bring you to Flow app
Continue with the form and submit
Create multiple data points
Create a monitoring survey with registration form and 2 monitoring forms. Make sure all three forms have a few Caddisfly tests in there
Follow the above mentioned steps

Confirmation of old implementation

Enable external sources on the dashboard
Create a survey with the new Caddisfly questions and the old implementation, publish the survey and assign to the phone
Collect data
In the dashboard in the Inspect data tab, Monitoring tab, both maps check that all data show properly

Test all steps in different browsers as well

muloem commented 7 years ago

@mtwestra At the moment we are loading the caddisfly resources during dashboard loading. This creates an extra load time for the dashboard even in cases where the caddisfly resources will not be used (i.e. dashboards without caddisfly integration). We need to optimise this to only load the caddisfly resources when they are actually going to be used.

muloem commented 7 years ago

@mtwestra One more thing noticed is that the cascade resources are loaded twice in this branch. This is probably a simple one liner to only load them once.

mtwestra commented 7 years ago

@muloem, they can be downloaded at the moment a user goes to the survey edit tab. I'll implement that. I'll also look at the cascade loading.

muloem commented 7 years ago

@mtwestra yep. that sounds reasonable. I would add as an extra condition - only if the instance has caddisfly integration enabled.

muloem commented 7 years ago

@mtwestra can you also experiment to see whether its possible to only load caddisfly resources when a question type caddisfly is selected? this way we could narrow down the loading even further

janagombitova commented 7 years ago

Notes from testing

@mtwestra can you please change the way we show "Caddisfly" question type in the dropdown to say "Akvo Caddisfly" so we have the official branding name? Thank you

mtwestra commented 7 years ago

@mulo, I don't think this will work. When the user creates a new question, and she selects the 'caddisfly' type, the tests need to appear immediately. So they need to be loaded beforehand, as they are now.

muloem commented 7 years ago

Ok. we'll skip that for now then. 👍

janagombitova commented 7 years ago

Notes from tests 2

@mtwestra was testing running the Caddisfly tests on the device and the issue that if a tests is located in a repeated question group, the test does not work is still present.

The case is: I have a few tests in a group that has the flag repeat this group on. Once I click in the app on Go to test > Caddisfly opens but I get a message: Cannot start test. Requested test in not available. Please contact technical support. > I have to click on OK and that brings me back to Flow app. All tests that I have in the repeated group work fine in other groups.

Steps Forward

Will be dealt with in another iteration https://github.com/akvo/akvo-flow-mobile/issues/475

mtwestra commented 7 years ago

@janagombitova , we need an android developer for that one. I suggest we release this with this caveat, and let Valerie solve it. I have never seen a caddisfly question in a repeated question group until now, as it does not make much sense - usually you have one datapoint for one water sample / water point that is tested.

janagombitova commented 7 years ago

Notes from test 3

Besides the above mentioned name change in the dropdown and issue with tests not running in repeated groups I could not run any actual strip tests as the initial quality check stops at 10 out of 15 and it appears as if the shadows are not checked. I told Nischal about the issue

Will continue with the other issues as much as possible for now.

mtwestra commented 7 years ago

@janagombitova, which caddisfly version did you use? we need to look at that. It might be an issue with the quality assurance process. It might help to do it in front of a window though, to get a good source of light.

janagombitova commented 7 years ago

@mark - I have spoken with Nischal and the issue was with the calibration card I was using as well as with the app version. After some work and clarification we got it sorted and I could run a few tests, but there is still work to do when it comes to the tests, in my opinion. But I know that the team is working on that this week.

Once the name change is reviewed and merged into the upcoming UAT build I will re-review the implementation again

janagombitova commented 7 years ago

Notes from last test round

Besides issues that are covered in new separate issues, the implementation passes the test plan 👍

muloem commented 7 years ago

The caddisfly part seems to be falling over if there is no data.

SEVERE: Error parsing Caddisfly resource: null
java.lang.NullPointerException
    at java.io.Reader.<init>(Reader.java:78)
    at java.io.InputStreamReader.<init>(InputStreamReader.java:113)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1906)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:778)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:759)
    at com.gallatinsystems.survey.dao.CaddisflyResourceDao.listResources(CaddisflyResourceDao.java:53)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.createRawDataHeader(GraphicalSurveySummaryExporter.java:1326)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.fetchAndWriteRawData(GraphicalSurveySummaryExporter.java:458)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.export(GraphicalSurveySummaryExporter.java:377)
    at akvo.flow_services.exporter$export_report.invoke(exporter.clj:58)
    at akvo.flow_services.scheduler.ExportJob.execute(scheduler.clj:52)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

2016-11-03 18:45:16,376 [QuartziteScheduler_Worker-1] ERROR org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter - Error generating report: null
java.lang.NullPointerException
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.caddisflyCellValues(GraphicalSurveySummaryExporter.java:870)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.writeAnswer(GraphicalSurveySummaryExporter.java:745)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.writeInstanceData(GraphicalSurveySummaryExporter.java:602)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.fetchAndWriteRawData(GraphicalSurveySummaryExporter.java:523)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.export(GraphicalSurveySummaryExporter.java:377)
    at akvo.flow_services.exporter$export_report.invoke(exporter.clj:58)
    at akvo.flow_services.scheduler.ExportJob.execute(scheduler.clj:52)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

akvo / akvo-flow