akvo / akvo-flow

A data collection and monitoring tool that works anywhere.
http://akvo.org/products/akvoflow/
GNU Affero General Public License v3.0
65 stars 31 forks source link

Create new question type for Caddisfly results #1577

Closed mtwestra closed 7 years ago

mtwestra commented 8 years ago

At the moment, Caddisfly strip test results are send through the FREE_TEXT question type. This is a hacky way of doing it, and necessitates checks on the value level how content should be interpreted (ie, if it is caddisfly JSON).

Instead, the proposal is to give Caddisfly it's own question type. What is needed to get this done is:

1) FLOW Dashboard -> FLOW App

2) FLOW App - Caddisfly app On the FLOW app, the user fills in a survey. On the caddisfly question, a button ‘perform test’ can be pressed. After this, the FLOW app sends an intent to the Caddisfly app, containing the test type mentioned above. The Caddisfly app selects the correct test on the basis of the type passed in.

3) Caddisfly app -> FLOW app

4) FLOW app -> FLOW dashboard The FLOW app treats the Caddisfly answer as any other response. The type will be 'CADDISFLY'. The FLOW app needs to pick up the image from an agreed folder, and put it in its own image folder structure

5) FLOW Dashboard On the FLOW dashboard, caddisfly result data will be treated as follows:

mtwestra commented 8 years ago

The columns in a data export could look like this (columns vertically):

Strip type: Chlorine and Free Chlorine Image: http://s3/akvoflow-33/88d7g8d7.png patch_Chlorine (ppm): 12 patch_Free_chlorine (ppm): 3.5

mtwestra commented 8 years ago

We can use a common schema for Caddisfly results.

Schema: { "type":"caddisfly", "uuid":TEST_UUID, "image":IMAGE_URL, "name": DISPLAY_NAME, "result":[RESULT_OBJECT,RESULT_OBJECT, ...] }

where:

*DISPLAY_NAME is the name that is displayed in reports and in the dashboard, such as "Chlorine and Free chlorine (HACH 27450)", "E. Coli", "Electrical Conductivity", etc. (required)

RESULT_OBJECT is a JSON object with the schema: { "name":NAME "id": ID "unit":UNIT "value":VALUE }

where:

There can be multiple result objects inside a single result

mtwestra commented 8 years ago

As @nhternup remarked, we need to support multiple languages. The question is where: 1) in the FLOW dashboard, where the colorimetric and strip tests should be presented in the right language, depending on the language setting of the dashboard 2) in the Caddisfly app - depending on the language setting of the app 3) in the inspect data tab / on the dashboard - depending on the language setting of the dashboard 4) in exported reports - depending on the language setting of the dashboard

When data is shown in the dashboard, we don't want to do any lookup in the backend, which means the data needs to be complete with a display name.

At the moment we don't have a good policy yet to handle export data in different languages, but that could change.

My preference would be to have the result data always contain English, and handling translations only on the app and the dashboard, through the IDs.

So my proposal would be that the ’name’ keys that are returned in the json response are always in English. To make sure we know which patch we are talking about, we will also include an ‘id’ field for each patch, on which they can be linked in the backend, because when we export, we need to know the description in each patch. This patch code could just be the strip code with an increasing integer attached:

RESULT_OBJECT is a JSON object with the schema: { "name":NAME "unit":UNIT "value":VALUE "id": PATCH_ID }

With the PATCH_ID = TEST_ID + "-" + i, with i starting with 1 and increasing.

So the whole flow would be: 1) user selects a test in the FLOW dashboard. 2) This selection is communicated to the FLOW app by the testID. 3) The FLOW app passes the testID to the Caddisfly app. 4) The Caddisfly app renders the test using the language setting of the app. 5) The result JSON contains the names in English only. 6) based on user preferences, the results can be exported using different languages.

mtwestra commented 8 years ago

One question is how to deal with the Caddisfly test information in the backend. For each test, we need to store something like:

And for each result:

This information needs to be present in the FLOW backend for the following reasons: 1) to let the user specify the test that she wants to use in a Caddisfly question type 2) to export data for a particular test: the column names and the number of columns are derived from this information.

Options that I can think of are: 1) Some kind of JSON or XML Java property file, which is read by the backend and used to populate a list of CaddisflyTest objects 2) Do everything within the Java class definition: create the necessary tests within the constructor of an CaddisflyTestList

iperdomo commented 8 years ago

It seems to me that we want to fix too many things in one feature. I would encourage to pick the essential ones and iterate. My first comments:

Instead, the proposal is to give Caddisfly it's own question type.

What's the migration path for instances using "Free text" for Caddisfly results ?

In order to be able to process the incoming data later, which will be in the shape of a json map with property/value pairs, we need to know for each test which properties we expect to get back - we need the schema of the expected answer.

In order to store the type definitions, we need a way to store metadata on the different tests. This could for example be a Java class holding the definitions, or some time of JSON-based property file.

If we need translation support, we need a physical file that can be uploaded to transifex and then downloaded (similar to the UI translations).

Defining a scheme is not a trivial task. Until now all our schema related validations are done implicitly in code (and not explicit in a file). Notice that if you want to describe the schema using JSON Schema, we can't use the current validation library for Java in GAE (because of the usage of Threads)

This could perhaps be done as an array of strings and types, as part of the definitions file mentioned above.

This proposal feels like inventing our own schema definition.

On the FLOW app, the user fills in a survey. On the caddisfly question, a button ‘perform test’ can be pressed. After this, the FLOW app sends an intent to the Caddisfly app, containing the test type mentioned above. The Caddisfly app selects the correct test on the basis of the type passed in.

I think that instead of inventing some coding for the type of test, we could use just a JSON string (safer to parse)? How many "dashes" we'll we have now (and in the future) ?

{"category": "STRIP", "producer": "HACH", "type": "CHLORINE 27450"} instead of STRIP-HACH-CHLORINE-27450

1) in the FLOW dashboard, where the colorimetric and strip tests should be presented in the right language, depending on the language setting of the dashboard 2) in the Caddisfly app - depending on the language setting of the app 3) in the inspect data tab / on the dashboard - depending on the language setting of the dashboard 4) in exported reports - depending on the language setting of the dashboard

AFAIK the Dashboard only support English, French, Spanish, we can't show the results in a different language than those, right?

When data is shown in the dashboard, we don't want to do any lookup in the backend, which means the data needs to be complete with a display name.

Why is that? In some cases we want to follow a relational model, in others we want a denormalized one?

mtwestra commented 8 years ago

@iperdomo comments below!

1) migration path - I don't think there is a need for it. The current situation is that either the caddisfly app returns a simple number (in the case of the fluoride measurement, or EC). As those are done as a free text question, they can be exported and analysed without problem. In the case of the strip test, there is at the moment only a single instance collecting data with it, and it returns JSON-formatted data. This data will be analysed by Josje and after that could be either brought in line with a new approach, or left as is in free text.

2) on translations: so your proposal is to use something like the locale files in GAE? I guess that makes a lot of sense, as the files could be shared between FLOW and Caddisfly, and handled by transifex for translations.

3) Yes, we are trying to invent a schema. You mention that we can't use the JSON Schema, so what is the best way to go about that? On the other hand, the strings will be created in the Android app, and there seems to be a JSON Schema validator (https://github.com/fge/json-schema-validator) that we could use in the app. Perhaps in this case it is enough to enforce that the creation of the JSON string happens in accordance with the JSON Schema?

4) On using the JSON string to identify tests: Isn't this risky? The idea is that the FLOW app, in the survey definition, passes the id of the test that needs to be done. So I guess there we can't use a JSON string, as it might/will brake the XML. In addition, using a json string with unordered keys might not be unique: to find a test we would need to compare on three things instead of a single thing.

5) The dashboard only supports English, French, and Spanish at the moment. I also think that this will be enough for the strip tests. However, on the app the tests potentially need to be shown in additional languages, as in the app we have more choice in languages. As this might be complex to achieve however, we could also say that the names of the tests will only be available in the dashboard languages, and that other translations should be done in the text of the individual question.

6) lookup / denormalised: I thought that we didn't have examples where we have lookup on the data when we export it, but I might be wrong? It seems to me that raw data needs to be self-sufficient, especially if it goes to external systems such as report server / cartodb.

iperdomo commented 8 years ago

4) On using the JSON string to identify tests: Isn't this risky? The idea is that the FLOW app, in the survey definition, passes the id of the test that needs to be done. So I guess there we can't use a JSON string, as it might/will brake the XML. In addition, using a json string with unordered keys might not be unique: to find a test we would need to compare on three things instead of a single thing.

I don't think is less risky than a single dashed string.

The idea is that the FLOW app, in the survey definition, passes the id of the test that needs to be done

Perhaps I misunderstood the initial proposal. If the dashed string is a unique identifier for the test, why we do we need a human readable string? If both apps share the test definitions (e.g. because of test name translations) a UUID or a number should serve the same purpose, right?

iperdomo commented 8 years ago

3) Yes, we are trying to invent a schema. You mention that we can't use the JSON Schema, so what is the best way to go about that? On the other hand, the strings will be created in the Android app, and there seems to be a JSON Schema validator (https://github.com/fge/json-schema-validator) that we could use in the app. Perhaps in this case it is enough to enforce that the creation of the JSON string happens in accordance with the JSON Schema?

I think that project (the one we're using for unilog schema) is too heavy for an android app.

ichinaski commented 8 years ago

Apologies for the intermission, but for JSON data serialization, the Jackson-based approach we are currently using (with Java domain classes in both GAE and the app) seems to work fine. Why wouldn't it suffice for this extension, if this is something we already have in place?

mtwestra commented 8 years ago

@iperdomo,

On the unique identifier: yes, we could also use a uuid or number. That might be fine, as in the definition there is enough human readable information to identify the test

On the JSON schema: @ichinaski, which particular part of the code do you mean?

mtwestra commented 8 years ago

@iperdomo, there is also still the issue of how to actually store the test information - in a properties file, java class, ...

ichinaski commented 8 years ago

On the JSON schema: @ichinaski, which particular part of the code do you mean?

App: Domain class: https://github.com/akvo/akvo-flow-mobile/blob/master/app/src/main/java/org/akvo/flow/domain/response/FormInstance.java Serialization: https://github.com/akvo/akvo-flow-mobile/blob/master/app/src/main/java/org/akvo/flow/service/DataSyncService.java#L233-L234

GAE: Domain class: https://github.com/akvo/akvo-flow/blob/develop/GAE/src/org/waterforpeople/mapping/domain/response/FormInstance.java Deserialization: https://github.com/akvo/akvo-flow/blob/develop/GAE/src/org/waterforpeople/mapping/serialization/SurveyInstanceHandler.java#L50-L57

mtwestra commented 8 years ago

@ichinaski thanks! that looks useful. @iperdomo, what do you think? We could have a serializer in the Caddisfly app, the FLOW app would just pass on the JSON string produced by the Caddisfly app, and in the backed, we can deserialize it if needed (for example, when splitting the results to different columns)

iperdomo commented 8 years ago

@ichinaski @mtwestra that's an enforced schema, yes. The only tricky bit is to keep those files in-sync. If we're already doing it and it's not a painful process, i'm in favor of this approach.

mtwestra commented 8 years ago

@ichinaski @iperdomo as the schema is quite simple and not expected to evolve much, I think we can do it this way. I'll make a more detailed technical proposal.

Only thing that remains is how to store the actual test metadata - in a class file, properties file, etc.

mtwestra commented 8 years ago

decision, after talking to @iperdomo and @muloem: We'll use a shared (or at least identical) JSON configuration file on the FLOW backend and the Caddisfly app with the required per-test meta data

mtwestra commented 8 years ago

First to implement is a shared JSON configuration file, containing the definitions of the caddisfly tests. Structure:

{tests:[ { "subtype":SUBTYPE "tags": TAGS "uuid":UUIID, "name": DISPLAY_NAME, "brand": BRAND "description":DESCRIPTION, "numResults":NUM_RESULTS, "results":[RESULT_OBJECT,RESULT_OBJECT, ...] }, {...}]}

with:

RESULT_OBJECT is a JSON object with the schema: { "name":DISPLAY_NAME "id":RESULT_ID "unit":UNIT }

where:

janagombitova commented 8 years ago

Test plan

Test plan taken from here: https://github.com/akvo/akvo-flow/pull/1723 Cross check that issues from 1st test round are resolved. Comments from 1st test round are in the link above

Test

  1. Create a new survey and a new question
  2. Choose the question type 'caddisfly'
  3. Add more questions to your survey (cover all question types)
  4. Add a repeated question group to the survey and add some Caddsifly tests in there as well
  5. Publish the survey and assign it to a Flow phone
  6. Download and install the caddisfly app
  7. Open the survey on the Flow phone and start a new data point
  8. The caddisfly question should be displayed correctly and when pressed, opens the caddisfly app.
  9. Run the test
  10. Once test is completed Caddisfly app should bring you to Flow app
  11. Continue with the form and submit
  12. Create multiple data points
  13. Create a monitoring survey with registration form and 2 monitoring forms. Make sure all three forms have a few Caddisfly tests in there
  14. Follow the above mentioned steps

Confirmation of old implementation

  1. Enable external sources on the dashboard
  2. Create a survey with the new Caddisfly questions and the old implementation, publish the survey and assign to the phone
  3. Collect data
  4. In the dashboard in the Inspect data tab, Monitoring tab, both maps check that all data show properly

Test all steps in different browsers as well

muloem commented 7 years ago

@mtwestra At the moment we are loading the caddisfly resources during dashboard loading. This creates an extra load time for the dashboard even in cases where the caddisfly resources will not be used (i.e. dashboards without caddisfly integration). We need to optimise this to only load the caddisfly resources when they are actually going to be used.

muloem commented 7 years ago

@mtwestra One more thing noticed is that the cascade resources are loaded twice in this branch. This is probably a simple one liner to only load them once.

screen shot 2016-10-20 at 11 43 58
mtwestra commented 7 years ago

@muloem, they can be downloaded at the moment a user goes to the survey edit tab. I'll implement that. I'll also look at the cascade loading.

muloem commented 7 years ago

@mtwestra yep. that sounds reasonable. I would add as an extra condition - only if the instance has caddisfly integration enabled.

muloem commented 7 years ago

@mtwestra can you also experiment to see whether its possible to only load caddisfly resources when a question type caddisfly is selected? this way we could narrow down the loading even further

janagombitova commented 7 years ago

Notes from testing

@mtwestra can you please change the way we show "Caddisfly" question type in the dropdown to say "Akvo Caddisfly" so we have the official branding name? Thank you

mtwestra commented 7 years ago

@mulo, I don't think this will work. When the user creates a new question, and she selects the 'caddisfly' type, the tests need to appear immediately. So they need to be loaded beforehand, as they are now.

muloem commented 7 years ago

Ok. we'll skip that for now then. 👍

janagombitova commented 7 years ago

Notes from tests 2

@mtwestra was testing running the Caddisfly tests on the device and the issue that if a tests is located in a repeated question group, the test does not work is still present.

The case is: I have a few tests in a group that has the flag repeat this group on. Once I click in the app on Go to test > Caddisfly opens but I get a message: Cannot start test. Requested test in not available. Please contact technical support. > I have to click on OK and that brings me back to Flow app. All tests that I have in the repeated group work fine in other groups.

Steps Forward

Will be dealt with in another iteration https://github.com/akvo/akvo-flow-mobile/issues/475

mtwestra commented 7 years ago

@janagombitova , we need an android developer for that one. I suggest we release this with this caveat, and let Valerie solve it. I have never seen a caddisfly question in a repeated question group until now, as it does not make much sense - usually you have one datapoint for one water sample / water point that is tested.

janagombitova commented 7 years ago

Notes from test 3

Besides the above mentioned name change in the dropdown and issue with tests not running in repeated groups I could not run any actual strip tests as the initial quality check stops at 10 out of 15 and it appears as if the shadows are not checked. I told Nischal about the issue

Will continue with the other issues as much as possible for now.

mtwestra commented 7 years ago

@janagombitova, which caddisfly version did you use? we need to look at that. It might be an issue with the quality assurance process. It might help to do it in front of a window though, to get a good source of light.

janagombitova commented 7 years ago

@mark - I have spoken with Nischal and the issue was with the calibration card I was using as well as with the app version. After some work and clarification we got it sorted and I could run a few tests, but there is still work to do when it comes to the tests, in my opinion. But I know that the team is working on that this week.

Once the name change is reviewed and merged into the upcoming UAT build I will re-review the implementation again

janagombitova commented 7 years ago

Notes from last test round

Besides issues that are covered in new separate issues, the implementation passes the test plan 👍

muloem commented 7 years ago
muloem commented 7 years ago

The caddisfly part seems to be falling over if there is no data.

SEVERE: Error parsing Caddisfly resource: null
java.lang.NullPointerException
    at java.io.Reader.<init>(Reader.java:78)
    at java.io.InputStreamReader.<init>(InputStreamReader.java:113)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1906)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:778)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:759)
    at com.gallatinsystems.survey.dao.CaddisflyResourceDao.listResources(CaddisflyResourceDao.java:53)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.createRawDataHeader(GraphicalSurveySummaryExporter.java:1326)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.fetchAndWriteRawData(GraphicalSurveySummaryExporter.java:458)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.export(GraphicalSurveySummaryExporter.java:377)
    at akvo.flow_services.exporter$export_report.invoke(exporter.clj:58)
    at akvo.flow_services.scheduler.ExportJob.execute(scheduler.clj:52)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

2016-11-03 18:45:16,376 [QuartziteScheduler_Worker-1] ERROR org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter - Error generating report: null
java.lang.NullPointerException
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.caddisflyCellValues(GraphicalSurveySummaryExporter.java:870)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.writeAnswer(GraphicalSurveySummaryExporter.java:745)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.writeInstanceData(GraphicalSurveySummaryExporter.java:602)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.fetchAndWriteRawData(GraphicalSurveySummaryExporter.java:523)
    at org.waterforpeople.mapping.dataexport.GraphicalSurveySummaryExporter.export(GraphicalSurveySummaryExporter.java:377)
    at akvo.flow_services.exporter$export_report.invoke(exporter.clj:58)
    at akvo.flow_services.scheduler.ExportJob.execute(scheduler.clj:52)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)