mberg commented 7 years ago

As a user, I would like to be able to see how long an enumerator spent on each question in the login.

Tracking swiping, back and forth, etc. would also be valuable, but the time spent per question would provide the most upfront value.

Ideally, this would be part of a metadata file which can be included with the submissions.

ChrisCorey commented 7 years ago

I'm surprised this hasn't been addressed. There are at least four reasons to add this feature:

1. As an indicator of data quality. When an item or section of items is answered too quickly, it can indicate respondent indifference to the substance of the items and/or falsification by an interviewer.
2. In pretesting, timing is needed for budgetary reasons. In large surveys, decisions may need to be made about which items to retain or exclude in order to stay within budget estimates for interviewer labor hours.
3. Our IRB has required timing as a proxy for respondent burden. Excessive time spent on a bank of sensitive items has been taken as an indication that respondents are having difficulty with the subject matter.
4. In attitude research, time spent on an item can lead to polarization of item responses for some types of respondents.

I have support for undertaking this from our Manager of Emerging Technology and Engineering saying, "we ought to do this." Which is different from saying we have money for doing this. We do have a very senior developer reviewing the code to scope the task. I would be interested if others have looked at this and had general comments about the best way to approach this.

lognaturel commented 7 years ago

@ChrisCorey I think there is significant interest in this feature but I don't believe anyone has looked at the technical approach recently. Could you please ask your senior dev to jumpstart the technical conversation here or on the dev Slack at http://slack.opendatakit.org/? That will help bring it to the front of everyone's minds.

chrislrobert commented 7 years ago

FYI, SurveyCTO has this in the form of "text audits": meta-data on timing is attached to submission data, as separate .csv files. We also have a duration() function that dramatically simplifies more ad-hoc capturing of timing within a form (consistent across saves/resumes).

In principle, we're open to sharing some or all of this stuff -- but, as with most other extensions we've made, we did it in a non-XForms-compliant, independent, non-collaborative way. I have no doubt that there could be a million other designs proposed and considered, maybe 990,000 of which would fit better with the XForms spec.

ChrisCorey commented 7 years ago

I’m at a point where I need to find internal funding to get more developer time for this so nothing is going to happen quickly. Developers are suggesting a very specific listener on swipe events to run the now() function. I was hoping for something to execute any function whenever encountered in a form – I’m not getting a lot of support for that as a project.

lognaturel commented 7 years ago

@ChrisCorey Keep us posted. Hopefully someone will have the cycles to take this on soon.

@chrislrobert That's very good to know. Once someone is ready to take this on it would be good to see if it's possible to do some kind of coordination.

nap2000 commented 7 years ago

@mberg This sounds like a feature that is worth adding.

@chrislrobert It would make sense to me to consider leveraging the work done by SurveyCTO, and presumably one of the objectives would be to make the ODK Collect implementation compatible with the SurveyCTO solution where possible. However, do you have any specific thoughts on how the XForms specification could be used to implement this? No need to list all 990,000.

Anyone have any thoughts on how we could use the XForms spec?

lognaturel commented 7 years ago

@MartijnR, would appreciate your thoughts on how to approach this in an XFormsy way.

MartijnR commented 7 years ago

[XForm 'Actions'](https://www.w3.org/TR/2006/REC-xforms-20060314/slice10.html) would be the XFormsy way to do this, I think. If one of the existing XForms events does not meet the requirements, we could create (a) custom event(s). It depends on exactly what it should measure. It may require two actions (start and end, just like the already supported metadata for overall duration). At first glance that makes the most sense to me.

In ODK we don't have Actions, but use a (fairly equivalent) custom feature: "preload items" (note: CommCare did replace preload items with Actions). So if there is no desire to already implement XForm Actions, a quick way of doing this would be to create another preload item (or two).

I think either of these options could probably be adopted (if we flesh them out further) in The Spec. [edit]One thing to figure out, though, is how to link them with a particular page or question.[/edit]

chrislrobert commented 7 years ago

The XForms "actions" stuff is kind of Greek to me. But that aside, we might think first about what would make sense from a user perspective -- so, to me, one key question is: what data does the user want, at the end of the day, and what's the best format for that data? We store detailed timing data in a "file" field type so that it's essentially a media attachment on export, and that has advantages and disadvantages. It includes columns for the field, the seconds into administration when the field was first encountered, and the total seconds spent on the field. That choice of columns already encapsulates a bunch of trade-offs. For example, we recognize that people can visit a field multiple times, and we decided not really to care about the details of that; rather, we focused only on (a) when they first arrive and (b) the total time spent.

Having the data in an attachment has been a bit rough, but it's hard to imagine really great alternatives. Some have put together (and shared) Stata code to suck in and summarize timing information, flagging outliers and such. For ease of monitoring, we have a new function that allows you to review a full submission in a web browser (even if it's encrypted!) and overlay the timing info on the submission details, so you can just easily see how much time was spent, etc., when reviewing each submission.

Note that recording time consistently across multiple editing sessions involved its own challenges, and we hacked some attributes into saved form XML in order to keep a running clock going. That's definitely a concern to work out: how to deal with multiple editing sessions.

And then, of course, there's how the user would want to enable/disable this kind of feature. We added a new field type that is basically a file-type field in XForms, just handled differently in Collect.

So I guess I'd say that there are three broad pieces:

1. The UX at form-design time.
2. The UX at data-export time.
3. And the internal implementation (where I think XForms "actions" might come in).

I'd let #'s 1 and 2 guide #3, but that's just my approach. (Surely #'s 1 and 2 are relevant, though, even if #3 is more in the driver's seat.)

nap2000 commented 7 years ago

@MartijnR Do you think you would implement support for timing in Enketo? And related to that @chrislrobert in your solution how did you calculate timings for a page of questions (field-list / table-list)?

chrislrobert commented 7 years ago

We punted on field-list groups: I believe that all questions on a screen get the same times.

MartijnR commented 7 years ago

> @MartijnR Do you think you would implement support for timing in Enketo?

Only if we can figure out how to do this in a solid manner, and if a sponsor pushes this. (It would be for 'pages mode' only.) It has not entered our roadmap so far.

Chris brings up many good points. I figured "page-flip-start" and "page-flip-end" events would be the core of this feature, but indeed it's much more complex than that if you look at editing draft records and users flipping back to a previous page.

A lot of this complexity seems specific to the finer details of how the data collection client has implemented the UI around the forms (the stuff that is not described in the spec). I'm starting to wonder if for that reason this is maybe better done outside of XForms. Or we could use a hybrid option with some interoperability potential where we simply agree on adding a new meta element (e.g. orx:meta/orx:timing) to the spec with a binary datatype (i.e. an attachment), optionally with a predefined format, and leave the actual implementation of populating that file to the client (i.e. "magic"). It sounds like this would then be almost what SurveyCTO is doing. The presence of this meta element could signal to the client to enable the audit feature (in conjunction with a setting on the app perhaps) or show a 'feature not supported' warning.

For the hybrid option with an agreed format, we wouldn't have the ideal interoperability as timing data from submissions by both Enketo and ODK Collect for the same survey cannot be reliably combined but at least a server would be able to process timing data submitted by both.
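
A minimal sketch of the "presence of the meta element signals the client" idea, assuming the element ends up being called `orx:timing` as floated above (it is not in any published spec) and that `orx` maps to the usual OpenRosa namespace. The helper name `wants_timing_audit` and the example form are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the usual OpenRosa XForms namespace. "timing" is only the element
# name proposed in this thread, not part of any published spec.
ORX_NS = "http://openrosa.org/xforms"


def wants_timing_audit(instance_xml: str) -> bool:
    """Return True if the instance has an orx:timing child under orx:meta."""
    root = ET.fromstring(instance_xml)
    path = f"{{{ORX_NS}}}meta/{{{ORX_NS}}}timing"
    return root.find(path) is not None


# Hypothetical primary instance used only to exercise the helper.
EXAMPLE = """
<data xmlns:orx="http://openrosa.org/xforms" id="demo">
  <age/>
  <orx:meta>
    <orx:instanceID/>
    <orx:timing/>
  </orx:meta>
</data>
"""

if __name__ == "__main__":
    print(wants_timing_audit(EXAMPLE))
```

A client that does not support the feature could use the same check to decide when to show the 'feature not supported' warning.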

[edited]

nap2000 commented 7 years ago

I will be happy to put together a solution for this feature request.

@MartijnR I like your idea of implementing a hybrid approach with the timing component being implemented in the collect or other client. However I'm still keeping an open mind on the solution.

The linked proposal doc is a document where I have consolidated some of the ideas already posted here, plus some more thoughts. Again, no solution is intended as yet. Please add, critique or rule out options, assumptions, etc. directly in the document or, if you post directly to this issue, I will update the document on your behalf.

yanokwa commented 7 years ago

@nap2000 Could you please add commenting privileges to whomever has the link to that document?

nap2000 commented 7 years ago

I have updated the link to allow commenting (hopefully).

yanokwa commented 7 years ago

@joeflack4 Your fork of Collect has timing in a sidecar file right? Any regrets on that approach?

joeflack4 commented 7 years ago

@yanokwa I would have to double check, but yes, I believe the logs are stored as space/tab delimited lines in a plain text file.

This was before I started here. No regrets yet, though plain text approaches tend to get iterated on eventually. I believe right now we're parsing them in Python, after an earlier attempt with R. This is something that James can elaborate more on.

yanokwa commented 7 years ago

It'd be good to understand what kinds of things you track and, of those, which are actually useful. Also, do you submit those files to a server, or are they pulled off the SD card?

joeflack4 commented 7 years ago

Definitely. I hope we can allocate some of our resources to furthering ODK, as it looks like we will be sticking with the platform for a while. I know very little about Collect's codebase or even our innovations. We collect quite a lot of this log data, so I'm assuming server. However, I do not believe we've modified Aggregate. I'll confer this week and get back with some details.

yanokwa commented 7 years ago

@ChrisCorey I want to make sure we aren't forgetting you! Be sure to review Neil's proposal doc so we capture your use case!

joeflack4 commented 7 years ago

@yanokwa I went ahead and looked into how our logs are submitted on our JHU fork. They are submitted as flat text file attachments to ODK Aggregate. James and I have our hands full this week, but we can speak further on this topic and others soon.

nap2000 commented 7 years ago

@chrislrobert Do you have any comments or suggestions on the proposal for adding collection of timing information to the odkCollect base?

lognaturel commented 7 years ago

The one high-level thing I'd really like to hear from @mberg, @ChrisCorey and anyone else who has a need for this feature is whether you believe that total time spent per question is sufficient information. It sounds like it's been good enough for SurveyCTO's users and I believe it would be enough to satisfy @ChrisCorey's stated needs.

An alternate option would be to log a specific set of events such as "swipe left", "swipe right", "value change". This would allow for somewhat richer analysis and a deeper understanding of what actually happened. For example, if an enumerator spent a total of 10 minutes in a question and entered it twice, is it because s/he quickly skipped it the first time and then went back to it and then spent a long time in it, or...?
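
To make the difference concrete, here is a small sketch of how per-visit durations could be recovered from an event log, which a single "total time" number cannot provide. The event kinds `"enter"` and `"leave"` and the tuple layout are assumptions for illustration, not a proposed format.

```python
from collections import defaultdict


def visits_per_question(events):
    """Group enter/leave events into a list of visit durations per question.

    `events` is an iterable of (timestamp_seconds, kind, question) tuples,
    where kind is the hypothetical "enter" or "leave".
    """
    open_since = {}
    visits = defaultdict(list)
    # Stable sort by timestamp keeps the original order of simultaneous events.
    for ts, kind, question in sorted(events, key=lambda e: e[0]):
        if kind == "enter":
            open_since[question] = ts
        elif kind == "leave" and question in open_since:
            visits[question].append(ts - open_since.pop(question))
    return dict(visits)


# The enumerator skims q1 for 5 seconds, answers q2, then returns to q1.
EVENTS = [
    (0, "enter", "q1"), (5, "leave", "q1"),
    (5, "enter", "q2"), (65, "leave", "q2"),
    (65, "enter", "q1"), (660, "leave", "q1"),
]

if __name__ == "__main__":
    print(visits_per_question(EVENTS))
```

Summing each list gives the total time per question, and its length gives the number of visits, so the simpler format can always be derived from the richer one but not the other way around.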

@benb111's dissertation "Algorithmic Approaches to Detecting Interviewer Fabrication in Surveys" is really interesting and perhaps relevant to this conversation. He added some event logging to ODK (in a way that we unfortunately can't use) and found that he could use these to detect falsified data. His algorithms for doing so are implemented here. Section 5.4 describes the events he logged and 5.5 the aggregate values he computed from those events. In Section 6.3, he says:

> Given that user-trace metadata can help, one could ask how much detail needs to be recorded in the traces to make the most effective predictions. One possibility is that it is sufficient to record just the time spent on each question. If this were true, it would mean simpler implementations and smaller log files. Thus, it is important to justify the increased complexity that is required to record more detailed user trace logs, with entries for events like edits. I argue here that this level of detail really does help.

Of course, a lot of what he did is not practical to generalize but I think it still provides some interesting insights into what could eventually be done with this kind of logged data. The big disadvantage to event logging is that it would need to be processed to provide any real value whereas total time data might be of some use on its own (someone could skim total time spent across many instances and spot outliers).

I want to make sure we've at least considered this as an option.

cc @aflaxman since he reviewed this work.

lognaturel commented 7 years ago

Um. Plot twist. Ben's logging code was actually added in here... sigh.

lognaturel commented 7 years ago

To be clear, the logging code I reference above is undocumented and does not include any way for getting the data off the phone (you need to manually copy a db file off the phones). We still have to design and implement a solution, I just didn't realize that code was in trunk until I searched for it.

If you want to try it out, I found a message from @mitchellsundt with brief instructions [here](https://groups.google.com/d/msg/opendatakit/OA4xR6HvCkw/7ihXGE1x7GsJ).

> Logging is enabled if the file "/sdcard/odk/log/enabled" exists. The logging database will be "/sdcard/odk/log/activityLog.db"

chrislrobert commented 7 years ago

Apologies, I haven't had time to review the proposal in detail or give feedback. I will just say that yes, total time spent per question has been sufficient for most of our users, but recently somebody suggested adding "times visited" as well. The key problem has been that the volume of data is already too much for most people most of the time. We have other features ("speed limits", "audio audits", and the ability to reliably capture timestamps in an ad-hoc way, at key points in the survey) that have tended to result in a more manageable amount of data for most users. A handful of expert users have used Stata to pull in all timing data and conduct various forms of analysis on it, but that's complicated; even when they/we share the code, most people are put off by its complexity. Only recently, by adding the timing data as an overlay in our Data Explorer, has the timing data become truly accessible and usable by the majority of our users; now, when they monitor data and drill down into a submission, they can easily see the timing data overlaid over the responses. Thus, much depends on the manner in which people will actually use the timing data. If there's some simplified way for people to use it, then behind the scenes maybe it can be more complex and voluminous. But if the idea is to have people directly use the data themselves, then I think there is a lot to be said for simplicity.

nap2000 commented 7 years ago

Thanks @chrislrobert, it's good hearing about your experience. It seems that most of the value from the timing data has come since the server started doing some presentation, and that few people used the raw data files? Can you provide an example of the CSV file that you expect to be submitted? Also, how is this CSV file packed into the submission web request?

joeflack4 commented 7 years ago

@yanokwa Also, to answer your question as to what our logs look like / what kind of information it logs, here is a short example snippet.


```
1478095110545   oP  managing_authority[1]
1478095121658   LH  available[1]
1478095121678   oR  available[1]
1478095123225   LP  available[1]    yes
1478095123282   EP  consent_start[1]
1478095124368   LP  consent_start[1]
1478095124381   EP  consent[1]
1478095125902   LP  consent[1]
1478095125915   EP  begin_interview[1]
1478095127273   LP  begin_interview[1]  yes
1478095127352   EP  participant_signature[1]/sign[1]
1478095127352   EP  participant_signature[1]/checkbox[1]
1478095128700   LP  participant_signature[1]/sign[1]
1478095128700   LP  participant_signature[1]/checkbox[1]    1
1478095128761   EP  witness_auto[1]
1478095130065   LP  witness_auto[1] 1
1478095130093   EP  facility_name[1]
1478095133027   LP  facility_name[1]    Sinoko-Dispensary
1478095133058   EP  MFL_number[1]
1478095135002   LP  MFL_number[1]   6
1478095135025   EP  position[1]
1478095136097   EH  position[1]
1478095136121   oP  position[1]
1478095164755   LH  facility_type[1]    nursing_maternity
1478095164795   oR  facility_type[1]    nursing_maternity
1478095166611   LP  facility_type[1]    pharmacy
1478095166643   EP  managing_authority[1]
1478095167676   EH  managing_authority[1]
1478095167700   oP  managing_authority[1]
1478095182617   LH  managing_authority[1]
1478095182636   oR  managing_authority[1]
1478095184908   SF  managing_authority[1]
1478095185610   oP  managing_authority[1]
```
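
For anyone wanting to poke at a log like this, a rough parsing sketch follows. It assumes only what is visible in the snippet: whitespace-separated columns of millisecond timestamp, a two-letter event code, a node reference, and an optional value. The meaning of the codes is not explained in this thread, so the sketch treats them as opaque strings; `LogEvent` and `parse_log` are made-up names.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class LogEvent(NamedTuple):
    timestamp_ms: int
    code: str             # two-letter event code, meaning not documented here
    node: str             # e.g. "facility_name[1]"
    value: Optional[str]  # present only on some lines


def parse_log(text: str) -> List[LogEvent]:
    """Parse whitespace-delimited log lines, skipping anything malformed."""
    events = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # at most 4 columns; the value may be absent
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        value = parts[3].strip() if len(parts) == 4 else None
        events.append(LogEvent(int(parts[0]), parts[1], parts[2], value))
    return events


SAMPLE = """\
1478095121678   oR  available[1]
1478095123225   LP  available[1]    yes
1478095123282   EP  consent_start[1]
1478095124368   LP  consent_start[1]
"""

if __name__ == "__main__":
    parsed = parse_log(SAMPLE)
    print(Counter(event.code for event in parsed))
    print(parsed[1].timestamp_ms - parsed[0].timestamp_ms, "ms between first two events")
```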

chrislrobert commented 7 years ago

TA_10fe0ca1-5c35-4b28-a46e-046d6ffce30f.csv.zip

No problem. Sorry not to be more helpful.

Attached is an example text audit file from a short web form of ours. We added the "text audit" field type which is a file/media field not very much unlike an image, video, etc. The .csv file is then attached as part of the submission in just the way that an image or video file would be attached. In most of our extensions to ODK, we tried to take a path of least resistance (thus, for example, adding a file-type field to hold the data rather than inventing some new structure for XML data).

(Attachment zipped b/c .csv not allowed.)

nap2000 commented 7 years ago

OK, so adding a type; I guess it's a preload that adds an attachment. I think this is a good approach. I can't see the attachment though, is it there?

lognaturel commented 7 years ago

@nap2000, the attachment is linked from the top of @chrislrobert's post. It looks like

    Field name,Total duration (seconds),First appeared (seconds into survey)
    has_caseid[1]/contactinfo,7,3
    has_caseid[1]/exercise_1,471,10

Sounds like the complexity of the log file doesn't really matter because in practice even the most minimal format requires processing to be useful and actionable. In other words, "users won't be able to use the data directly" could be an argument against logging more detailed info or increasing the complexity of the log file. But it's hard to make sense of even a simple log file and there will need to be a processing layer for this feature to be useful to a broad range of users no matter what the log file looks like.
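
As an example of the kind of processing layer this implies, here is a minimal sketch that reads a CSV with the three column headers shown above and lists fields answered faster than some threshold. The function name and the 10-second threshold are arbitrary choices for illustration, not anything SurveyCTO or ODK provides.

```python
import csv
import io


def fast_fields(csv_text: str, min_seconds: float):
    """Return the field names whose total duration is below min_seconds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Field name"]
        for row in reader
        if float(row["Total duration (seconds)"]) < min_seconds
    ]


# The two data rows from the example attachment quoted above.
SAMPLE = """\
Field name,Total duration (seconds),First appeared (seconds into survey)
has_caseid[1]/contactinfo,7,3
has_caseid[1]/exercise_1,471,10
"""

if __name__ == "__main__":
    # The threshold is a made-up value; a real one would depend on the question.
    print(fast_fields(SAMPLE, min_seconds=10))
```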

nap2000 commented 7 years ago

Giving some thought to implementation.

There is an existing Logger, in org.odk.collect.android.database.ActivityLogger.

This was initially contributed as “Ben’s logging implementation” for logging user interactions and was originally named Logger. It was renamed by @mitchellsundt to ActivityLogger and extended to log additional application events such as “createDeleteInstancesDialog”.

It would seem to make sense to use the ActivityLogger for the user timing information, if only to keep all logging in a single class. Some changes could be:

- Create a new logTimerEvent() method where the event would be written to the timer log file as well as the activity log. Alternatively, the existing log parameters could be parsed to see if the event should be written to the timer log.
- Manage the life cycle of timer log files associated with each instance.
- Add additional calls to log timer events throughout the code as needed.

chrislrobert commented 7 years ago

Just to save you guys from some of the hassles we had when we did this: relying on the system clock leads to a series of issues in the field (particularly but not exclusively when saving+exiting and then resuming). The trouble is, Android dates and times change more often than would be desirable, including when you connect to cell service and the date/time is adjusted to match the network's. Thus, for the timing of first arrival on a question, it's useful to have a timer relative to the submission, which starts and stops whenever you enter and leave the form. We stashed the current state of the timer as an attribute of the top-level submission XML tag, then used that for keeping a consistent timer throughout the form...
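
A rough sketch of that kind of resumable timer, written in Python rather than Android code so it can be run anywhere. The attribute name `formTimerSeconds` and the `FormTimer` class are hypothetical (the actual SurveyCTO attribute is not named in this thread), and `time.monotonic` stands in for whatever monotonic clock the device offers.

```python
import time
import xml.etree.ElementTree as ET

TIMER_ATTR = "formTimerSeconds"  # hypothetical attribute name


class FormTimer:
    """Seconds spent inside a form, carried across save/resume cycles."""

    def __init__(self, root: ET.Element, clock=time.monotonic):
        self._root = root
        self._clock = clock  # monotonic, so wall-clock adjustments do not matter
        self._accumulated = float(root.get(TIMER_ATTR, "0"))
        self._resumed_at = None

    def resume(self) -> None:
        """Call when the user enters the form."""
        self._resumed_at = self._clock()

    def elapsed(self) -> float:
        """Seconds into the submission, counting only time spent in the form."""
        if self._resumed_at is None:
            return self._accumulated
        return self._accumulated + (self._clock() - self._resumed_at)

    def pause(self) -> None:
        """Call when the user leaves the form; stash state on the root tag."""
        self._accumulated = self.elapsed()
        self._resumed_at = None
        self._root.set(TIMER_ATTR, str(self._accumulated))


if __name__ == "__main__":
    root = ET.fromstring("<data/>")
    timer = FormTimer(root)
    timer.resume()
    timer.pause()
    print(ET.tostring(root, encoding="unicode"))
```

Because the running total is saved with the submission itself, a later session picks up where the earlier one stopped even if the device clock has been reset in between.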

nap2000 commented 7 years ago

Thanks Chris, very good to get your experience on this. Did you measure how much the time changed due to network variation? A couple of seconds here and there might be acceptable?

Otherwise we could adopt your approach, possibly storing the current state in the instance database as an alternative to the instance file.

In order to get timestamps that are accurate relative to each other within a form editing session, perhaps we could record the timestamp on form open along with elapsedRealtime(), and then at each event add the delta in elapsedRealtime() to that timestamp? If we did this, maybe we don't need to store the state of the timer, and we would wear any wall clock time variation if the user were to stop and then restart the editing session after shutting down their phone.
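
The idea can be sketched as follows, using Python's `time.time` and `time.monotonic` as stand-ins for the Android wall clock and `SystemClock.elapsedRealtime()`. The class name `SessionClock` is made up; this is an illustration of the approach, not Collect code.

```python
import time


class SessionClock:
    """Timestamps anchored to the wall clock once, at form open.

    Later timestamps are the anchor plus a monotonic delta, so a wall-clock
    change in the middle of a session does not disturb their relative order.
    """

    def __init__(self, wall_clock=time.time, monotonic=time.monotonic):
        self._monotonic = monotonic
        self._opened_wall = wall_clock()  # read the wall clock exactly once
        self._opened_mono = monotonic()

    def now(self) -> float:
        """Wall-clock time at form open plus time elapsed since then."""
        return self._opened_wall + (self._monotonic() - self._opened_mono)


if __name__ == "__main__":
    clock = SessionClock()
    first = clock.now()
    second = clock.now()
    print(second >= first)
```

The absolute values are only as good as the wall clock was at form open, which is the variation the comment above proposes to accept.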

chrislrobert commented 7 years ago

Well, with thousands of enumerators in the field with batteries failing and settings being reset from time to time, a non-trivial number will have date/time shifting by even a year, a month, a day... so if you rely on device date/time, you'll just have to live with a certain number of totally wacky measurements.

nap2000 commented 7 years ago

Yes, I understand. We have this problem now when showing time to complete a survey. It just takes one enumerator getting to the end of the survey and not saving it as finalised to make the averages meaningless. The server can always eliminate outliers, but it would be good to address this at the source.
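
To illustrate the averaging problem with made-up numbers: one survey left open overnight drags the mean far from what a typical interview took, while the median barely moves. The cut-off of five times the median below is an arbitrary assumption, not a recommendation.

```python
import statistics


def robust_summary(durations_minutes, max_ratio=5.0):
    """Compare the plain mean with the median and an outlier-trimmed mean.

    Values above max_ratio times the median are treated as outliers;
    max_ratio is an arbitrary choice for this sketch.
    """
    median = statistics.median(durations_minutes)
    kept = [d for d in durations_minutes if d <= median * max_ratio]
    return {
        "mean_all": statistics.mean(durations_minutes),
        "median": median,
        "mean_kept": statistics.mean(kept),
    }


# Four ordinary interviews and one form that was left open for a day.
DURATIONS = [20, 22, 25, 24, 1440]

if __name__ == "__main__":
    print(robust_summary(DURATIONS))
```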

I'm still keen to record real times. For example if settings are reset and then the enumerator continues the next day I'd like the system to attempt to record that, it seems your approach will not record that?

yanokwa commented 7 years ago

Given all this wackiness, I'm leaning more and more towards the logging solution. At least a human being (or clever script) can look at the raw data and try to make sense of it.

Agreed that a timestamp plus elapsedRealtime is a pretty good place to start. Another option we could add is an occasional [GPS time](https://developer.android.com/reference/android/location/Location.html#getTime()).

nap2000 commented 7 years ago

My apologies for being absent for a few days. I agree that an event log looks to be the best solution. It would be readable by a human, but it would be much better to have some scripting support to interpret and present results. Given the potential value in this data, adding such a script on the server will be well worthwhile. A different project could also add a more personal view of the timing data on the phone.

I would be happy to move to developing a prototype of this feature by making changes to the ActivityLogger class. Unless anyone thinks there is a better way to approach the solution.

lognaturel commented 7 years ago

Starting with the ActivityLogger class sounds reasonable, but I don't think we should feel any attachment to it if it's not meeting our needs or not well designed (I haven't looked at it in detail yet). It would be totally reasonable to design a parallel system and then take out ActivityLogger and related functionality. I know that it is used somewhat, because it results in a couple of NullPointerExceptions that we occasionally see in the Google Play dev console, but I can't imagine anyone wanting to use it once this feature is in place and documented.

I think it's worth spending some extra time to design the system well and to put some ✨testing✨ in place. @nap2000, consider running things by #collect-code in Slack as you're building if you want another brain or two.

nap2000 commented 7 years ago

OK thanks Hélène,

That is the approach I will take then: start with ActivityLogger and, if it seems right, create a TimerLogger class instead.

What do you mean by testing? A test plan or is this automated testing?

lognaturel commented 7 years ago

392 had been on my mind, so I think the ideal would be some automated testing, as appropriate, of things like turning logging on and off, making sure the file has the intended structure, making sure the file is submitted, and so on. This should hopefully encourage a more modular design that will be easier to change and more robust.

This is new for this project so we're all learning together and it will take more time but if you're up for some experimenting with this feature, I think there will be lots of benefits.

nap2000 commented 7 years ago

OK, I will add a test plan to the document and look to include some early unit tests.

ChrisCorey commented 7 years ago

All

Sorry I have dropped out of this thread for so long – Annual Review has just wrapped up. The direction this has taken seems good to me. Many thanks to everyone.

Chris Corey

nap2000 commented 7 years ago

Additional design sections have been added to the Google Doc that summarises the work on this feature request. https://docs.google.com/document/d/1LqVlVpePjA7Q1snjhA_ZQoDuzFqbtQcIygwcEzO_LJs/edit?usp=sharing

Feel free to provide comments on the approach in the document or here on GitHub.

nap2000 commented 7 years ago

It's probably time to close off the design stage and get feedback on what should be included in the final build.

The design is in https://docs.google.com/document/d/1LqVlVpePjA7Q1snjhA_ZQoDuzFqbtQcIygwcEzO_LJs/edit?usp=sharing. Final comments please!

I have created a pull request containing a prototype for the implementation of this issue. https://github.com/opendatakit/collect/pull/760 Please also review and provide feedback. The pull request is only a prototype so feel free to suggest a completely different approach to the code if you think it justified.

The prototype does work. You can enable logging in the general preferences after which a log file will be created when you open or re-open a form. The log will be called timing.csv and will be sent to the server when you submit the finalised form.

jkpr commented 7 years ago

Sorry to jump in late, @nap2000 @lognaturel. We at PMA2020 have been using something very similar now for slightly more than a year, and we have found it very useful for tracking enumerator times. We make a log that has timestamped events. @joeflack4 commented on it earlier in the thread, but there wasn't much followup. It seems you are pretty far down your own development path, so I will link to a few of our files and explain how they work. Maybe something will be useful for you.

We make use of Handler, HandlerThread, and Looper to create a background thread that handles file writing. This is tied to the FormEntryActivity lifecycle, see onCreate and onDestroy.

The bulk of the timer logging code is in the UseLog class. It creates a file, writes to a buffer, and flushes to file periodically. We track a few events; they are listed here in the UseLogContact class. They should be self-explanatory for the most part. In the FormEntryActivity we generate Messages which consist of a code for the event, a timestamp, the current node, and the current value stored in the view (examples here, here, and here). We then post them to the Handler, and the Handler writes to file in the background.
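For readers who don't know the Android Handler/HandlerThread pattern, the following Python sketch shows the same shape using a queue and a worker thread: callers post lines, and a background thread buffers them and flushes to file in batches. It is only an analogy under that assumption, not the PMA2020 code; the class name, the `flush_every` parameter, and the line format are hypothetical.

```python
import queue
import threading


class BackgroundLogWriter:
    """Accepts log lines from any thread and writes them on a worker thread."""

    _STOP = None  # sentinel telling the worker to finish

    def __init__(self, path, flush_every=20):
        self._path = path
        self._flush_every = flush_every
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()  # comparable to starting the thread in onCreate

    def post(self, line):
        # Called from the UI side; returns immediately without touching the file.
        self._queue.put(line)

    def close(self):
        # Comparable to onDestroy: ask the worker to stop and wait for it.
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self):
        buffered = []
        with open(self._path, "a", encoding="utf-8") as f:
            while True:
                item = self._queue.get()
                if item is self._STOP:
                    break
                buffered.append(item)
                if len(buffered) >= self._flush_every:
                    f.writelines(line + "\n" for line in buffered)
                    f.flush()
                    buffered.clear()
            # Write whatever is still buffered before the file is closed.
            f.writelines(line + "\n" for line in buffered)
```

The point of the design is that form navigation never waits on disk I/O; the trade-off is that lines still in the buffer are lost if the process is killed before a flush.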

An important decision, in my opinion, was to write the file as a tab-separated file. Since we include the value of the view/XPath node in the log, I thought commas would be more common than tabs in the data (consider text entry questions). Therefore, TSV was a more appropriate format than CSV for us.

I noticed you are tracking a few things that we are not, such as language change. That would be interesting for us to add. Also, it seems like you are trying to track GPS as the log progresses. I know our team would be very interested in that. We will keep an eye on your development to see if there is any inspiration we can gain.

nap2000 commented 7 years ago

Hi @jkpr. I had a look at your code which I'm not going to think much about at the moment. Someone else might want to suggest if we should incorporate that into the solution.

I've also had another look at your output file which I should have looked at more thoroughly before.

1) What do people think about showing the duration inside a question on a single line, versus having two events, EP and LP? The draft code is generating CSV files that look like this:

[image: timing]

That is, the start and end times of the user's stay in a prompt are shown on the same line. However, it may make sense to create two separate events, as you have done.

2) I'm not convinced of the value of including the response entered by a user, as the server can readily combine the data for analysis if required.
3) If there was a possibility of commas in the data, we could add quotation marks around the entries. Does anyone else prefer tab delimited?
4) It looks like your "on pause" and "on resume" events are generated by the Android activity life cycle? The "resume" event I am recording happens when the user saves a survey and then starts editing the saved survey. Recording the activity pause and resume looks like a good idea; I will add that.
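On the comma question, a short Python sketch shows that standard CSV quoting already copes with commas inside a field, so either delimiter would work. The column names and the row are invented for the example and are not the prototype's actual timing.csv layout.

```python
import csv
import io


def to_csv(rows):
    """Serialise rows as CSV; fields containing commas are quoted automatically."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # default quoting: QUOTE_MINIMAL
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical columns; the "label" field contains a comma.
rows = [
    ["event", "node", "label", "start", "end"],
    ["question", "/data/name", "Name, first and last", "1490000000000", "1490000004200"],
]

text = to_csv(rows)

# Reading the text back with a CSV parser recovers the original fields.
parsed = list(csv.reader(io.StringIO(text)))
```

The catch is that consumers must use a real CSV parser rather than splitting on commas, which is the main argument for tabs if the log is likely to be processed with ad-hoc scripts.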

lognaturel commented 7 years ago

@nap2000 This is all looking very awesome to me! And thanks @jkpr for sharing your implementation.

@nap2000, in 1, am I understanding correctly that EP is something like enter prompt and LP is leave prompt? I don't really have a strong feeling either way. I thought it would be slightly simpler to log entry and exit events separately but you seem to have managed a good implementation with both in the same row. It seems having them on the same row is marginally more human friendly.

2) I think we should not include the response. Including it makes the timing file potentially more sensitive and as you say, it doesn't add any new information.

3) I prefer commas but don't feel strongly.

4) 👍
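The two layouts discussed in 1 carry the same information, and a few lines of script can turn separate enter and leave events into one row per prompt. The sketch below assumes EP and LP mean "enter prompt" and "leave prompt" as guessed above, and assumes a simple (code, node, timestamp) event tuple; neither is taken from the actual log format.

```python
def pair_prompt_events(events):
    """Turn EP/LP events into (node, start_ms, end_ms, duration_ms) rows.

    `events` is an iterable of (code, node, timestamp_ms) in log order.
    An LP with no matching EP is ignored, as is an EP that is never closed.
    """
    open_prompts = {}
    rows = []
    for code, node, ts in events:
        if code == "EP":
            open_prompts[node] = ts
        elif code == "LP" and node in open_prompts:
            start = open_prompts.pop(node)
            rows.append((node, start, ts, ts - start))
    return rows


# Hypothetical log: two prompts visited one after the other.
events = [
    ("EP", "/data/age", 1000),
    ("LP", "/data/age", 4500),
    ("EP", "/data/name", 4500),
    ("LP", "/data/name", 9000),
]
rows = pair_prompt_events(events)
```

Going the other way, from one row with start and end to two events, is just as easy, so the choice mostly comes down to which form is friendlier for people reading the raw file.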

lognaturel commented 7 years ago

@mberg @ChrisCorey @MartijnR @yanokwa Any last comments to make on the general approach or the prototype at #760 before @nap2000 builds The Real Deal?

ChrisCorey commented 7 years ago

Everything I’ve seen so far seems great. Regarding the issues that have been raised:

I agree that including responses does not add new information and that they can be omitted from the file. Would the current logging feature be retained, so that it could provide the added detail if someone wanted all form events?

I have no opinion about the field separator.

It shouldn’t make a difference, but I’m drawn to the example that shows ‘Start’ and ‘End’ times. I think in the end that format will be accessible to a wider range of users.

Chris

getodk / collect

Time logging of questions #257