FLOIP / flow-results

Open specification for the exchange of "Results" data generated by mobile platforms using the "Flow" paradigm

Represent engagement with "informational messages" in Flow Results #43

Closed markboots closed 2 years ago

markboots commented 2 years ago

Problem Statement

Flow Results was initially developed to share "interaction data" between systems that request, run, and archive the results of Flows, along with the semantic meaning of those interactions. It uses terminology of "Questions" and "Responses", inspired by use-cases in field data collection. It represents these use-cases very well, including flow-based surveys, task guidance for field workers, and even representing data collected by pre-flow systems such as ODK.

However, Flows are also used extensively for informational messaging to remote recipients. It is important to know whether recipients have received/consumed informational content. All Flow-based systems (RapidPro, Viamo, Turn.io threads, Verboice, Flow Standard, etc.) have "Message" actions or blocks. We need an ability to indicate that a Contact has received (SMS, text, social messaging) or listened to (IVR) a Message.

Proposed Solution

@rudigiesler, @smn, I assume this is also relevant for Turn.io; any thoughts?

rudigiesler commented 2 years ago

Hi Mark! Interesting proposal. We've previously been using other data stores for this purpose, so it would be interesting to see how this could integrate into Flow Results. A few comments from my side:

markboots commented 2 years ago

Hi @rudigiesler, thanks for the comments - really helpful! It would be great to iterate on this to see if it could become a "complete" solution for the use-cases you're seeing.

> I think things like read receipts are prevalent across enough channels that we want something about it in the spec. I don't disagree about it being in the Response Metadata, but I think it would be helpful to have a bit of structure there, so that we don't end up with, e.g., one service calling it read and another read_reciept. But having unstructured data for channel-specific metadata is also very useful.

The PR suggested a structure in Response Metadata of delivery_status being one of: SENT (dispatched to device), DELIVERED (received on the device), or CONSUMED (read or listened to by the recipient).
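A minimal sketch of how the three statuses might be checked in Response Metadata, assuming Python and a plain dict for the metadata. The `DeliveryStatus` class, the `validate_metadata` helper, and the `whatsapp_message_id` key are hypothetical names for illustration, not part of the spec or the PR:

```python
from enum import Enum


class DeliveryStatus(str, Enum):
    """The three statuses named in this thread, in lifecycle order."""
    SENT = "SENT"            # dispatched to device
    DELIVERED = "DELIVERED"  # received on the device
    CONSUMED = "CONSUMED"    # read or listened to by the recipient


def validate_metadata(metadata: dict) -> dict:
    """Reject an unknown delivery_status; pass all other keys through untouched."""
    status = metadata.get("delivery_status")
    if status is not None:
        DeliveryStatus(status)  # raises ValueError for e.g. "read"
    return metadata


# Channel-specific keys (here a hypothetical "whatsapp_message_id") stay unstructured.
example = validate_metadata(
    {"delivery_status": "DELIVERED", "whatsapp_message_id": "abc123"}
)
```

Only delivery_status is constrained here; everything else in the metadata stays free-form, which keeps room for channel-specific details.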

Given how universal delivery status/read receipts are: should we consider moving it from Response Metadata into the response value itself? That could avoid the need for Response Metadata, and make it more standard. Is there any way to encode that within the response value, while keeping the ability to measure partial completion on audio channels?

> I know timestamps are often important, e.g. when was it sent, when was it delivered, when was it read, to be able to answer questions like "what is the median time between receiving and reading a message?"

Here we probably get into Response Metadata, since there are multiple parameters describing aspects of the "response". Should we propose some standard (but optional) keys for these in Response Metadata? e.g.: sent_at, delivered_at, consumed_at?

> This is more focused on the API side, but currently it's an append-only service. How do we deal with relating that to delivery and read receipts, which are often only received much later than when the message is sent? Does the upstream service hold onto them for a certain period of time before submitting? What if a read receipt is received a week later? Do we allow updating of rows? And in that case, how do we ensure that the data is correct if updates come in out of order? E.g. if the read receipt comes to the system first, then the delivered, how do we make sure the "read" status doesn't get overwritten?

That's a tough one. For Flow Results systems that are append-only: I assume the only way around this is to delay data entry/data transmission of the responses for a reasonable period of time like you said, until delivery_status settles. Any other magic that would work here? If responses were logged as soon as they were "Sent", we would need some out-of-band channel to update their delivery_status. That seems like it's getting beyond the scope of Flow Results and into being a messaging system. Thoughts?

> How do we deal with the same piece of content being delivered to the same user multiple times? E.g. if there's a piece of content that's updated daily with the number of COVID-19 infections, a user might access that content every day to see the updated numbers. Do we want to have "versions" of the content, or treat the updates as separate pieces of content?

My opinion would be that it's up to implementing systems to decide if this piece of content remains the same "Question ID" (identity from a Flow Results perspective), or if it becomes a new Question ID when it gets updated. If you decide it remains the same "Question ID", you could put some details into Response Metadata to indicate versions of content.

For a broader question on Flow Results versioning across updates to flows, see: https://floip.gitbook.io/flow-results-specification/specification#results-versioning
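If the Question ID stays the same and the version goes into Response Metadata, a consumer could still separate the versions when counting. A rough sketch, assuming Python; the `content_version` key, the `covid_daily_stats` Question ID, and the `consumed_per_version` helper are all hypothetical:

```python
from collections import Counter

# Rows reduced to (question_id, response_metadata) pairs for brevity.
rows = [
    ("covid_daily_stats", {"delivery_status": "CONSUMED", "content_version": "2021-03-01"}),
    ("covid_daily_stats", {"delivery_status": "CONSUMED", "content_version": "2021-03-02"}),
    ("covid_daily_stats", {"delivery_status": "DELIVERED", "content_version": "2021-03-02"}),
    ("covid_daily_stats", {"delivery_status": "CONSUMED", "content_version": "2021-03-02"}),
]


def consumed_per_version(rows):
    """Count CONSUMED responses per (question_id, content_version)."""
    counts = Counter()
    for question_id, metadata in rows:
        if metadata.get("delivery_status") == "CONSUMED":
            counts[(question_id, metadata.get("content_version"))] += 1
    return counts


counts = consumed_per_version(rows)
```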

> Do things like the session ID still make sense here?

For our use-cases session IDs still make sense, since a "message" interaction can be embedded within a larger session that comprises multiple messages and other kinds of "questions". How do you see it?

rudigiesler commented 2 years ago

> Hi @rudigiesler, thanks for the comments - really helpful! It would be great to iterate on this to see if it could become a "complete" solution for the use-cases you're seeing.

> > I think things like read receipts are prevalent across enough channels that we want something about it in the spec. I don't disagree about it being in the Response Metadata, but I think it would be helpful to have a bit of structure there, so that we don't end up with, e.g., one service calling it read and another read_reciept. But having unstructured data for channel-specific metadata is also very useful.

> The PR suggested a structure in Response Metadata of delivery_status being one of: SENT (dispatched to device), DELIVERED (received on the device), or CONSUMED (read or listened to by the recipient).

I see, I missed that. That seems like a great solution. My only suggestion would be around failure modes: something that's the opposite of sent (dispatch to the service failed), and something that's the opposite of delivered (the service failed to deliver to the device). I don't think it makes sense to have an opposite of consumed.

> Given how universal delivery status/read receipts are: should we consider moving it from Response Metadata into the response value itself? That could avoid the need for Response Metadata, and make it more standard. Is there any way to encode that within the response value, while keeping the ability to measure partial completion on audio channels?

The only way I can think of is to either have the response be an object, or to assign meaning to other numbers, e.g. "2" for delivered and "3" for read. But then it no longer makes sense for voice, because "1" (fully listened to) means that it has been both delivered and consumed. I quite like it in the metadata, as long as it's standardised across channels.

> > I know timestamps are often important, e.g. when was it sent, when was it delivered, when was it read, to be able to answer questions like "what is the median time between receiving and reading a message?"

> Here we probably get into Response Metadata, since there are multiple parameters describing aspects of the "response". Should we propose some standard (but optional) keys for these in Response Metadata? e.g.: sent_at, delivered_at, consumed_at?

I like this. Just whatever the status is, add an _at, and that's the timestamp key.
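A minimal sketch of that naming rule, plus the "median time between receiving and reading" question it is meant to answer. This assumes Python and ISO 8601 timestamp strings without a timezone suffix; the helper names `timestamp_key` and `median_seconds_between` are made up for illustration:

```python
from datetime import datetime
from statistics import median
from typing import Optional


def timestamp_key(status: str) -> str:
    """Derive the timestamp key from a status: "SENT" -> "sent_at"."""
    return f"{status.lower()}_at"


def median_seconds_between(rows: list, first: str, second: str) -> Optional[float]:
    """Median delay between two statuses, skipping rows that lack either timestamp."""
    start_key, end_key = timestamp_key(first), timestamp_key(second)
    delays = [
        (datetime.fromisoformat(m[end_key]) - datetime.fromisoformat(m[start_key])).total_seconds()
        for m in rows
        if start_key in m and end_key in m
    ]
    return median(delays) if delays else None


metadata_rows = [
    {"delivered_at": "2021-03-01T10:00:00", "consumed_at": "2021-03-01T10:05:00"},
    {"delivered_at": "2021-03-01T11:00:00", "consumed_at": "2021-03-01T11:01:00"},
    {"delivered_at": "2021-03-01T12:00:00"},  # never read, so it is skipped
]
print(median_seconds_between(metadata_rows, "DELIVERED", "CONSUMED"))  # 180.0
```

Because the keys are optional, rows that never reached a given status simply drop out of the calculation.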

> > This is more focused on the API side, but currently it's an append-only service. How do we deal with relating that to delivery and read receipts, which are often only received much later than when the message is sent? Does the upstream service hold onto them for a certain period of time before submitting? What if a read receipt is received a week later? Do we allow updating of rows? And in that case, how do we ensure that the data is correct if updates come in out of order? E.g. if the read receipt comes to the system first, then the delivered, how do we make sure the "read" status doesn't get overwritten?

> That's a tough one. For Flow Results systems that are append-only: I assume the only way around this is to delay data entry/data transmission of the responses for a reasonable period of time like you said, until delivery_status settles. Any other magic that would work here? If responses were logged as soon as they were "Sent", we would need some out-of-band channel to update their delivery_status. That seems like it's getting beyond the scope of Flow Results and into being a messaging system. Thoughts?

Maybe this is just me misinterpreting the API spec? With my current implementation, if there's an existing row, then it returns a 400 response. But maybe instead we should be replacing or updating the row? I think we will need to define what the behaviour should be, though. E.g., for metadata, if you have {delivery_status: SENT, sent_at: xxx}, and then you make a second request with {delivery_status: DELIVERED, delivered_at: yyy}, then you probably want the resulting record to be {delivery_status: DELIVERED, sent_at: xxx, delivered_at: yyy}. But then you have the issue of what happens if these get sent out of order. Should there be some logic around delivery_status to only allow it to go SENT -> DELIVERED, but not the other way around? Or do something with timestamps? Or pass the responsibility to the client to ensure that they arrive in order?

I don't quite like delaying the data, as often systems allow weeks for those status updates to come in, and you don't want to delay data that much.
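One possible shape for the "only allow it to go forward" option, sketched in Python. This is an assumption about how a server could merge updates, not something the spec or the PR defines; `STATUS_RANK` and `merge_metadata` are hypothetical names, and failure statuses are not modelled:

```python
# Lifecycle order assumed from the thread: SENT -> DELIVERED -> CONSUMED.
STATUS_RANK = {"SENT": 0, "DELIVERED": 1, "CONSUMED": 2}


def merge_metadata(existing: dict, update: dict) -> dict:
    """Merge an incoming metadata update into the stored record.

    Keys from both sides are kept (so sent_at survives a later DELIVERED
    update), but delivery_status only ever moves forward.
    """
    merged = {**existing, **update}
    old = existing.get("delivery_status")
    new = update.get("delivery_status")
    if old is not None and new is not None:
        if STATUS_RANK[old] > STATUS_RANK[new]:
            merged["delivery_status"] = old  # ignore the stale, late-arriving status
    return merged


# Updates arriving out of order: the read receipt comes in first.
record = {}
for update in [
    {"delivery_status": "CONSUMED", "consumed_at": "zzz"},
    {"delivery_status": "SENT", "sent_at": "xxx"},
    {"delivery_status": "DELIVERED", "delivered_at": "yyy"},
]:
    record = merge_metadata(record, update)
```

With this rule the final record has delivery_status CONSUMED and all three timestamps, regardless of arrival order, so the client does not have to guarantee ordering. If the same timestamp key arrives twice, the later request wins; whether that is the right behaviour is a separate decision.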

> > How do we deal with the same piece of content being delivered to the same user multiple times? E.g. if there's a piece of content that's updated daily with the number of COVID-19 infections, a user might access that content every day to see the updated numbers. Do we want to have "versions" of the content, or treat the updates as separate pieces of content?

> My opinion would be that it's up to implementing systems to decide if this piece of content remains the same "Question ID" (identity from a Flow Results perspective), or if it becomes a new Question ID when it gets updated. If you decide it remains the same "Question ID", you could put some details into Response Metadata to indicate versions of content.

> For a broader question on Flow Results versioning across updates to flows, see: https://floip.gitbook.io/flow-results-specification/specification#results-versioning

👍

> > Do things like the session ID still make sense here?

> For our use-cases session IDs still make sense, since a "message" interaction can be embedded within a larger session that comprises multiple messages and other kinds of "questions". How do you see it?

I was thinking of the "bulk send" case, where a session is just a single message. But I see the relevance of keeping it for other use cases.

markboots commented 2 years ago

Thanks @rudigiesler! I updated the PR based on these suggestions on error codes and timestamps; let me know if this looks good.

rudigiesler commented 2 years ago

@markboots 👍 Looks good to me