Real world scenarios - GitHub issues

Archieversace commented 10 years ago

The spec states that statements are immutable. However, we have encountered some real-world scenarios where this rule does not offer obvious solutions.

Scenario 1: A user's Inverse Functional Identifier changes. For example, they get married and change their mbox, or their mbox was recorded incorrectly.

Scenario 2: A user wishes to delete all of their data from an LRS, or, for data protection reasons, we need to delete their data from the LRS after x years.

Has any thought been put into these scenarios? If so, do the solutions need to be included in the specification, or in a wiki somewhere?

Apologies if these have already been answered elsewhere.

garemoko commented 10 years ago

Hi Jonathan, we certainly discussed scenario 1, and I think scenario 2. Here's what I remember from the discussions:

In scenario 1a, the user changes their email address. The statement is still true, but we're saying that jane.smith at that point in time is now jane.bloggs. The LRS doesn't care, but the reporting tool needs to know that jane.smith is now jane.bloggs. There's no specific mechanism for Jane to tell the reporting tool this; it can be done in any way, without need for standardisation/interoperability.
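A minimal sketch of what "the reporting tool needs to know" could look like, assuming the tool keeps its own alias table and that every actor it sees is identified by mbox. The table, the addresses and the function names are hypothetical; the point is that the stored statements are never touched.

```python
from collections import Counter

# Hypothetical alias table kept by the reporting tool, not by the LRS:
# maps a superseded mbox to the one the reporting tool now treats as canonical.
ALIASES = {
    "mailto:jane.smith@example.com": "mailto:jane.bloggs@example.com",
}


def canonical_mbox(statement: dict) -> str:
    """Return the canonical mbox for a statement's actor (assumes an mbox actor)."""
    mbox = statement["actor"]["mbox"]
    seen = set()
    # Follow the chain in case an identifier changed more than once;
    # the seen set guards against accidental cycles in the table.
    while mbox in ALIASES and mbox not in seen:
        seen.add(mbox)
        mbox = ALIASES[mbox]
    return mbox


def statements_per_person(statements: list) -> Counter:
    """Count statements per person; the stored statements are never modified."""
    return Counter(canonical_mbox(s) for s in statements)
```

Both the jane.smith and the jane.bloggs statements end up counted against jane.bloggs, while the LRS still holds exactly what was originally sent.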

In scenario 1b, there was an error in the original statement. I'd say that's a case for voiding and re-issuing the statement, but I'm interested to hear from @fugu13 and others on this.

In scenario 2, my understanding is that the LRS is free to tidy up and delete old statements, and even to keep only the statements it's interested in (perhaps filtering by verb or activity type, for example). I can't find this anywhere in the specification, though, so perhaps it's something we need to address.
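To make the "tidy up" idea concrete, here is a minimal sketch of vendor-side housekeeping, assuming statements are held as dicts whose `stored` value is ISO 8601 with a "Z" or explicit offset. The retention window and verb allow-list are hypothetical, and, as discussed further down this thread, the spec does not define any such behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical vendor policy; the xAPI spec defines nothing like this.
RETENTION = timedelta(days=5 * 365)
KEEP_VERBS = {
    "http://adlnet.gov/expapi/verbs/completed",
    "http://adlnet.gov/expapi/verbs/passed",
}


def parse_stored(value: str) -> datetime:
    # Older Pythons' fromisoformat rejects a trailing "Z", so rewrite it as an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def should_keep(statement: dict, now: datetime) -> bool:
    """Keep anything stored within the retention window, plus older allow-listed verbs."""
    if now - parse_stored(statement["stored"]) <= RETENTION:
        return True
    return statement["verb"]["id"] in KEEP_VERBS


def tidy(statements: List[dict], now: Optional[datetime] = None) -> List[dict]:
    """Return the statements this hypothetical policy would retain."""
    now = now or datetime.now(timezone.utc)
    return [s for s in statements if should_keep(s, now)]
```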

Andrew

brianjmiller commented 10 years ago

I :+1: @garemoko's comments on 1a, though I would also go so far as to say that voiding could work in that case as well; the purist in me thinks you should avoid it if at all possible. 1b would definitely be a case for voiding + resubmission with a corrected Agent. See the spec on voiding:

The certainty that an LRS has an accurate and complete collection of data is guaranteed by the fact that Statements cannot be logically changed or deleted. This immutability of Statements is a key factor in enabling the distributed nature of Experience API.

However, not all Statements are perpetually valid once they have been issued. Mistakes or other factors could require that a previously made Statement is marked as invalid. This is called "voiding a Statement" and the reserved Verb "http://adlnet.gov/expapi/verbs/voided" is used for this purpose. Any Statement that voids another cannot itself be voided.
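A minimal sketch of "voiding + resubmission with a corrected Agent" for 1b, assuming statements are plain dicts. The voiding statement uses the reserved verb and a StatementRef to the original; the corrected copy gets a fresh id and keeps everything else. Dropping `stored` and `authority` is an assumption (the LRS sets those), and the function names are hypothetical.

```python
import uuid

VOIDED = "http://adlnet.gov/expapi/verbs/voided"


def voiding_statement(target_id: str, actor: dict) -> dict:
    """Build a statement that voids the statement with id target_id."""
    return {
        "id": str(uuid.uuid4()),
        "actor": actor,  # whoever is doing the voiding, e.g. an admin account
        "verb": {"id": VOIDED, "display": {"en-US": "voided"}},
        # The object of a voiding statement is a StatementRef to the target.
        "object": {"objectType": "StatementRef", "id": target_id},
    }


def corrected_copy(original: dict, corrected_actor: dict) -> dict:
    """Re-issue the original statement under a new id with a corrected actor."""
    # Leave out properties the LRS assigns; keep timestamp, verb, object, etc.
    fixed = {k: v for k, v in original.items() if k not in ("id", "stored", "authority")}
    fixed["id"] = str(uuid.uuid4())
    fixed["actor"] = corrected_actor
    return fixed
```

Both statements would then be sent to the LRS as ordinary statements; the original stays in the LRS, only now marked as voided.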

On 2, my purist self would disagree; see the first line of the quote above. Though even that leaves the door open because of words like "accurate", "complete", and "logically". You could have an "accurate complete logical" set of statements for a particular experience without having all the other statements ever seen by the LRS. Voiding isn't a great means of deciding that a statement should be deleted, because the original statement should still be made available via the /statements resource using voidedStatementId. It is perhaps better than other arbitrary measures such as timestamp or stored (favoring the latter), though. So where does that leave us? With needing specifics in the spec.
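For reference, a minimal sketch of how a client would ask for a voided statement, assuming a hypothetical endpoint `https://lrs.example.com/xapi/` and the 1.0.1 version header; authentication is omitted and the request is only built, not sent. A plain `statementId` lookup does not return voided statements, which is why the separate `voidedStatementId` parameter exists.

```python
from urllib.parse import urlencode
from urllib.request import Request


def voided_statement_request(endpoint: str, statement_id: str) -> Request:
    """Build a GET for a voided statement on the /statements resource."""
    query = urlencode({"voidedStatementId": statement_id})
    url = f"{endpoint.rstrip('/')}/statements?{query}"
    # xAPI requests carry a version header; add auth as your LRS requires.
    return Request(url, headers={"X-Experience-API-Version": "1.0.1"}, method="GET")
```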

My practical side says this will be necessary; even Rustici's Cloud provides a "sandbox" LRS that comes with its own "Delete" button, which clears not only the statement stream but also any documents stored for the given endpoint (though we don't make it available on non-sandbox endpoints). So if we feel passionately about the extent to which an LRS must store statements, then I think the spec needs to be beefed up; if we don't, then I'd vote against adding any language, because it'd be hard to get right.

andyjohnson commented 10 years ago

Hey Jonathan, thanks for the question. I'll weigh in as well.

1a: Agree with Andrew; changing the statements would give the LRS too much power. This goes beyond error handling, and for cases where the historical perspective of "which persona" did an activity matters, we would lose this information by changing statements, since we couldn't change them back.

1b: If action is required, voiding and re-issuing is the way to go. I think we need to consider the following scenario: I take a MOOC, but realize partway through that I'm not actually logged in as me, just as a guest account. Do we expect a guest3894310102 persona to exist for this single set of experiences, or should it roll into my other MOOC-taking experiences? I think we'd all agree that the LRS doesn't decide, but the LRS would allow an outside system to void and re-issue statements, which I think is unavoidable.

2: While statements are immutable, I don't think we ever say they are indestructible. If we accept the case where statements can be deleted from an LRS (say, personal financial data), I'd say this is pretty much the same issue as 1b, but it raises the question: does a system that issued statements have the ability to request their deletion from an LRS without giving a reason? (Not that the LRS would ever be in a position to judge a reason.) We probably need to add some sort of Statement lifecycle language to the spec at some point.

Andy Johnson ADL Technical Team 608-318-0049

Archieversace commented 10 years ago

Thanks for the responses, some really good info there.

Scenarios 1a and 1b: it sounds like there are options within the spec to cover those scenarios. It would be helpful to include a scenarios section somewhere in the spec offering some best-practice approaches to things like this.

Scenario 2: From my point of view, there needs to be a way for a user to remove their own data from an LRS. Data ownership is a hot topic (Facebook, Twitter, Instagram, etc.), and I think for users to believe (and trust) in the idea of an LRS, they will want the facility to remove all their data from it (they own it, after all). So if we agree that this is a requirement, there should be a mechanism in the spec to cover it; otherwise vendors will implement something outside the spec to cover it.

One way to resolve this would be to add a "delete all data" endpoint that deletes all data for the actor provided. This would not interfere with voiding, immutability, or the desire that logical statements not be changed. Any other thoughts?

andyjohnson commented 10 years ago

Of course the one bad thing about "delete all data" is any sort of hack could leave one "statementless" :)

Not sure we want that on our collective conscience lol

brianjmiller commented 10 years ago

Insofar as data ownership is a hot topic, I don't think it is possible to say that the user "owns" the data. Internal corporate training organizations would likely balk at that statement, hence the debate.

There is also a very real problem with implementing "delete data" on some arbitrary property. Even if you take an Agent, does it apply just to statements where they were the "actor"? Statements where they are part of a Group in the "actor"? Statements where they are the "actor" or the "object"? How about as part of a SubStatement and/or StatementRef? If you delete a statement that uses a StatementRef (outside of the void case), do you delete the referenced statement? The better option would likely be "authority", but that comes with its own can of worms. There is also the problem of LRS-to-LRS transfers: getting rid of a statement may not be so simple, which is why voiding is done using a statement rather than just an HTTP method. And I haven't even gotten into backups, offline backups, and offsite backups. Is anything really ever deleted anymore? This will be very hard to spec "right".
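To show how many places a single Agent can turn up, here is a minimal sketch that lists every role an agent plays in one statement, assuming statements are plain dicts. It checks actor, authority, context instructor/team, Group members, Agent/Group objects and SubStatements; resolving StatementRefs would need further lookups and is left out. The function names and role labels are hypothetical.

```python
def agent_key(agent: dict):
    """Return the agent's Inverse Functional Identifier as a tuple, or None."""
    for ifi in ("mbox", "mbox_sha1sum", "openid"):
        if ifi in agent:
            return (ifi, agent[ifi])
    if "account" in agent:
        return ("account", agent["account"]["homePage"], agent["account"]["name"])
    return None  # e.g. an anonymous Group


def roles_of(stmt: dict, key: tuple, prefix: str = "") -> set:
    """Return every place in a statement where the agent identified by key appears."""
    roles = set()

    def check(agent, role):
        if not isinstance(agent, dict):
            return
        if agent_key(agent) == key:
            roles.add(prefix + role)
        # Groups list their members; the agent may only appear as a member.
        for member in agent.get("member", []):
            if agent_key(member) == key:
                roles.add(prefix + role + " (group member)")

    check(stmt.get("actor"), "actor")
    check(stmt.get("authority"), "authority")
    context = stmt.get("context", {})
    check(context.get("instructor"), "context.instructor")
    check(context.get("team"), "context.team")
    obj = stmt.get("object", {})
    if obj.get("objectType") in ("Agent", "Group"):
        check(obj, "object")
    elif obj.get("objectType") == "SubStatement":
        roles |= roles_of(obj, key, prefix + "substatement ")
    return roles
```

Any "delete all data for this actor" rule would have to say which of these roles count, which is exactly the ambiguity raised above.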

brianjmiller commented 10 years ago

I should also have commented on the "real world" scenarios. I definitely think the spec is the wrong place for those. Blog posts, example apps, code snippets, mailing lists, even GitHub issues, etc. can all do a much better job of that, will be more fluid, allow commenting, etc. They would bloat the spec and be very hard to maintain over time. Just my $.02.

aaronesilvers commented 10 years ago

(cough)

Sorry, I couldn’t help but interject here.

I would prefer that we not spec this at all, because no matter how we do it, it’s going to be way wrong.

And, as a point of conjecture, the spec was conceived with the notion that a user can own their data. I understand that organizations may have something to say about that, and not only will there be some interesting perspectives on what it means to own data, there will also be questions about which data is “theirs” when one owns “their” data…

It is entirely possible to say that the user “owns” the data. There. I just said it, Brian. I can say it again, too.

My point being that it’s not a messy enough problem yet, and I would prefer we don’t spec it until it is. There need to be some actual business cases brought to light before we can even attempt to spec that out. I can tell you that from an IEEE perspective, I’ll be interested in studying this aspect because I care a lot about it, but it’s not a concern I’m willing to even put out there until there are problems we can witness, identify and scope around this issue.

I know it’s messy the way it is. It needs to be until it hurts enough to make it better.

-a-

brianjmiller commented 10 years ago

Except the original concern of the "user" owning their data comes down to identifying the "user", because that can't be just the "actor". So is it the "authority"? In the case of OAuth there are two agents in the "authority", so dual ownership? Do I, Brian, own "brian.miller@scorm.com" because that is what most of my statements are sent as, or does Rustici own that "Agent"? Either way, we are both :+1: for the end result. I wasn't choosing a side in the debate; I was merely pointing out that the existence of the debate is a huge problem in and of itself.

aaronesilvers commented 10 years ago

/sigh

Points well taken, @brianjmiller. Back in the long long ago when we were using FOAF in reverse to work with identifying all the possible statements that could be "you" this idea was an easier one... yeah, it's definitely :+1: on the debate being a problem and it being one that's going to persist until the problem is more of a problem than the debate.

garemoko commented 10 years ago

Agreeing with @brianjmiller that the spec is the wrong place for these.

On specifying an endpoint for deleting statements: I agree that as an LRS customer it might be a nice feature to be able to delete statements (taking into account the complexities outlined by @brianjmiller). But should there be an API endpoint to allow me to do this? And do we need a standard way of doing this defined in the spec? I don't think so. IMHO it should be left to vendors to decide how to handle this and differentiate their products. Assuming you agree an endpoint is inappropriate, there's no interoperability issue here.

Case studies and examples are always good, and short syntax examples in particular absolutely belong in the spec. Longer scenarios I'm not so sure about: certainly helpful, but they most probably belong outside the spec.

bscSCORM commented 10 years ago

Regarding the deletion portion of this thread -- there is deliberately no delete support because delete can't be guaranteed in a distributed environment. That is, other systems may already have the statements, and delete won't propagate the removal of such statements.

Even though there is no explicit language saying that statements are indestructible, there doesn't have to be, since there is no language which allows for their deletion, nor any language which allows the LRS to filter statement results based on security. If statements are deemed too sensitive to ever return, an LRS would be allowed to return 403 Forbidden to any request which would return those statements, but just filtering them out clearly violates the spec. Better yet, it's entirely reasonable to require elevated privileges to access voided statements, so the statements could be voided and 403 Forbidden could be returned on requests for those specific voided statements.

bscSCORM commented 10 years ago

I should be a little slower to post in the morning: we do allow for filtering based on user permissions, so I suppose "deleted" statements could exist but not be visible to normal users. They still have to be retrievable by some user, so it's not a true delete. I stand by the suggestion that voiding, and then restricting access to voided statements, is the best way to handle this.
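A minimal sketch of the "void, then restrict access to voided statements" policy as it might look inside an LRS. The in-memory store, the `elevated` flag and the function name are hypothetical. Returning 404 for a voided statement requested by `statementId` reflects the rule that voided statements are only retrievable via `voidedStatementId`; the 403 for unprivileged callers is the restriction proposed above, not something the spec mandates.

```python
from http import HTTPStatus


def fetch(statements: dict, voided: set, *, statement_id=None,
          voided_statement_id=None, elevated=False):
    """Return (status, body) for a single-statement GET under this policy."""
    if statement_id is not None:
        # Voided statements are not returned when requested by statementId.
        if statement_id not in statements or statement_id in voided:
            return HTTPStatus.NOT_FOUND, None
        return HTTPStatus.OK, statements[statement_id]
    if voided_statement_id is not None:
        if voided_statement_id not in voided:
            return HTTPStatus.NOT_FOUND, None
        if not elevated:
            # Hypothetical policy: only privileged credentials may read voided statements.
            return HTTPStatus.FORBIDDEN, None
        return HTTPStatus.OK, statements[voided_statement_id]
    return HTTPStatus.BAD_REQUEST, None
```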

aaronesilvers commented 10 years ago

+1 @bscSCORM

garemoko commented 10 years ago

My understanding of voiding was that it was only to be used for errors. Is this another use case where voiding is appropriate? The spec says "mistakes or other factors" are reasons to void statements. Is this another valid factor?

bscSCORM commented 10 years ago

It could be a true and accurate statement which contains information which the actor or someone else no longer wants published, possibly for privacy reasons.

Since void is as close as we come to delete, and also will inform downstream systems the statement should be hidden, this is a use case for void.
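A minimal sketch of the downstream side: a consumer that has pulled a batch of statements and hides both voiding statements and their targets before display. Hiding the voiding statements too is a display choice, not a spec requirement, and the function name is hypothetical. Collecting the voided ids first means it works regardless of the order in which the target and its voiding statement arrive within the batch.

```python
VOIDED = "http://adlnet.gov/expapi/verbs/voided"


def visible(statements: list) -> list:
    """Drop voiding statements and the statements they void."""
    voided_ids = {
        s["object"]["id"]
        for s in statements
        if s["verb"]["id"] == VOIDED and s["object"].get("objectType") == "StatementRef"
    }
    return [
        s for s in statements
        if s["id"] not in voided_ids and s["verb"]["id"] != VOIDED
    ]
```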

garemoko commented 9 years ago

This issue (scenario 2) has also been discussed here: https://groups.google.com/a/adlnet.gov/forum/#!topic/xapi-design/ou9YuaPoRH4

berthelemy commented 9 years ago

Hi all,

Re. Is voiding only used for handling errors?

I have a business use where a user needs to be able to remove an item from their CPD record. We are using void for that purpose. There is no need to keep any record of the statement after it's been removed.

Mark

brianjmiller commented 9 years ago

@berthelemy depends on the reason for needing to remove an item, but in general, yes, voiding is for statements that shouldn't have existed to begin with. I think you'll run into issues trying to use void as the opposite of some other action. If you are a developer (or understand developer speak), think in terms of true vs. false vs. null (or undefined). Voiding is intended to make something null again, rather than make something that was true now false.

Having said that, I think that fits in the category of "best practice" rather than something actually prohibited by the spec, you can void a statement for whatever reason you'd like. During our Recipe work we found various reasons why using voiding as a reversing action wasn't a good option.

fugu13 commented 9 years ago

Hi Mark,

Yes, voiding is only intended for use to remove mistaken data. Statements are immutable records of 'things that happened'. An LRS isn't a place to store 'the current list of items', but a place to store 'person added an item. person marked an item as done. person added a note to an item', et cetera, and from that you can easily construct the current state of a list of items (for instance).
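A minimal sketch of building "the current list of items" from statements of things that happened, assuming each statement has an ISO 8601 `timestamp` in a single consistent format, and using hypothetical verb IRIs for added, marked done and removed. Note that removing an item is itself a new statement here, rather than a void of the "added" statement.

```python
ADDED = "https://example.com/verbs/added"          # hypothetical verb IRIs
COMPLETED = "https://example.com/verbs/marked-done"
REMOVED = "https://example.com/verbs/removed"


def current_items(statements: list) -> dict:
    """Fold the statement history into the list's current state: item -> done?"""
    items = {}
    # Assumes one timestamp format and offset, so string order is time order.
    for s in sorted(statements, key=lambda s: s["timestamp"]):
        item, verb = s["object"]["id"], s["verb"]["id"]
        if verb == ADDED:
            items[item] = False
        elif verb == COMPLETED and item in items:
            items[item] = True
        elif verb == REMOVED:
            items.pop(item, None)
    return items
```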

The issue of a build-up of data (say someone's done literally tens of thousands of operations on their list) can be handled by materializing an older state of the data and storing it in one of the Document APIs, then only playing forward data from that point (I realize that's a really fast gloss that probably doesn't make much sense without more context; I'd be happy to talk more about that approach if you'd like).
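A minimal sketch of the snapshot-and-play-forward approach, under the same assumptions as a simple replay: hypothetical verb IRIs, and ISO 8601 `timestamp` strings in one consistent format. `make_snapshot` only produces the JSON body you might store via one of the Document APIs (for example the State API); the storage call itself, and fetching only newer statements, are left out.

```python
import json

ADDED = "https://example.com/verbs/added"          # hypothetical verb IRIs
COMPLETED = "https://example.com/verbs/marked-done"


def apply_statement(items: dict, statement: dict) -> None:
    """Update the item -> done? map for one statement."""
    item, verb = statement["object"]["id"], statement["verb"]["id"]
    if verb == ADDED:
        items[item] = False
    elif verb == COMPLETED and item in items:
        items[item] = True


def by_time(statements):
    # Assumes one timestamp format and offset, so string order is time order.
    return sorted(statements, key=lambda s: s["timestamp"])


def make_snapshot(statements, as_of: str) -> str:
    """Materialise the state as of `as_of` into a JSON document body."""
    items = {}
    for s in by_time(statements):
        if s["timestamp"] <= as_of:
            apply_statement(items, s)
    return json.dumps({"as_of": as_of, "items": items})


def state_from_snapshot(snapshot_doc: str, statements) -> dict:
    """Start from a stored snapshot and play forward only the newer statements."""
    snapshot = json.loads(snapshot_doc)
    items = dict(snapshot["items"])
    for s in by_time(statements):
        if s["timestamp"] > snapshot["as_of"]:
            apply_statement(items, s)
    return items
```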

Now, if the list really doesn't serve as a learning record at all, it probably isn't appropriate to send it as statements, but should be maintained in the document APIs.

Sincerely, Russell

garemoko commented 9 years ago

This is a great discussion! But it's getting into a specific use case that's different from the original issue. Can I suggest moving it to a new issue or the Google Groups to avoid the original issue becoming unwieldy?

canweriotnow commented 9 years ago

Going back to the original issue: I think this can be addressed by changing one of the most egregious mistakes in the spec (from my perspective):

An Agent MUST NOT include more than one (1) Inverse Functional Identifier;

It's crippling to lack a mechanism to associate Jane Doe's email (mbox:jdoe@example.org) with Jane Doe's Github account ({"account":{"homePage":"https://github.com","name":"happyrabbit"}}) so that I can easily report all of her activity without having to separately correlate these identities.
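A minimal sketch of the rule being objected to, assuming an Agent is a plain dict: an Agent must carry exactly one of the four Inverse Functional Identifier properties, so Jane's mbox and GitHub account cannot travel together in one Agent object. The function names are hypothetical.

```python
# The four Inverse Functional Identifier properties an Agent can carry.
IFI_KEYS = ("mbox", "mbox_sha1sum", "openid", "account")


def ifis_of(agent: dict) -> list:
    """List which IFI properties an Agent object carries."""
    return [key for key in IFI_KEYS if key in agent]


def is_valid_agent(agent: dict) -> bool:
    """An Agent must carry exactly one IFI, so two identities cannot share one Agent."""
    return len(ifis_of(agent)) == 1
```

Under this rule, Jane's GitHub activity has to be sent as a separate Agent, and correlating the two identities is left to whatever consumes the data.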

Now, as far as the change goes: if we're using an immutable data store, and we know in advance that jdoe@example.org is now jboggs@example.org and the previous one is deprecated, we can transact in the latter value for mbox, and the db state prior to the change will reflect the old email while the db state after will reflect the new one. But that's an implementation detail beyond the scope of spec'ing an interface, so it makes more sense to me to drop that silly requirement so our LRS data can be useful and consistent. Usefulness and consistency are important design goals, in my opinion.

But this should probably be its own issue.

adlnet / xAPI-Spec

Real world scenarios #442
