gocodebox / lifterlms

LifterLMS, a WordPress LMS Solution: Easily create, sell, and protect engaging online courses.
https://lifterlms.com
GNU General Public License v3.0

Tracking Architecture Notes #636

Closed actual-saurabh closed 2 months ago

actual-saurabh commented 6 years ago

The tracking architecture will be refactored and merged into core in phases.

Phase 1: Documentation

The architecture will be completely documented, technically and otherwise. The build process will follow the documentation. It will also run concurrently in the sense that as soon as one aspect is completely documented, the development of that particular part can start without waiting for the rest of the documentation.

Phase 2: Build/ Development

Whatever documentation is completed will be developed as usual.

Phase 3: Test

Everything that has been built will be tested (unit testing and beta user testing) to confirm conformance with the documentation and to find bugs.

Phase 4: Deploy

The tracking mechanism will be merged into core and run alongside the existing tracking mechanism to allow for wider user testing and feedback (similar to how Google, GitHub, etc. beta-test new features; see YouTube Studio Beta for an example).

Phase 5: Replace

Once the acceptance reaches a pre-defined threshold, the older mechanism will be retired (with an announced official date) and all users will be switched over to the new mechanism.

actual-saurabh commented 6 years ago

Aims/ Objectives

  1. Track progress of any kind of unit. This includes
    1. existing LifterLMS units (courses, sections, lessons, quizzes, questions, assignments, tasks),
    2. other WordPress post types (posts, pages, custom) and
    3. external resources (via Webhooks & other APIs).
    4. Track progress of individual components (embeds like forms, videos, audio, etc.; sections/blocks, etc.) that make up a unit and factor that into the overall unit completion.
  2. Bubble Tracking upwards and outwards into unit hierarchy.
  3. Track both logged in users and casual visitors to allow for completely free & open units (or full courses).
  4. Track users across devices (Cross-Device Tracking) and include device & environment related variables.
  5. Track contextual information along with units to allow for
    1. unit and component reuse,
    2. repeat attempts,
    3. persona (instructor, admin, student, etc) specific attempts.
  6. Establish tracking standards for easy potential import/export & integration with third-party systems (Analytics platforms, Social Media platforms, marketing platforms, etc).

Compliance & Integration

  1. Tracking records should comply with xAPI standards.
  2. Learning related tracking records should comply with CMI5 Specifications
  3. This automatically makes sure that tracking is compatible with Activity Streams specifications and hence guarantees easy integration with other applications like BuddyPress, bbPress, Facebook, Twitter, etc in a clear logical way.

Other Benefits

  1. Granular tracking will allow really advanced and detailed reporting within LifterLMS
  2. xAPI compliance will allow integration with 3rd party LRSes, giving customers the flexibility to break out of any limitations that LifterLMS's reporting mechanisms may have.

actual-saurabh commented 6 years ago

Experience

An experience is defined as the combination of

  1. An Actor: visitor, student, instructor, etc
  2. A Verb: that describes the kind of experience that the actor went through
  3. An Object: the thing that the actor experienced; usually the learning units, their components or the course.
  4. Context and Metadata: any other information about the experience (like the degree of experience, score, other environment variables, etc)
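
A minimal sketch of how such an experience record might be represented. Python is used purely for illustration (LifterLMS itself is a PHP plugin), and the class name `Experience`, its field names, the identifier formats and the `as_statement` helper are all hypothetical, not part of LifterLMS or of any xAPI library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Experience:
    """One tracked experience: who did what to which object, and in what context."""
    actor: str                      # e.g. "student:42" (hypothetical identifier format)
    verb: str                       # e.g. "completed"
    obj: str                        # e.g. "lesson:101"
    context: Dict[str, Any] = field(default_factory=dict)

    def as_statement(self) -> Dict[str, Any]:
        """Return a plain dict loosely shaped like an actor/verb/object statement."""
        return {
            "actor": self.actor,
            "verb": self.verb,
            "object": self.obj,
            "context": dict(self.context),
        }


record = Experience("student:42", "completed", "lesson:101", {"score": 0.9, "device": "mobile"})
statement = record.as_statement()
```

Keeping context and metadata in a free-form mapping is one possible choice; it lets scores, device details and similar variables be added without changing the record's shape.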

Sessions

A session is self-explanatory but for the sake of clarity:

  1. Starts when a visitor opens or navigates to a LifterLMS powered WordPress website.
  2. Ends when the visitor navigates away from or closes a LifterLMS powered WordPress website.

What is the use of tracking sessions?

A session id is a unique identifier for a session and will be added to each tracking (experience) record. This way, the overall experience of an actor can be grouped by sessions. This can be useful to record breaks and abandonments as well as track usage patterns for fatigue and appropriate unit lengths.

For example, if most learners only finish 50% of a lesson in one session, it would indicate that the course creator needs to split the lesson into two parts.

Another example is for sectioning. If a majority of users finish 80% of a section in one session, it might make more sense to reorganise sections so that they can finish the whole section in one session.

You can get information like average session length across a variety of parameters (courses, devices, timezones, etc) to help design more effective material. In addition, when (not if) LifterLMS has more tools for flexible personalisation, this data can be really useful to build personalised learning and marketing experiences.

By tracking the time between the end of one session and the start of another, you can get average break lengths possibly indicating fatigue or pinpoint causes of delay (technical glitches, vacations, etc) and other clues to improve the effectiveness of experiences.
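
As a rough illustration of the session and break calculations described above, the sketch below groups records by session id and derives average session and break lengths. The record layout (a session id paired with a timestamp in seconds) and the function names are assumptions made for the example only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records for one actor: (session_id, timestamp in seconds).
records = [
    ("s1", 0), ("s1", 600), ("s1", 1200),
    ("s2", 5000), ("s2", 5900),
    ("s3", 9500), ("s3", 9800),
]


def session_bounds(records):
    """Group records by session id and return (start, end) pairs ordered by start."""
    times = defaultdict(list)
    for session_id, timestamp in records:
        times[session_id].append(timestamp)
    return sorted((min(ts), max(ts)) for ts in times.values())


def average_session_length(records):
    """Mean time between the first and last record of each session."""
    return mean(end - start for start, end in session_bounds(records))


def average_break_length(records):
    """Mean gap between the end of one session and the start of the next."""
    bounds = session_bounds(records)
    breaks = [nxt[0] - cur[1] for cur, nxt in zip(bounds, bounds[1:])]
    return mean(breaks) if breaks else 0
```

The same grouping could be repeated per course, device or timezone by adding those values to each record and filtering before the calculation.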

This can be extended by adding a maximum break time which can be used to trigger mechanism to handle potential abandonments (email engagements, surveys, personal contact by instructor or staff, etc).

Another way this can be extended is by adding a maximum session duration to force students to take breaks to avoid fatigue and maximise effectiveness.

Also think of the possibilities this opens up with respect to live coaching and mentoring.

There are many more possibilities as long as we track sessions and breaks.

actual-saurabh commented 6 years ago

Actor

Based on all the reported situations (in feature & support requests), the actor, for the purpose of tracking, can be

  1. an unregistered visitor.
  2. a registered user who's not logged in yet.
  3. a logged in user who's not enrolled in the experience.
  4. a logged in and enrolled user.

So, the tracking mechanism will need to implement ways to identify all the types of actors above over a long period of time.

This means that a mechanism other than WordPress's login needs to be used for tracking. A common mechanism is to use a combination of

  1. Client ID (generated on server and stored in a cookie with a long expiry – 365 days or more)
  2. Device Fingerprint (generated on the device) using the Client.js library
  3. Unique Email tag (a random alphanumeric string) sent to the user as soon as their email address becomes available, for cross-device identification.
  4. User UUID generated using WP's APIs to identify users even without registration into WP's user system. This would be useful in transitioning the user into a WP User and for identifying such a user even when they're not logged in.

In addition, other device details can be added to the tracking for better granular reporting.
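
One way the identifiers listed above might be combined is sketched below: any known identifier resolves to a single actor UUID, and newly seen identifiers are linked to that actor. The `ActorResolver` class and its keyword names are hypothetical, and the sketch deliberately ignores conflicts (two identifiers that already point at different actors), which a real implementation would need to handle.

```python
import uuid


class ActorResolver:
    """Maps any known identifier (client id, fingerprint, email tag) to one actor UUID."""

    def __init__(self):
        self._by_identifier = {}   # (kind, value) -> actor UUID string

    def resolve(self, **identifiers):
        """Return the actor UUID for the given identifiers, creating one if none match.

        Every identifier passed in is linked to the resolved actor, so a later
        visit that carries only one of them still resolves to the same actor.
        """
        keys = [(kind, value) for kind, value in identifiers.items() if value]
        known = [self._by_identifier[key] for key in keys if key in self._by_identifier]
        actor_id = known[0] if known else str(uuid.uuid4())
        for key in keys:
            self._by_identifier.setdefault(key, actor_id)
        return actor_id


resolver = ActorResolver()
laptop = resolver.resolve(client_id="c-1", fingerprint="fp-laptop")
# The email tag becomes available on the laptop and is linked to the same actor.
resolver.resolve(client_id="c-1", email_tag="x9k2")
# A phone with a new cookie and fingerprint is still recognised via the email tag.
phone = resolver.resolve(client_id="c-2", fingerprint="fp-phone", email_tag="x9k2")
```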

actual-saurabh commented 6 years ago

Core Verbs

CMI5 Spec recognizes the following verbs. Using them will get LifterLMS a step closer to CMI5 compatibility or even compliance.

  1. Launched
  2. Initialized
  3. Completed
  4. Passed
  5. Failed
  6. Abandoned
  7. Waived
  8. Terminated
  9. Satisfied

CMI5 is very specific and limited in its scope to the actual learning by a student. LifterLMS however implements a much broader range of functionality that the tracking must be able to express and record.

That can be done with additional verbs from the xAPI Specs that can describe the whole range of experience of all kinds of users in a LifterLMS powered WordPress website. Also, it should be possible to filter the list of verbs that LifterLMS uses to add any verb from the xAPI registry.
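
A filterable verb list could look roughly like the sketch below, written in the spirit of WordPress filters. The function names `add_verb_filter` and `get_verbs` are hypothetical, and verbs are shown as plain names rather than registry identifiers to keep the example short.

```python
# The nine CMI5 verbs listed above, as plain lowercase names.
CORE_VERBS = [
    "launched", "initialized", "completed", "passed", "failed",
    "abandoned", "waived", "terminated", "satisfied",
]

_verb_filters = []


def add_verb_filter(callback):
    """Register a callback that receives the verb list and returns a modified list."""
    _verb_filters.append(callback)


def get_verbs():
    """Return the core verbs after passing them through every registered filter."""
    verbs = list(CORE_VERBS)
    for callback in _verb_filters:
        verbs = callback(verbs)
    return verbs


# An add-on could extend the list like this.
add_verb_filter(lambda verbs: verbs + ["attended"])
```

Because `get_verbs` works on a copy, filters can add or remove verbs without mutating the core list.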

Here are some relevant and potentially useful verbs defined by ADL (a US government program for R&D and policy on distributed learning).

Additional Verbs to describe Student activities wrt Learning Progress:

  1. Registered (would come before Launched above, similar to the current understanding of Enrolled in LifterLMS)
  2. Suspended
  3. Resumed (along with Suspended, tracks breaks)
  4. Progressed (to track the amount of progress between Initialized and Completed, probably every time the Suspended action occurs)

Competency Verbs

  1. Scored (between 0 & 1 or 0 & 100)
  2. Mastered (to describe competency level achieved within a pre-defined list of levels)

QnA/ Interaction verbs

  1. Asked (to indicate a question)
  2. Answered (to indicate an answer)
  3. Responded (to indicate a response to a question, i.e. an Asked statement, that may not be an answer, or to add responses to the answer)
  4. Commented

Generic content (blog posts, etc) tracking verbs

  1. Attempted (to indicate the start of an experience. If not followed by another activity indicating completion, this indicates that the activity remained incomplete)
  2. Experienced (generic verb for videos, audio, reading material, etc)
  3. Interacted (generic verb to indicate that the user manipulated an object in some form, instead of passively experiencing it)

Live Coaching/ Event tracking

  1. Attended

Special Verbs

  1. Voided: Used to declare that an activity statement is to be voided from record (without actually deleting it). Useful for excluding things from reports without losing the tracking data. Once tracked, an activity should only be voided, never deleted.
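
The "void, never delete" rule can be sketched as an append-only log in which voiding is itself a new statement that targets the earlier one. The `StatementLog` class, its method names and the sequential ids are hypothetical simplifications for illustration.

```python
class StatementLog:
    """Append-only log: statements are never deleted, only voided by a later statement."""

    def __init__(self):
        self._statements = []

    def record(self, actor, verb, obj):
        """Append a statement and return its id."""
        statement_id = len(self._statements)   # hypothetical sequential id
        self._statements.append(
            {"id": statement_id, "actor": actor, "verb": verb, "object": obj}
        )
        return statement_id

    def void(self, actor, statement_id):
        """Void a statement by recording a new 'voided' statement that targets it."""
        return self.record(actor, "voided", statement_id)

    def all(self):
        """Return every statement ever recorded, voided or not."""
        return list(self._statements)

    def for_reports(self):
        """Return statements that are neither voided nor voiding statements themselves."""
        voided = {s["object"] for s in self._statements if s["verb"] == "voided"}
        return [
            s for s in self._statements
            if s["verb"] != "voided" and s["id"] not in voided
        ]


log = StatementLog()
first = log.record("student:42", "completed", "lesson:101")
log.record("student:42", "completed", "lesson:102")
log.void("admin:1", first)
```

Reports read from `for_reports()`, while the full history, including the voided statement, remains available through `all()`.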
actual-saurabh commented 6 years ago

Additional Verbs

While the verbs above cover the student's experience of a course and such, they don't describe many other activities that should be tracked for more holistic reporting with respect to Memberships and Sales. In addition, it might make sense to track social interactions (in Social Learning Addon, BuddyPress, bbPress or Facebook, Twitter, etc).

The verbs defined in the Activity Stream specifications cover almost all such scenarios. The ones present in the xAPI registry are listed here. This list is partially filtered:

Pre-registration (enrollment) verbs

  1. Qualified can be used alongside pre-tests

Student-Supervisor (Instructor/Manager/Admin) Interactions

  1. Requested
  2. Retracted
  3. Acknowledged
  4. Accepted
  5. Rejected
  6. Submitted
  7. Satisfied
  8. Unsatisfied

Verbs for Instructors/managers/administrators

  1. Approved
  2. Assigned
  3. Authorized
  4. Denied

Membership (Group) Verbs

  1. Invited
  2. Ignored
  3. Joined
  4. Removed
  5. Left

Specific Content consumption

  1. Consumed
  2. Listened
  3. Watched
  4. Read
  5. Used

Content interaction verbs

  1. Agreed
  2. Disagreed
  3. Liked
  4. Unliked (different from disliked, indicates an undo like activity on like)
  5. Disliked
  6. Favorited
  7. Unfavorited
  8. Shared
  9. Unshared
  10. Flagged as inappropriate

Help & Support verbs

  1. Opened
  2. Confirmed
  3. Closed
  4. Resolved

Competition verbs

  1. Played
  2. Lost
  3. Tied
  4. Won

E-Commerce/Sales

  1. Sold
  2. Purchased
  3. Sponsored
  4. Sent
  5. Delivered
  6. Received
  7. Returned

Event/ Live Coaching (Attended is already added via ADL's list)

  1. Scheduled
  2. Hosted
  3. Presented
  4. Cancelled

Content Workflow (should this be tracked, at all)

  1. Created
  2. Authored
  3. Appended
  4. Attached
  5. Replaced
  6. Saved
  7. Unsaved
  8. Updated
  9. Archived
  10. Deleted

Physical Tracking

  1. Checked in
  2. Was at

Social Interactions (Group/Individual)

  1. Requested friend
  2. Made friend
  3. Removed friend
  4. Followed
  5. Stopped following
  6. Tagged

actual-saurabh commented 6 years ago

Objects

Object Types

Objects currently are post types (course and course elements). However, it makes sense to record an object_type to extend tracking to comments, taxonomy terms, users or other custom objects. For the sake of xAPI, this can be placed in the context object, but could be put in its own column for LifterLMS's use.

Object ID

The identifier can be a numeric ID but can also be an alphanumeric key (in case of options or metadata). This way it could also be extended to Gutenberg blocks in the future.

In terms of xAPI, an object is one of the defined activity types, other actors or even previous statements (in case of voiding). The ADL defined core activities are:

  1. Assessment
  2. Course
  3. Interaction
  4. Link
  5. Media
  6. Meeting
  7. Module
  8. Objective
  9. Performance
  10. Question
  11. Simulation

Activity Streams define additional activities:

  1. Alert
  2. Application
  3. Article
  4. Audio
  5. Badge
  6. Bookmark
  7. Comment
  8. Device
  9. Event
  10. File
  11. Game
  12. Group
  13. Image
  14. Issue
  15. Job
  16. Note
  17. Offer
  18. Organization
  19. Page
  20. Place
  21. Process
  22. Product
  23. Question
  24. Review
  25. Service
  26. Task
  27. Video

actual-saurabh commented 6 years ago

Progress & Completion

  1. Progress of a unit can be the sum of its components' completion (Course -> Lessons, Lessons -> Contents), etc.
  2. Progress is also a measure of partial completion of each unit or its component.
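
A minimal sketch of how component progress might roll up into unit progress. It assumes every child carries equal weight and that progress is expressed as a fraction between 0 and 1; both are assumptions made for the example, and the nested-dict layout is hypothetical.

```python
def progress(unit):
    """Return progress in [0, 1]: a leaf's own value, or the mean of its children."""
    children = unit.get("children", [])
    if not children:
        return unit.get("progress", 0.0)
    return sum(progress(child) for child in children) / len(children)


course = {
    "children": [
        # Lesson 1: video watched fully, text half read.
        {"children": [{"progress": 1.0}, {"progress": 0.5}]},
        # Lesson 2: not started.
        {"children": [{"progress": 0.0}, {"progress": 0.0}]},
    ]
}

course_progress = progress(course)   # lesson 1 is 0.75, lesson 2 is 0.0
```

A weighted variant (for example, weighting a video more heavily than a short text block) would only change the averaging step.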

Tracking progress of items with clear playback or progress abstractions (for example slideshows, videos and audio) is easy. Tracking progress of static components is more complicated. One approach is to measure scrolling and the on-screen visibility of static components. Even if this doesn't indicate that the student has actually read or implemented what the component represents, it does indicate that the student has at least viewed it. Rather than chasing a level of accuracy that can only be achieved through eye tracking, we could divide the actual content into slides with an explicit next action to mark completion.

Although, it is highly unlikely that we would implement something like this within core LifterLMS, there has to be an exposed API to allow for customised progress tracking and completion reporting. (More likely would be an addon that allows breaking down lessons into slides and even syncing them with a narrative video or audio.)

Not just at the level of a component, but also at the lesson (or unit) level, there would be an API to mark completion via simple hooks and even webhooks so that an external application could trigger progress/completion. For example, sending a pull request in GitHub could trigger a webhook that marks a lesson on how to send PRs complete. In a course teaching social media, posting a particular tweet (or another social media post) can mark a lesson complete. Submitting a form on a third party website may mark a lesson complete. Scanning a bar code or QR Code on a physical object (like a book) could mark a lesson complete.

As of now, the focus is to keep the progress tracking and reporting as open and flexible as possible so that these and more can be built later.

Partial Completion (<100% progress) & Local Storage

This is a bit complicated, especially because it is often difficult to hook into the exact point when the student leaves the unit from the browser. The only reliable way to do this is to keep sending progress on a frequency (that obviously would be open to overrides) but that would create a lot of requests to the server, something that bothers a lot of hosting companies who penalise users with extra charges or suspensions.

This is why we'd have to explore some sort of offline tracking that is synced with the server less frequently. HTML5 local storage can help a lot with this. However, since HTML5 localstorage only stores strings, we either store and retrieve json or use a library that interfaces with localstorage to directly allow working with arrays and objects (https://www.sitepoint.com/9-javascript-libraries-working-with-local-storage/).

Once we build and integrate a local storage interface for LifterLMS, its use can be extended to other functionality (like Course/Quiz Builder, Quizzes, Assignments, etc).

A possible problem with this is that the last stored progress may not reach the server immediately leading to a delay in reports but this is a known pattern. The moment user opens any other screen, all the existing local storage can be synced and cleared. In addition, any explicit user action that can indicate progress (Mark Complete, Next buttons) can also sync localstorage before proceeding with the action.

If local storage is used, we can track second by second progress locally and then sync it every few minutes or only when the user navigates to a different screen (or component) reducing the server requests phenomenally and this would ensure zero trouble with hosting providers.
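
The buffering pattern described above can be sketched as follows. A plain dict of strings stands in for HTML5 local storage (which only stores strings, hence the JSON round trip), and `send` stands in for whatever request would post a batch to the server. The class name, the storage key and the event fields are hypothetical.

```python
import json


class BufferedTracker:
    """Buffers progress events in a string-only store and syncs them in batches."""

    STORAGE_KEY = "llms_tracking_buffer"   # hypothetical key name

    def __init__(self, storage, send):
        self.storage = storage   # dict of str -> str, standing in for local storage
        self.send = send         # callable that posts a list of events to the server

    def _load(self):
        """Decode the buffered events from their JSON string form."""
        return json.loads(self.storage.get(self.STORAGE_KEY, "[]"))

    def track(self, event):
        """Append an event locally; no server request is made here."""
        events = self._load()
        events.append(event)
        self.storage[self.STORAGE_KEY] = json.dumps(events)

    def sync(self):
        """Send all buffered events in one request, then clear the buffer."""
        events = self._load()
        if events:
            self.send(events)
            self.storage.pop(self.STORAGE_KEY, None)
        return len(events)


sent_batches = []
tracker = BufferedTracker(storage={}, send=sent_batches.append)
for second in range(1, 121):                       # two minutes of per-second progress
    tracker.track({"lesson": 101, "position": second})
synced = tracker.sync()                            # e.g. on "Mark Complete" or navigation
```

In this run, 120 locally tracked events reach the server as a single batch. A production version would also need to keep the buffer when the request fails, which the sketch does not attempt.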

actual-saurabh commented 6 years ago

Launched vs Initialized (0% Progress)

For example, opening a lesson is the same as launched but not exactly the same as 0% progress.

Completed vs Terminated (100% Progress)

Finishing the lesson (100% Progress) and explicitly marking it complete (Terminated) or clicking on next lesson are two different concepts.

actual-saurabh commented 5 years ago

Actor

<actor (learner)> <verb> <object>, with <result>, in <context>

An identity (identifier) of an individual or group tracked using Statements as doing an action (Verb) within an Activity.

xAPI Personas vs LifterLMS Personas

The thing to note here is that the xAPI spec uses the word persona, which is different from personas that we've identified in #611. The persona in LifterLMS belongs to the context object of an activity more than the Actor object itself.

Actor = Individual or Group

Another thing to note is that there is an understanding of Groups that matches the idea of User Tags and Segments as described in #604.

So, the tracking architecture instead of being tied down to the concept of a WP User would work with the concept of an Actor with an additional parameter to identify the type of actor (individual or group). Initially, this parameter would probably always refer to an individual but having it means that we can have the flexibility of tracking group activities in the future.

So, it would be useful to create an additional LifterLMS specific Actor table that contains all the types of Actors that we're hoping to track, i.e.

This would allow us to have a single identifier column in the actual tracking table which can then be used to identify the type of actor (individual or group) and other properties of the actor from other tables that describe the individual or the groups in more details.

llms_actor vs wp_user

Additionally, since we're looking to track unregistered visitors/users as well, it'd make sense to create an additional abstraction of a LifterLMS user that sits on top of WP's abstraction of a registered user. In practice, this would mean that all LifterLMS functionality (blocks, shortcodes, widgets, etc) would work perfectly only with this LifterLMS specific actor_id with WP's registered user_id as a secondary identifier along with group_id. The group_id can have secondary identifiers like access_plan_id, course_id, membership_id (or any other form of post_id or term_id), etc.

I did explore the user_status column in wp_users in the hope of using that to track unregistered and anonymous users. This would save the additional layer of abstraction. However, it's been widely identified as a dead column that can be dropped from core any time. Additionally, for completely anonymous users, it'd become a problem to create dummy email IDs to conform to WP_User's structure.

Internally, the core functionality should always require the actor_id with any secondary identifiers mapped back to the actor_id instead of working directly as an identifier. This would help identify areas for refactoring. However, there would be an interim period where secondary identifiers would also work alongside actor_id, till all identified functionality is refactored and tested.
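
A rough sketch of the actor abstraction with secondary identifiers mapped back to an actor_id. The `Actor` and `ActorTable` names, the field names and the lookup method are hypothetical; they only illustrate the idea that the WP user_id becomes one of several ways to find an actor rather than the primary key.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Actor:
    actor_id: str
    actor_type: str = "individual"      # "individual" or "group"
    wp_user_id: Optional[int] = None    # secondary identifier; None for anonymous visitors


class ActorTable:
    """In-memory stand-in for an actor table keyed by actor_id."""

    def __init__(self):
        self._actors = {}
        self._by_wp_user = {}

    def create(self, actor_type="individual", wp_user_id=None):
        """Create an actor; registered users also get a user_id -> actor_id mapping."""
        actor = Actor(str(uuid.uuid4()), actor_type, wp_user_id)
        self._actors[actor.actor_id] = actor
        if wp_user_id is not None:
            self._by_wp_user[wp_user_id] = actor.actor_id
        return actor

    def actor_id_for_wp_user(self, wp_user_id):
        """Map a legacy user_id back to an actor_id (interim compatibility path)."""
        return self._by_wp_user.get(wp_user_id)


table = ActorTable()
registered = table.create(wp_user_id=7)
visitor = table.create()                 # anonymous: no WP user behind it
```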

This would mean maintaining 2 branches of the codebase, one that completely breaks the existing user_id based functionality for gradual refactoring and another that allows the refactored functionality to work alongside legacy implementation.

Groups and Members

Next thing to consider is that the concept of Groups in xAPI identifies two types of groups: Anonymous vs Identified. Contrary to what the names suggest, in both the groups, a string name is optional. The difference is that

A question that arises is when a member completes an activity in a group, should there be a separate record for the individual as well (with the group activity in the context or as an object)? Or would the tracking and reporting join tables and collate activities of an individual both in a group as well as as an individual? This is not urgent and will need to be tackled when we actually start implementing group activities.

actual-saurabh commented 5 years ago

UUID

https://en.wikipedia.org/wiki/Universally_unique_identifier

One problem with offline storage and syncing of data is that identifiers in WordPress are generated only when data goes into a database table.

Whenever we create new objects in the browser that need to reference another object that has also been recently created offline, we need the object id. This is seen, for example, when you create a new section in the course builder and add lessons under it. Unless the section is saved, the lessons can't be saved reliably.

This can be solved by relying on UUIDs since they are more or less guaranteed to be unique across the internet (and not just a particular site) and can be generated both on the server and the browser.

If LifterLMS starts relying on UUIDs as the primary identifier and secondarily, on the db generated object_id, all these cross-references can be stored offline and synced as needed extremely reliably.
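
The section-and-lessons case can be sketched like this: objects created offline reference each other by UUID straight away, and the database id is filled in later without touching those references. The helper names and the object layout are hypothetical, and `sync` only simulates a server assigning ids.

```python
import uuid


def new_offline_object(kind, title, parent_uuid=None):
    """Create an object offline; the UUID is usable as a reference immediately."""
    return {
        "uuid": str(uuid.uuid4()),
        "kind": kind,
        "title": title,
        "parent_uuid": parent_uuid,
        "db_id": None,               # unknown until the object reaches the server
    }


def sync(objects):
    """Simulate a server sync: assign db ids without changing any UUID reference."""
    for db_id, obj in enumerate(objects, start=1):
        obj["db_id"] = db_id
    return {obj["uuid"]: obj["db_id"] for obj in objects}


# A section and its lessons are created before anything is saved.
section = new_offline_object("section", "Getting started")
lessons = [new_offline_object("lesson", f"Lesson {n}", section["uuid"]) for n in (1, 2)]
id_map = sync([section] + lessons)
```

After the sync, `id_map` translates any UUID to its database id, so the lessons' parent references resolve even though the section had no id when they were created.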

This is why all tracking statements, actors, post & term objects, etc should ideally be identified by UUIDs as far as possible. WordPress has the concept of a GUID, which is pretty close, so maybe that could be used instead, especially in cases where a UUID is too much work or creates an unnecessary layer of abstraction. https://deliciousbrains.com/wordpress-post-guids-sometimes-update/, https://developer.wordpress.org/reference/functions/get_the_guid/, https://developer.wordpress.org/reference/functions/the_guid/

This has been discussed in WordPress itself as a simpler alternative to the weird things WordPress does when auto-saving new posts. See: https://make.wordpress.org/core/2010/01/24/taking-advantage-of-uuids/ and https://bjornjohansen.no/uuid-as-wordpress-guid/

I'm leaning towards UUIDs because the xAPI specs also use UUIDs as identifiers and this would just get us closer to standard compliance.

actual-saurabh commented 5 years ago

Based on the notes till now, anonymous cross-device tracking can be clubbed with #604 instead of developing as the first thing. This will ensure proper integration with refactoring the access logic and user-lms relationships.

At this stage, the only thing that we should focus on is implementing a flexible, robust and clear core tracking API using WP's user system. However, all development choices would be based on the fact that the next logical step is to create the actor abstraction at the highest level that includes, but is not limited to, WordPress's understanding of a user. So, in this stage we'll only refactor the user abstraction as a lower level of the actor abstraction. We'll probably partially implement it but will definitely not implement any other lower level abstractions like anonymous_user or group.

Now, instead of directly moving to the actor abstraction, we'll instead work on feature implementations that can utilise the new tracking API, related to activity progress & reporting.

We'll first redo the reporting to prepare for really cool stuff that will then be easier to integrate with reporting, like progress & completion related features, advanced interaction tracking, student report dashboards (at course level), and LRS integration for extremely advanced reporting (lxHive, Learning Locker, etc). The latter would probably come from 3rd party addons.

This would also include thought on using something akin to D3.js with possible research and inputs into established UX best practices on data visualisation. The actual visualisations can be implemented as an addon (over simple visualisations) at a later stage.

Based on this, we'll implement new features related to progress and completion like multimedia tracking, form tracking, completion webhook and API, etc either as core features or premium addons, on a case by case basis.

Once these are launched, we'll start working on the actor object as the first step of the overall access and group refactor, after which we start working on anonymous tracking.

actual-saurabh commented 5 years ago

Announcements (Notifications, Engagements)

If tracking is the recording of activities, notifications and engagements are fundamentally announcements of activities. One of the intended side effects of this stage has to be defining activities as well if not in a concrete form in code, at least at the level of understanding, so that when an activity is announced, all the available variables can be predicted reliably.

Also, instead of working as two separate features, since the data structure is the same (the activity), either tracking can be announced or announcements can be tracked. So, an activity only needs to announce itself to be tracked or only get tracked to get announced, instead of getting announced and tracked asynchronously.

Since everything that is announced has to be tracked but everything that gets tracked doesn't have to be announced, it makes sense to implement announcements (notifications and engagement triggers) as things that the tracking mechanism implements, not as something that the activity is responsible for.

Reporting

What about instances where there needs to be an announcement but no tracking? That is absurd and I think is confused with instances where the tracking should not be visible in reports. So, not all tracking has to be displayed in reports, but everything displayed in reports has to have been tracked.

So, tracked activities can have a system level privacy which is even higher than an admin level privacy. So, privacy has to be factored into tracking development.

So, tracking will implement reporting and announcements as asynchronous independent functions, with no direct relationship. All relationships will only be established indirectly from tracking.
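
The relationship described above (everything is tracked; announcements and reports are both derived from tracking, never from each other) could be sketched as below. The `Tracker` class, the `visibility` values and the callback style are hypothetical, and the sketch calls announcers synchronously for brevity rather than asynchronously.

```python
class Tracker:
    """Records every activity, then fans it out to announcers; reports read the record."""

    def __init__(self):
        self.activities = []
        self.announcers = []     # callables such as notification or engagement triggers

    def track(self, activity, announce=True, visibility="report"):
        """Record the activity first; announcing is an optional consequence of tracking."""
        entry = dict(activity, visibility=visibility)
        self.activities.append(entry)
        if announce:
            for announcer in self.announcers:
                announcer(entry)
        return entry

    def report(self):
        """Only activities marked for reports are shown; 'system' ones stay hidden."""
        return [a for a in self.activities if a["visibility"] == "report"]


tracker = Tracker()
notifications = []
tracker.announcers.append(notifications.append)

tracker.track({"verb": "completed", "object": "lesson:101"})                  # tracked, announced, reported
tracker.track({"verb": "launched", "object": "lesson:102"}, announce=False)   # tracked and reported only
tracker.track({"verb": "signin", "object": "account"}, visibility="system")   # tracked and announced, hidden from reports
```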

actual-saurabh commented 5 years ago

Schemas

Apart from data architecture in databases, this is also the right time to implement schemas because of their need in contexts. This will allow for implementation of #456 because then the data structure can be implemented as language specific variations. See: #603

Also, schemas will allow for translation of data structures to other systems or models (like xAPI) and work as a guide or even interface for addons, future development and more importantly, developer documentation.

This doesn't mean that schema will be created for all data structures. We'll only implement it for objects that are being refactored or reimplemented currently in a way that can be extended for all other objects in future.

actual-saurabh commented 5 years ago

REST API

Another intended side effect is to incorporate an understanding (or a partial implementation) of an integration with WP's REST API so that tracking data can be queried and managed in a more app-like environment so that we can use better frontend frameworks and integrate with Gutenberg and customiser.

This is different from an exposed API for distribution which I feel can be kept for future while maintaining an eye on the exposure implementations of WordPress core as it starts moving towards a headless CMS. LifterLMS should also align itself with the future possibility of becoming a headless LMS. This has obvious long term benefits of device specific native app readiness and enterprise-readiness.

thomasplevy commented 5 years ago

Working on speccing out a db schema and a basic JS framework for some event tracking in preparation for some upcoming projects.

JS: https://gist.github.com/thomasplevy/8dfff9bb60b548f5b09142e1001ea351

thomasplevy commented 5 years ago

WIP at https://github.com/gocodebox/lifterlms/tree/events-tracking

Notes on database structure:


|event_type|event_action|description                                  |meta data   |add-on                                 |
----------------------------------------------------------------------------------------------------------------------------
|account   |signin      |A registered user signs into their account   |n/a         |core                                   |
|account   |signout     |A registered user signs out of their account |n/a         |core                                   |
|session   |start       |A visitor starts a new browsing session      |n/a         |core                                   |
|session   |end         |A visitor's browsing session ends            |n/a         |core                                   |
|page      |loaded      |A page on the site is loaded                 |page url    |core                                   |
|page      |exited      |A page on the site is exited                 |page url    |core                                   |
|page      |blurred     |Tab/browser window is not active             |page url    |advanced                               |
|page      |focused     |Inactive tab/browser window is reactivated   |page url    |advanced                               |
|course    |registered  |User enrolls in course                       |n/a         |core                                   |
|course    |completed   |User completes a course                      |n/a         |core                                   |
|course    |launched    |User starts a course                         |n/a         |core                                   |
|course    |progressed  |User completes a portion of a course         |percentage  |core                                   |
|course    |passed      |User passes a course                         |grade       |core                                   |
|course    |failed      |User fails a course                          |grade       |core                                   |
|video     |started     |Video playback started (for first time)      |external id |core:youtube,vimeo add-on,wistia add-on|
|video     |ended       |Video playback ended                         |external id |core:youtube,vimeo add-on,wistia add-on|
|video     |played      |Video "play" button pushed                   |external id |core:youtube,vimeo add-on,wistia add-on|
|video     |paused      |Video "pause" button pushed                  |external id |core:youtube,vimeo add-on,wistia add-on|
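
As a sketch of how the table above might be enforced when events are recorded, the snippet below accepts only the listed event_type/event_action pairs. The `make_event` function, the `meta` layout and the `date` field are hypothetical and are not taken from the events-tracking branch.

```python
import time

# Allowed actions per event type, copied from the table above.
EVENT_ACTIONS = {
    "account": {"signin", "signout"},
    "session": {"start", "end"},
    "page": {"loaded", "exited", "blurred", "focused"},
    "course": {"registered", "completed", "launched", "progressed", "passed", "failed"},
    "video": {"started", "ended", "played", "paused"},
}


def make_event(event_type, event_action, meta=None, now=None):
    """Build an event row, rejecting type/action pairs that the table does not list."""
    if event_action not in EVENT_ACTIONS.get(event_type, set()):
        raise ValueError(f"unknown event: {event_type}.{event_action}")
    return {
        "event_type": event_type,
        "event_action": event_action,
        "meta": meta,
        "date": now if now is not None else time.time(),   # hypothetical timestamp field
    }


event = make_event("course", "progressed", meta={"percentage": 50}, now=0)
```

Passing a pair that is not in the table, such as `make_event("video", "rewound")`, raises `ValueError`.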