benaclejames / VRCFaceTracking

OSC App to allow VRChat avatars to interact with eye and facial tracking hardware
https://docs.vrcft.io
Apache License 2.0
594 stars 94 forks

[RFC] Expose standardized OSC interface for raw tracking data #128

Open zarik5 opened 1 year ago

zarik5 commented 1 year ago

In this RFC I propose a unified interface for submitting raw tracking data, for sources like ALXR, ALVR, VRCFTQuestPro and other projects.

Description

The 2 main face tracking platforms supported today are HTC eye and lip trackers, and the Quest Pro. Another source is the combined eye gaze exposed by the OpenXR extension XR_EXT_eye_gaze_interaction, which, among others, is supported by the Pico platform.

For the protocol I propose OSC over UDP.

The addresses of the OSC inputs are grouped by scope. For illustration here I divide each address into prefix and suffix, which should be used together to form a single path.

I propose 4 prefixes: /tracking/eye, /tracking/face_fb, /tracking/eye_htc, /tracking/lip_htc.

/tracking/eye contains unopinionated inputs describing eyeball movement, not constrained to any specific vendor. This prefix can have /center, /left and /right subpaths. An example of a full path is: /tracking/eye/left/pitch_yaw. These are the proposed suffixes:

| Suffix | OSC args |
| --- | --- |
| /active | Bool |
| /pitch_yaw | [Float, Float] |
| /vec | [Float, Float, Float] |
| /quat | [Float, Float, Float, Float] |
Clients can send any of these inputs at any time, and VRCFaceTracking should adapt to read any of them in any order. When a client detects that an eye is no longer actively tracked, it should send False to the corresponding /active input as soon as possible. Sending /active = True while the corresponding input is available is not mandatory, but due to the unreliable nature of UDP, /active = False inputs should be re-sent periodically. If VRCFaceTracking does not receive any input for a particular eye for more than 10 seconds, then it is free to perform any suitable idle animation.
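As a sketch of what a client would put on the wire, OSC messages are simple to encode by hand: address and type-tag strings are NUL-terminated and padded to 4-byte boundaries, booleans live entirely in the type tag (`T`/`F`), and floats are big-endian float32. A minimal stdlib-only Python example (the destination port 9001 is an assumption, not part of the proposal):

```python
import socket
import struct

def osc_pad(b: bytes) -> bytes:
    """NUL-terminate and pad to a 4-byte boundary (OSC string rule)."""
    return b + b"\x00" * (4 - len(b) % 4)

def osc_message(address: str, *args) -> bytes:
    """Encode one OSC message with bool/float arguments."""
    tags = ","
    payload = b""
    for a in args:
        if isinstance(a, bool):
            tags += "T" if a else "F"        # booleans carry no payload bytes
        else:
            tags += "f"
            payload += struct.pack(">f", a)  # big-endian float32
    return osc_pad(address.encode()) + osc_pad(tags.encode()) + payload

# Example: tell VRCFaceTracking the left eye lost tracking.
msg = osc_message("/tracking/eye/left/active", False)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(msg, ("127.0.0.1", 9001))  # port is a placeholder assumption
```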

Pitch and yaw are in radians, where 0 pitch and 0 yaw correspond to the eye looking in the forward direction. The rotations follow the right-hand rule, so +pitch is up and +yaw is left. /vec inputs are in order X, Y, and Z of the forward gaze direction. /quat inputs are in order W, X, Y, Z of the forward gaze direction. All orientation conventions should be local relative to the head orientation.
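For illustration, the pitch/yaw convention above could be converted to a /vec direction as follows. The proposal does not pin down the exact basis, so this sketch assumes a right-handed OpenXR-style frame with +X right, +Y up and -Z forward:

```python
import math

def pitch_yaw_to_vec(pitch: float, yaw: float):
    """Convert /pitch_yaw (radians) to a /vec gaze direction.

    Assumed frame: +X right, +Y up, -Z forward (OpenXR-style).
    Right-hand rule: +pitch looks up, +yaw looks left.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)
```

With these assumptions, `pitch_yaw_to_vec(0.0, 0.0)` yields the forward axis (0, 0, -1).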

These are the suffixes relative to the /tracking/face_fb prefix:

| Suffix | OSC args |
| --- | --- |
| /brow_lowerer_l | Float |
| /brow_lowerer_r | Float |
| /cheek_puff_l | Float |
| ... | ... |

All other suffixes are derived from the XR_FB_face_tracking OpenXR extension by transforming its constants to snake_case.
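The constant-to-suffix transform can be sketched in a couple of lines (enum names per the XR_FB_face_tracking specification, e.g. `XR_FACE_EXPRESSION_BROW_LOWERER_L_FB`):

```python
def fb_constant_to_suffix(name: str) -> str:
    """Map an XR_FB_face_tracking expression constant to an OSC suffix,
    e.g. XR_FACE_EXPRESSION_BROW_LOWERER_L_FB -> /brow_lowerer_l."""
    core = name.removeprefix("XR_FACE_EXPRESSION_").removesuffix("_FB")
    return "/" + core.lower()
```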

At the address /tracking/face_fb (full path), clients can send the full vector of 63 floats in one go.
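Encoding that batch message is straightforward; a stdlib-only sketch, assuming standard OSC encoding (padded address and type-tag strings followed by big-endian float32 arguments):

```python
import struct

def osc_floats(address: str, values) -> bytes:
    """Encode one OSC message whose arguments are all float32."""
    def pad(b: bytes) -> bytes:  # NUL-terminate, pad to a 4-byte boundary
        return b + b"\x00" * (4 - len(b) % 4)
    tags = ("," + "f" * len(values)).encode()
    return pad(address.encode()) + pad(tags) + struct.pack(f">{len(values)}f", *values)

# Full XR_FB_face_tracking weight vector in one datagram (all-zero placeholder):
weights = [0.0] * 63
packet = osc_floats("/tracking/face_fb", weights)
```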

Suffixes for /tracking/eye_htc are extracted in the same way from XR_HTC_facial_tracking:

| Suffix | OSC args |
| --- | --- |
| /left_blink | Float |
| /left_wide | Float |
| /right_blink | Float |
| ... | ... |

And for /tracking/lip_htc:

| Suffix | OSC args |
| --- | --- |
| /jaw_right | Float |
| /jaw_left | Float |
| /jaw_forward | Float |
| ... | ... |

VRCFaceTracking is responsible for handling any combination of inputs from different sources/clients. In the most common case, FB and HTC inputs will not be mixed.

As with face_fb, clients can send full parameter vectors at /tracking/eye_htc and /tracking/lip_htc.

Rationale

Why OSC over UDP?

OSC is a well established protocol in the VRChat community, with a simple interface. UDP is suitable for low latency transmission, and the integrity of the transmitted data is non-critical.

Why duplicate FB and HTC inputs?

The goal of this interface is to be as unopinionated as possible. These platforms have different conventions, so they should have separate input endpoints. The data should be pushed to the interface with as little preprocessing as possible, since it's VRCFaceTracking's job to pack the data into a suitable streaming format for VRChat. The only exception is FB eye tracking, which is provided as a pose in a global reference frame and should be converted to be local relative to the head orientation.
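That conversion is a single quaternion operation: local = head⁻¹ · global. A sketch assuming unit quaternions in the same (W, X, Y, Z) order as the /quat inputs:

```python
def quat_mul(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    """Conjugate; equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def to_head_local(head_global, gaze_global):
    """Re-express a global gaze orientation relative to the head:
    local = head^-1 * global (unit quaternions assumed)."""
    return quat_mul(quat_conj(head_global), gaze_global)
```

If the head and gaze orientations coincide, the local gaze comes out as the identity quaternion, i.e. looking straight ahead.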

Possible extension

This interface should support exposing the internal UnifiedExpressions as OSC inputs directly. The UnifiedExpressions set is in the process of being stabilized.

This interface could also be extended for hand tracking input, using path prefixes /tracking/hand/left and /tracking/hand/right. The suffixes can be:

| Suffix | OSC args |
| --- | --- |
| /active | Bool |
| /thumb_curl | Float |
| /index_curl | Float |
| /middle_curl | Float |
| /ring_curl | Float |
| /pinky_curl | Float |
| /splay | Float |
| /thumb_rot_xy | [Float, Float] |

/splay is the average splay of the fingers from index to pinky. The /thumb_rot_xy convention is still to be defined.


EDIT: Changed Enabled -> Active, refactored eye paths. EDIT2: Added OSC batch inputs for face_fb, eye_htc and lip_htc. EDIT3: Support for UnifiedExpressions inputs. EDIT4: Use snake_case path suffixes for consistency.

zarik5 commented 1 year ago

On the Discord server, implementing this as a VRCFT module was discussed.

regzo2 commented 1 year ago

Summary

To best summarize your goal with this proposal: VRCFaceTracking's tracking interface should be exposed as various vendor face tracking data types (htc, fb, etc., derived from the OpenXR API), using OSC as the transmission protocol, with finger tracking endpoints as an added bonus. I think that's the best way to summarize it; please correct me if I am wrong 😅

We are already pivoting VRCFT to become a more universal face tracking interface and standard that can potentially accept any of these different vendors' data types, as long as they can be transformed into VRCFT's tracking interface (our aforementioned Unified Expressions overhaul coming soon, which is already fully included in the repository but unreleased). While implementing directly against VRChat is the simplest and most direct way to send data, our goal is for VRCFT to eventually be completely platform independent (and yes, I am aware we have VRC in VRCFT, but still). Implementing towards our interface will automatically let you take advantage of already existing face tracking avatars and any app that can use VRCFT face tracking data.


VRCFT's Module System

As we mentioned on Discord, sending any face tracking data to VRCFaceTracking is entirely doable using an external tracking module. In this case, the module would expose these endpoints as described; clients would transmit to the endpoint matching their vendor/data type, and the module would transform the data to work within VRCFT's tracking interface. If your goal is to abstract out to various vendor endpoints, then creating an external tracking module addon for VRCFT is the preferred way to implement this sort of data interaction.

In this context, I think a better framing of the suggestion would be: should VRCFaceTracking have a default 'universal' module that many different devices can subscribe to, using OSC as the standard transmission protocol? I do think this is something we could definitely consider, especially if it helps developers target VRCFT with their tracking data.


General Points

Finger Tracking

VRCFaceTracking does not currently have any underlying tracking interface for finger tracking. The scope of this project has always been to provide a consistent face tracking interface that developers can implement against to send face tracking data to avatars. I personally like the idea of VRCFT potentially becoming more of a general interface for different types of tracking data in the future, but currently our goal is to standardize face tracking data into a common unified tracking/unified blend shape standard for avatars.

OSC

My main concern with making OSC either the main native interface or a native interface for VRCFT is limiting how the app itself can accept data. I do think OSC is a flexible standard in how it lets you express endpoints in a very user-friendly way, but our goal with VRCFT is to be completely agnostic to any transmission of data, and we achieve this using a module system (and soon our tracking interface will be fairly unified too, so it should in theory accept data from different interfaces more thoroughly as well). That said, this point isn't really a concern if it's done within a VRCFT module.

To make it very clear: you are able to use any data transmission method you want to send data to VRCFT as long as a module can accept the data and parse the data to be accepted by the VRCFT tracking interface. There isn't any inherent way to send data to VRCFT except by implementing it into an external tracking module. The data will eventually be transmitted to the appropriate endpoints of a specific app (in this case, VRChat via OSC).

If you want to use OSC then it is a perfectly acceptable transmission protocol to use within a module. As long as the data being sent isn't in conflict with other OSC endpoints or with VRCFT transmitting OSC data to VRChat, then it should be perfectly usable!


Conclusion

Thank you for your issue! It brings up a lot of interesting points and I hope I answered or can answer any sort of questions you have about VRCFaceTracking, or about what you think might be a good course of action to provide the level of functionality mentioned.

zarik5 commented 1 year ago

I think that's the best way to summarize it, and please correct me if I am wrong

Yes! Thank you 🙂

I think in this context; A better suggestion would be if VRCFaceTracking should have a default 'universal' module that many different devices could subscribe to using OSC as the main standard transmission protocol?

This is indeed the more general goal of this RFC. OSC + UDP provides a really simple interface for external apps to connect to VRCFT without the development overhead of creating their own module. It also allows apps/clients to become more independent from the main VRCFT project. But depending on the needs of each app, the underlying module system is still useful as an option.

regzo2 commented 1 year ago

OSC + UDP provides a really simple interface for external apps to connect to VRCFT without the development overhead of creating their own module.

That's an interesting proposal! There are module projects like the libalxr / alxr module that are positioning themselves as a general OpenXR interface for VRCFT. I'm guessing you're more concerned that there should be a module that just accepts OSC data instead, with all the OpenXR vendor APIs as OSC endpoints.

It also allows apps/clients to become more independent from the main VRCFT project. But depending on the needs of each app, the underlying module system is still useful as an option.

Most face tracking runtimes are already completely independent of VRCFT, and most apps like Neos integrate directly with developer SDKs like SRanipal (Vive's face tracking SDK). It's generally apps like VRChat, which don't implement towards any particular face tracking standard/device/SDK, where VRCFT can be especially helpful in standardizing the input for developers.

VRChat is pretty unconventional in that regard, and my only concern is with how they plan on addressing being standard agnostic with their face tracking implementation.

regzo2 commented 1 year ago

I was reading through your issue again, and I am actually really liking the idea of just sending the entire array of data to the general endpoints (one example you had was just yeeting a vector of 63 floats, in the order of the fb face expression enum, to /tracking/face_fb). That would probably be the preferred way to send data, since you won't have to boilerplate a bunch of parameter-specific endpoints.

A lot of the existing modules that use sockets already send the data as an array/vector in a single buffer, so moving over to OSC in that way wouldn't be a huge burden. I am still interested in supporting both, though, if the goal is to be more agnostic.