Open zarik5 opened 1 year ago
On the Discord server it was discussed to implement this as a VRCFT module.
To best summarize your goal with this proposal: VRCFaceTracking's tracking interface should be exposed as various vendor face tracking data types (`htc`, `fb`, etc., derived from the OpenXR API) using OSC as the transmission protocol, and, as an added bonus, integrate with finger tracking endpoints. I think that's the best way to summarize it, and please correct me if I am wrong 😅
We are already pivoting VRCFT to become a more universal face tracking interface and standard that can potentially accept any of these different vendors' data types, as long as the data can be transformed to VRCFT's tracking interface (our aforementioned Unified Expressions overhaul coming soon, which is already fully included in the repository, unreleased). While implementing directly against VRChat is the simplest and most direct way to send data, our goal with VRCFT is to eventually be completely platform independent (and yes, I am aware we have VRC in VRCFT, but still). Implementing towards our interface will automatically let you take advantage of already existing face tracking avatars and any app that can use VRCFT face tracking data.
As we mentioned on the Discord, sending any face tracking data to VRCFaceTracking is entirely doable using an external tracking module. In this case, the module would expose these endpoints as described; a client would transmit to the appropriate endpoint based on the relevant vendor/data type, and the module would then transform the data to work within VRCFT's tracking interface. If your goal is to abstract out to various vendor endpoints, then creating an external tracking module addon for VRCFT to handle this sort of data interaction would be the preferred way to implement it.
I think a better suggestion in this context would be: should VRCFaceTracking have a default 'universal' module that many different devices could subscribe to, using OSC as the main standard transmission protocol? I do think this is something we could definitely consider, especially if it helps developers target VRCFT with their tracking data.
Currently, VRCFaceTracking does not have any underlying tracking interface for finger tracking. The scope of this project has always been to provide a consistent face tracking interface that developers can implement towards to send face tracking data to avatars. I personally like the idea of VRCFT potentially becoming a more general interface for different types of tracking data in the future, but for now our goal is to standardize face tracking data to a common unified tracking/unified blendshape standard for avatars.
My main concern with making OSC the main native interface (or a native interface) for VRCFT is that it limits how the app itself can accept data. OSC is a flexible standard that lets you express endpoints in a very user-friendly way, but our goal with VRCFT is to be completely agnostic to how data is transmitted, and we achieve this through the module system (and soon our tracking interface will be fairly unified too, so it should in theory accept data from different interfaces more readily as well). This point is not really a concern, though, if it's done within a VRCFT module.
To make it very clear: you are able to use any data transmission method you want to send data to VRCFT as long as a module can accept the data and parse the data to be accepted by the VRCFT tracking interface. There isn't any inherent way to send data to VRCFT except by implementing it into an external tracking module. The data will eventually be transmitted to the appropriate endpoints of a specific app (in this case, VRChat via OSC).
If you want to use OSC then it is a perfectly acceptable transmission protocol to use within a module. As long as the data being sent isn't in conflict with other OSC endpoints or with VRCFT transmitting OSC data to VRChat, then it should be perfectly usable!
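For concreteness, here is a minimal sketch of what such a transmission could look like: encoding a float-valued OSC message by hand and sending it over UDP, using only the Python standard library. The destination port `9001` is a placeholder for whatever a hypothetical module listens on, not anything VRCFT defines.

```python
import socket
import struct

def osc_string(s: str) -> bytes:
    # OSC strings are NUL-terminated and padded to a multiple of 4 bytes.
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_float_message(address: str, values: list[float]) -> bytes:
    # An OSC message is: padded address, padded type tag string (",f" per
    # float argument), then the big-endian float32 payload.
    return (
        osc_string(address)
        + osc_string("," + "f" * len(values))
        + struct.pack(">%df" % len(values), *values)
    )

def send(address: str, values: list[float],
         host: str = "127.0.0.1", port: int = 9001) -> None:
    # Fire-and-forget UDP datagram; integrity is non-critical for tracking data.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_float_message(address, values), (host, port))
```

A client would then call, e.g., `send("/tracking/eye/left/pitch_yaw", [0.05, -0.10])` every tracking frame.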
Thank you for your issue! It brings up a lot of interesting points and I hope I answered or can answer any sort of questions you have about VRCFaceTracking, or about what you think might be a good course of action to provide the level of functionality mentioned.
> I think that's the best way to summarize it, and please correct me if I am wrong
Yes! Thank you 🙂
> I think in this context; A better suggestion would be if VRCFaceTracking should have a default 'universal' module that many different devices could subscribe to using OSC as the main standard transmission protocol?
This is indeed the more general goal of this RFC. OSC + UDP provides a really simple interface for external apps to connect to VRCFT without the development overhead of creating their own module. It also allows apps/clients to become more independent from the main VRCFT project. But depending on the needs of each app, the underlying module system is still useful as an option.
> OSC + UDP provides a really simple interface for external apps to connect to VRCFT without the development overhead of creating their own module.
That's an interesting proposal! There are module projects like the libalxr / alxr module that are positioning themselves as a general OpenXR interface for VRCFT. I'm guessing you're more concerned that there should be a module that just accepts OSC data instead, with all the OpenXR vendor APIs as OSC endpoints.
> It also allows apps/clients to become more independent from the main VRCFT project. But depending on the needs of each app, the underlying module system is still useful as an option.
Most face tracking runtimes are already completely independent of VRCFT, and most apps, like Neos, implement directly against developer SDKs such as SRanipal (Vive's face tracking SDK). It's generally apps like VRChat, which don't implement towards any particular face tracking standard/device/SDK, where VRCFT is especially helpful in standardizing the input for developers.
VRChat is pretty unconventional in that regard, and my only concern is with how they plan on addressing being standard agnostic with their face tracking implementation.
Reading through your issue again, I actually really like the idea of just sending the entire array of data to the general endpoints (one example you had was just yeeting a vector of 63 floats, in the order of the `fb_face` enum, to `/tracking/fb_face`). That would probably be the preferred way to send data, since you won't have to boilerplate a bunch of parameter-specific endpoints.
A lot of the existing modules that use sockets already send the data as an array/vector in a single buffer, so moving over to OSC in that way wouldn't be a huge burden. I am still interested in supporting both, though, if the goal is to be more agnostic.
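On the receiving side, decoding such a single-buffer message is similarly small. Here is a sketch of a parser restricted to all-float OSC messages (the function name is illustrative, and a real module would handle more type tags than just `f`):

```python
import struct

def parse_osc_floats(packet: bytes) -> tuple[str, list[float]]:
    """Decode one OSC message whose arguments are all float32s."""
    def read_string(offset: int) -> tuple[str, int]:
        # OSC strings are NUL-terminated; total length is padded to a
        # multiple of 4 bytes, so skip the padding when advancing.
        end = packet.index(b"\x00", offset)
        s = packet[offset:end].decode("ascii")
        length = end - offset + 1
        return s, offset + length + (-length % 4)

    address, pos = read_string(0)
    tags, pos = read_string(pos)
    if not tags.startswith(",") or set(tags[1:]) - {"f"}:
        raise ValueError("only all-float messages are handled in this sketch")
    n = len(tags) - 1
    values = list(struct.unpack(">%df" % n, packet[pos:pos + 4 * n]))
    return address, values
```

A module built this way could simply dispatch on the returned address, e.g. routing a 63-float batch received at `/tracking/face_fb` straight into its expression buffer.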
In this RFC I propose a unified interface for submitting raw tracking data, for sources like ALXR, ALVR, VRCFTQuestPro and other projects.
## Description
The two main face tracking platforms supported today are the HTC eye and lip trackers and the Quest Pro. Another source is the combined eye gaze exposed by the OpenXR extension `XR_EXT_eye_gaze_interaction`, which, among others, is supported by the Pico platform.

For the protocol I propose OSC over UDP.
The addresses of the OSC inputs are grouped by scope. For illustration here I divide each address into prefix and suffix, which should be used together to form a single path.
I propose 4 prefixes: `/tracking/eye`, `/tracking/face_fb`, `/tracking/eye_htc`, `/tracking/lip_htc`.

`/tracking/eye` contains unopinionated inputs relative to the eyeballs' movements, not constrained to any specific vendor. This prefix can have `/center`, `/left` and `/right` subpaths. An example of a full path is `/tracking/eye/left/pitch_yaw`. These are the proposed suffixes:

| Suffix | Type |
| --- | --- |
| `/active` | Bool |
| `/pitch_yaw` | [Float, Float] |
| `/vec` | [Float, Float, Float] |
| `/quat` | [Float, Float, Float, Float] |
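A note on the Bool type: OSC 1.1 carries booleans entirely in the type tag (`T` or `F`), with no payload bytes, while some implementations send an int 0/1 instead, so the exact encoding of `/active` would need to be pinned down. A minimal sketch of the type-tag form (the function name is illustrative):

```python
def osc_bool_message(address: str, value: bool) -> bytes:
    # OSC 1.1 booleans live entirely in the type tag string ("T"/"F");
    # the message has no argument payload.
    def pad(s: str) -> bytes:
        # NUL-terminate and pad to a multiple of 4 bytes, per OSC.
        b = s.encode("ascii") + b"\x00"
        return b + b"\x00" * (-len(b) % 4)
    return pad(address) + pad(",T" if value else ",F")
```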
Clients can send any of these inputs at any time, and VRCFaceTracking should adapt to read any of them in any order. When a client detects that an eye is no longer actively tracked, it should send `False` to the corresponding `/active` input as soon as possible; while it's not mandatory to send `/active = True` when the corresponding input is available, due to the unreliable nature of UDP, `/active = False` inputs should be sent periodically. If VRCFaceTracking does not receive any input for a particular eye for more than 10 seconds, it is free to perform any suitable idle animation.

Pitch and yaw are in radians, where 0 pitch and 0 yaw correspond to the eye looking in the forward direction. The rotations follow the right-hand rule, so +pitch is up and +yaw is left.
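For illustration, the pitch/yaw convention above can be converted into a `/vec`-style gaze direction. The basis used here (+X right, +Y up, -Z forward, as in OpenXR) is an assumption made for the sketch; the RFC itself only fixes the right-hand-rule rotation convention:

```python
import math

def gaze_vector(pitch: float, yaw: float) -> tuple[float, float, float]:
    """Unit forward-gaze vector from eye pitch/yaw in radians.

    Right-hand rule: +pitch is up, +yaw is left. Assumed basis
    (not mandated by the RFC): +X right, +Y up, -Z forward.
    """
    x = -math.sin(yaw) * math.cos(pitch)  # +yaw turns the gaze leftwards (-X)
    y = math.sin(pitch)                   # +pitch raises the gaze (+Y)
    z = -math.cos(yaw) * math.cos(pitch)  # zero rotation looks down -Z
    return (x, y, z)
```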
`/vec` inputs are in order X, Y, Z of the forward gaze direction. `/quat` inputs are in order W, X, Y, Z of the forward gaze direction. All orientation conventions should be local, relative to the head orientation.

These are the suffixes relative to the `/tracking/face_fb` prefix:

| Suffix | Type |
| --- | --- |
| `/brow_lowerer_l` | Float |
| `/brow_lowerer_r` | Float |
| `/cheek_puff_l` | Float |
All other suffixes are extrapolated from the `XR_FB_face_tracking` OpenXR extension by transforming constants to snake_case.

At the address `/tracking/face_fb` (full path), clients can send the full vector of 63 floats in one go.

Suffixes for `/tracking/eye_htc` are extracted in the same way from `XR_HTC_facial_tracking`:

| Suffix | Type |
| --- | --- |
| `/left_blink` | Float |
| `/left_wide` | Float |
| `/right_blink` | Float |
And for `/tracking/lip_htc`:

| Suffix | Type |
| --- | --- |
| `/jaw_right` | Float |
| `/jaw_left` | Float |
| `/jaw_forward` | Float |
VRCFaceTracking is responsible for handling any combination of inputs from different sources/clients. In the most common case, FB and HTC inputs will not be mixed.
Similarly to `face_fb`, clients can send full parameter vectors at `/tracking/eye_htc` and `/tracking/lip_htc`.

## Rationale
OSC is a well-established protocol in the VRChat community, with a simple interface. UDP is suitable for low-latency transmission, and the integrity of the transmitted data is non-critical.

The goal of this interface is to be as unopinionated as possible. These platforms have different conventions, so they should have separate input endpoints. The data should be pushed to the interface with as little preprocessing as possible, since it's VRCFaceTracking's job to pack the data into a suitable streaming format for VRChat. The only exception is FB eye tracking, which is sent in pose form with a global reference frame and should be converted to be local relative to the head orientation.
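That global-to-local conversion is plain quaternion algebra. A minimal sketch, not VRCFT code: quaternions are tuples in the (W, X, Y, Z) order the RFC proposes, and `to_head_local` (a hypothetical helper name) assumes unit quaternions:

```python
def quat_mul(a, b):
    # Hamilton product of quaternions in (w, x, y, z) order.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    # For unit quaternions the conjugate equals the inverse.
    w, x, y, z = q
    return (w, -x, -y, -z)

def to_head_local(gaze_world, head_world):
    # Remove the head orientation from a global gaze orientation:
    # q_local = conj(q_head) * q_gaze.
    return quat_mul(quat_conj(head_world), gaze_world)
```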
## Possible extension
This interface should support exposing the internal UnifiedExpressions as OSC inputs directly. The UnifiedExpressions set is in the process of being stabilized.
This interface could also be extended for hand tracking input, using path prefixes `/tracking/hand/left` and `/tracking/hand/right`. The suffixes can be:

| Suffix | Type |
| --- | --- |
| `/active` | Bool |
| `/thumb_curl` | Float |
| `/index_curl` | Float |
| `/middle_curl` | Float |
| `/ring_curl` | Float |
| `/pinky_curl` | Float |
| `/splay` | Float |
| `/thumb_rot_xy` | [Float, Float] |
`/splay` is the average splay of the fingers from index to pinky. The `/thumb_rot_xy` convention is to be defined.

EDIT: Changed `Enabled` -> `Active`, refactored eye paths. EDIT2: Added OSC batch inputs for face_fb, eye_htc and lip_htc. EDIT3: Support for UnifiedExpressions inputs. EDIT4: Use snake_case path suffixes for consistency.