Yellow-Dog-Man / Resonite-Issues

Issue repository for Resonite.
https://resonite.com

Full/partial session recording, editing and playback #1914

Open Zyzyl opened 2 months ago

Zyzyl commented 2 months ago

Is your feature request related to a problem? Please describe.

Currently there is no native method for producing a full or partial recording of a Resonite session which can be played back or edited later. This means that events during sessions are ephemeral and difficult, if not impossible, to precisely recreate or re-experience later.

The ability to make such a recording would be extremely useful in many contexts, for example: recording talks, lectures and tutorials; capturing performances and rehearsals; coaching; archiving events; debugging; and extracting data such as motion capture after the fact.

This would also enable a new workflow for video recording since a whole performance could be captured and video recorded afterwards from the played-back capture.

Simple recording and playback of audio, visual and haptic elements would be very useful. However, it would be even more powerful if one were able to edit a recording (i.e. to have full write access to the recorded data model). The ability to extract specific elements from a recording (e.g. a user avatar's movements) would also be useful, for example for motion capture animation.

Describe the solution you'd like

I would like the ability to produce a capture of a session which would allow me (or someone else) to replay the recording and explore the recorded environment as if I/they were present in the session as it was happening.

Ideally I would also like the option to pause, rewind/fast-forward, jump to timestamps, change playback speed etc., as well as to make edits to the recording using standard Resonite tools and Inspector panels. I imagine this would require a full time-dependent record of the data model.

Describe alternatives you've considered

For audio/visual media, the closest available option is to record video from a session, which captures only 2D video and audio from a single perspective per recording output.

For data collection, one must set up the relevant collection methods ahead of time, with no ability to return and extract additional values at a later date.

An alternative would be to produce a plugin or mod which provides this functionality (e.g. the MetaGen bot for motion recording).

Additional Context

I expect there would be significant privacy concerns given the nature of the user data this would entail capturing. I expect there would also be asset (mesh, texture etc.) security concerns. The requested editing feature could also be misused to create misleading recordings.

Requesters

Zyzyl (zyzylian on Discord)
Jason Moore (jasonmo on Discord)
Carlos Austin (shellha on Discord)

JasonMo680 commented 2 months ago

This would be a huge win for a ton of communities in Resonite! As an educator, it would be the best tool I've ever imagined. The popular learning/training VR platform Engage has this feature and I've sat in on a few lectures recorded this way; it is awesome. They have a simple editing system as well, allowing you to keep or remove avatars and other assets in the scene, and of course do normal time-based editing/trimming. This tutorial is terrible, but it illustrates their version of the tool. https://www.youtube.com/watch?v=aesuF8NsLew

paradoxical-autumn commented 2 months ago

While I agree that this would be a helpful feature for events, there would have to be a way for users to opt out of this.

An idea I came up with: before starting a session, have the option to mark the session as recorded. This would add an alert in the world browser / in a pop-up before joining the session.

Of course, this option would be like "unsafe mode" in that it can only be set before a session is started, and recording cannot happen in a session without the flag.

However, this feature would come with security concerns of its own - for example, what if a user could start a recording in a session without having the flag set?
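A minimal sketch of how such an opt-in flag could be enforced (Python, with entirely hypothetical names rather than Resonite's actual API): the flag is fixed when the session is created, surfaces a pre-join warning, and gates any later attempt to start a recording.

```python
# Hypothetical sketch: an immutable per-session "recordable" flag set at creation,
# surfaced as a pre-join warning and checked before any recording can start.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SessionInfo:
    name: str
    recordable: bool = False     # can only be set when the session is created

@dataclass
class Session:
    info: SessionInfo
    recording: bool = field(default=False, init=False)

    def join_warning(self) -> str | None:
        # Shown in the world browser / pre-join popup.
        return "This session may be recorded." if self.info.recordable else None

    def start_recording(self) -> None:
        if not self.info.recordable:
            raise PermissionError("Session was not marked as recordable at creation.")
        self.recording = True

session = Session(SessionInfo(name="Lecture hall", recordable=True))
print(session.join_warning())    # "This session may be recorded."
session.start_recording()
```

In a design like this, making the flag immutable after creation is what addresses the last concern: a recording simply cannot be started in a session that was created without it.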

Frooxius commented 2 months ago

This is a bit of a tricky issue from a technical perspective, because there are multiple ways to achieve something like this, each with different "side effects".

Most notably, one thing we have to deal with is long-term compatibility.

The technically easiest way would be to just capture network traffic, but the moment we update the client, the recording would simply stop working.
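As an illustration of that compatibility problem (a purely hypothetical format, not Resonite's): a raw network capture is only replayable while the recorded protocol version matches what the current client speaks, so any client update strands old recordings.

```python
# Hypothetical illustration: replaying a raw packet capture only works while the
# recording's protocol version matches the protocol the current client speaks.
recording = {"protocol_version": 17, "packets": [b"\x01...", b"\x02..."]}
CURRENT_PROTOCOL = 18            # client has been updated since the capture

def replay(rec: dict, client_protocol: int) -> None:
    if rec["protocol_version"] != client_protocol:
        raise RuntimeError(
            f"Capture uses protocol {rec['protocol_version']}, client speaks "
            f"{client_protocol}; the raw packets can no longer be interpreted."
        )
    for packet in rec["packets"]:
        ...                      # decode and apply each packet

replay(recording, CURRENT_PROTOCOL)   # raises: the formats have diverged
```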

Another approach would be to instrument common things in the scene, capture user inputs and the like, and run other driven things normally - but this wouldn't work for any content with stochastic behavior: if we didn't capture all the "inputs" perfectly, the playback would actually diverge from the original.
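A toy illustration of that divergence (hypothetical names, plain Python): the recorder captures the user's inputs, but any input it misses - here a per-frame jitter standing in for timing, randomness, or external data - makes the replay drift away from the original run.

```python
# Toy illustration of replay divergence: the recorder captures the user's inputs,
# but a per-frame jitter (standing in for timing/randomness/external data) is not
# captured, so re-running the same logic with the same inputs still drifts.
import random

def simulate(inputs, jitter):
    position, trajectory = 0.0, []
    for move in inputs:
        position += move + jitter()      # jitter is NOT part of the recording
        trajectory.append(position)
    return trajectory

recorded_inputs = [0.1, 0.2, -0.05, 0.3]          # what the recorder captured

original = simulate(recorded_inputs, lambda: random.uniform(-0.01, 0.01))
playback = simulate(recorded_inputs, lambda: random.uniform(-0.01, 0.01))
print(original == playback)    # almost certainly False: the runs diverge
```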

For some use-cases this wouldn't matter as much - e.g. capturing users' voices, movements and some basic interactions could be sufficient to cover them - but this wouldn't work for everything.

We could also just record everything audio/visual raw and strip any behaviors: record the final values on the audio/visual components and not run any dynamic behaviors, so the recording is as accurate as possible. However, this would result in large recording files and would not be usable for other use-cases, like debugging.
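A rough sketch of what that raw value capture could look like (hypothetical structure, not FrooxEngine's actual data model): sample the already-resolved values of each audio/visual component every frame and store them, with no behaviors run on playback - exact, but the size grows with components times frames.

```python
# Rough sketch of raw value capture: store the resolved audio/visual values of
# every component each frame; playback just re-applies them, no behaviors run.
import json

def capture_frame(components: dict) -> dict:
    # `components` maps a component path to its already-evaluated values.
    return {path: dict(values) for path, values in components.items()}

recording = []
for frame in range(3):                            # stand-in for the session loop
    scene = {
        "Slot1/MeshRenderer": {"position": [frame * 0.1, 0.0, 0.0]},
        "Slot2/AudioOutput":  {"volume": 0.8},
    }
    recording.append(capture_frame(scene))

# The cost: size scales with components x frames, before any compression.
print(f"{len(recording)} frames, ~{len(json.dumps(recording))} bytes uncompressed")
```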

Some of the additional stuff also increases the complexity even more - like being able to edit or extract parts of the recording. With some approaches this could be trivial (like the raw recording), but with others it creates more complexity. The extraction part would also be difficult unless you specifically want the final values - and in some cases you wouldn't, like recording mocap, where those values would be useless.

We generally wouldn't approach it like that, but instead make a more dedicated mocap recording tool that records exactly what's necessary. Even if all the values were recorded, they could be very hard to work with, because you'd likely have tens of thousands (if not more) of tracks to sift through.

Some of the approaches would also make it hard to really build much on top - you'd just be able to view things - whereas for some of the use-cases it sounds like you want only a portion of things recorded, which can then be integrated into another world.

The problem here is that you listed a lot of use-cases and I don't think one solution would really fit them all, which makes this issue very complicated because of its very broad scope, and I'm not sure how to even start approaching it right now.

The only way I see this working is to split it into several different tools for different use-cases, rather than trying to do this as a single unified tool.

Zyzyl commented 2 months ago

I only had a vague sense of what this might entail on the engineering side, so thanks for providing some insight into the different options on that end! I can understand that some use cases might be best tackled in different ways - that's fine. When writing up the issue I had a few main use cases in mind while some of the others (debugging for example) seemed like other useful outcomes which could be based on a similar system. If there are certain use cases which could be served much more easily than others, I wouldn't want those to be blocked just because solving the whole problem in one go is too complex.

For me (and people like Jason and Carlos), I think the primary ask would be to capture the audio and visual (plus maybe haptic?) aspects of a session in a format which could be played back. That alone would be a very powerful form of 'experience capture' which would be great for the performance, rehearsal, coaching, teaching and archiving use cases. Other features like editing or the ability to extract only parts of the recording (mocap for example) would be amazing to have, but I think we'd happily forgo them if it meant that core feature were implementable. Please let me know how you'd like me to proceed with that - should I create another more tightly scoped issue or shall I edit this one?

Also, could you expand a bit on the comment regarding stochastic behavior? The main thing which comes to mind is particle effects since I believe those are locally generated by each client. In that case, there isn't really a 'canonical' version of the experience anyway since even users present at recording time would see something different. I think some divergence in recording playback is likely tolerable there. Alternatively, in other contexts, I sometimes run simulations and other computations involving random number generation. In those cases the problem of reproducibility is solved by setting a random number generator seed at the beginning of the process - is something like that a viable option here too?
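For reference, the seeding idea in miniature (plain Python, not ProtoFlux): a fixed seed makes the "random" sequence reproducible, while unseeded runs differ - whether all of Resonite's sources of randomness and timing could be seeded this way is exactly the open question here.

```python
# The seeding idea in miniature: a fixed seed makes the sequence reproducible,
# while unseeded runs differ between "recording" and "playback".
import random

def roll(seed=None):
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

print(roll(seed=42) == roll(seed=42))    # True: identical sequences
print(roll() == roll())                  # False: unseeded runs diverge
```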

Frooxius commented 2 months ago

@Zyzyl What might be most useful right now is seeing which use-cases are most important to you and which ones are less important. That way we could better determine the focus and scope of this.

For stochastic behavior - it's not particles, those are inconsequential - but rather any ProtoFlux or other behaviors that depend on some randomness: external data (e.g. OSC), random number generators, timing of things, simulations that can diverge depending on framerate and so on... Anything that doesn't converge would make the playback play out differently.
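A tiny example of the framerate point (generic Python, not FrooxEngine code): the same naive per-frame update stepped at 60 fps versus 30 fps settles to visibly different values, so a replay that re-runs such logic at a different framerate than the original will not reproduce it.

```python
# The framerate point in miniature: the same naive per-frame decay stepped at
# 60 fps vs 30 fps ends in different states, so replaying such logic at a
# different framerate than the original run will not reproduce it.
def settle(dt: float, duration: float = 1.0, k: float = 5.0) -> float:
    value = 1.0
    for _ in range(round(duration / dt)):
        value += -k * value * dt         # framerate-dependent Euler update
    return value

print(round(settle(dt=1 / 60), 4))       # 0.0054 after 60 steps
print(round(settle(dt=1 / 30), 4))       # 0.0042 after 30 steps: same logic, different end state
```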

AmasterAmaster commented 2 months ago

A fascinating thought experiment I had while reading through this (taking randomness into account) is showing parallel universes of a recording. I could imagine a lecture explaining particles and collisions where, after a random chance of a particle triggering ProtoFlux or something, the presenter says "...if you don't see this value change, then you are probably not lucky enough to see it, but at least here it did..." - the trigger hit in the original recording and changed the value for the live viewers at the time, but sometimes won't in a rewatch. The side effect could even be intentional if that were also a separate world setting, but that is something I was imagining.

Frooxius commented 2 months ago

That would be better handled through a controlled simulation, which you can make diverge at the desired points in time, in the way you want.

With something like this, you don't get that control - you don't control what will diverge, when and how.

hamishgavinmacdougall commented 2 months ago

Agree this feature would be awesome! (I try to save that word for things like this) :) I guess guillefix's metagen tool was a partial attempt at something along these lines and we used that all the time!

Zyzyl commented 2 months ago

@Frooxius thanks, I just had a chat with Jason about what the main priorities are here.

We agreed that, first and foremost, we would like to be able to record talks, demonstrations, lectures and tutorial content. A couple of examples of content we would have liked to have been able to save in this way are https://www.youtube.com/watch?v=N1IhtMNP1O4 and https://www.youtube.com/watch?v=4O-gZCWfbKw

For this, the key output would be that a user could load up the recording and experience the content as if they had been present in the session. The main focus would be on preserving the audio and visual experience (avatar and prop visuals, voice, multimedia etc.). However, we agreed that it would also be important for users to be able to move around in the recorded space as they would in a normal session (i.e. people would need to be able to walk around while playback happens around them). At least in the past, these experiences did not make much use of scripting that relied on randomness, nor did they involve bringing outside values into the platform (e.g. no use of websockets or OSC). I think that even if there were a subset of known things which did not work with the recording tech, we could probably find ways to handle that. Important playback controls would be: pause/resume, rewind/fast-forward, jump to timestamp, change playback speed.
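For concreteness, a minimal sketch of those playback controls (hypothetical API, not a proposal for Resonite's actual implementation): a clock that maps wall time to recording time and supports pause/resume, jumping to a timestamp, and variable speed, which a player would sample every frame to know what state to display.

```python
# Hypothetical playback clock: maps wall time to recording time, supporting
# pause/resume, jump-to-timestamp, and variable (including reverse) speed.
import time

class PlaybackClock:
    def __init__(self, duration: float):
        self.duration = duration
        self.speed = 1.0
        self._position = 0.0                 # seconds into the recording
        self._paused = False
        self._last_wall = time.monotonic()

    def _advance(self) -> None:
        now = time.monotonic()
        if not self._paused:
            self._position += (now - self._last_wall) * self.speed
            self._position = min(max(self._position, 0.0), self.duration)
        self._last_wall = now

    def position(self) -> float:
        self._advance()
        return self._position                # sampled every frame by the player

    def pause(self) -> None:
        self._advance()
        self._paused = True

    def resume(self) -> None:
        self._advance()
        self._paused = False

    def seek(self, t: float) -> None:
        self._advance()
        self._position = min(max(t, 0.0), self.duration)

    def set_speed(self, s: float) -> None:   # negative speed rewinds
        self._advance()
        self.speed = s

clock = PlaybackClock(duration=3600.0)
clock.seek(120.0)                            # jump to the 2-minute mark
clock.set_speed(2.0)                         # double-speed playback from here
print(round(clock.position(), 1))            # ~120.0 and advancing at 2x
```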

I believe that a solution to that effect would also be very useful for some of the other use cases I mentioned previously (e.g. rehearsal, MetaMovie or other performance recording).

I'll note that this may not fit all of @hamishgavinmacdougall 's priorities and use cases though. If so, maybe you could expand a bit more Hamish?

hamishgavinmacdougall commented 2 months ago

Hi Zyzyl, Yea, my fascination with the feature probably stems from the old VR application Mindshow that we used before we had access to FrooxEngine-based platforms: https://youtu.be/5CnSitfIMzA

This program allowed people to select between various avatars, sets (locations) and props, act out a part, return the timeline to 0 and overlay another part to build up ‘skits’. You could then switch to being the camera operator and move around recording your skit to video (a super easy way to make educational material). This had a lot of limitations re. asset import and 3D immersive content production (rather than just video), but the mechanics were great.

Guillefix’s amazing metagen recorder was nice because it captured everything about an avatar (more than most mocap) - body, eyes, face, voice etc. - as a 3D asset for immersive replay, but it was trickier to include props. For example, if you wanted to record a chemistry tutorial with a lecturer manipulating things on a bench, or a couple of surgeons doing virtual surgical planning, it quickly became difficult to include props, button state changes, etc.

I asked if it might be possible to record everything happening in a world by capturing all the streaming data that presumably gets sent to each user and replaying it somehow, but it sounded like that was not trivial to say the least!

To date we have been using the metagen tool to capture complete avatars and one or two props. For example, most of the animations in this patient edu attempt are metagen recordings including the moving cells, fire truck, etc. where I just scaled up, grabbed the object (cell, truck, …) and puppeteered the motion while recording: https://youtu.be/00BNsCyGwDY

The dream would be something more like what I think you are describing, where you can move around like a ghost in a replaying world containing ‘people’, sounds and events that happened previously – a bit like time travel :) Thanks, Hamish