Rework engine to a process based architecture

Frooxius commented 6 months ago

Is your feature request related to a problem? Please describe.

At the moment, most of the worlds, UI, rendering and everything runs within a single process.

While this keeps things simpler, it presents major issues:

- A single world crashing or freezing takes down the entire process with it, requiring the user to restart the whole app and lose their other open worlds.
- Resources are shared across all open worlds. If a world is using a lot of RAM or CPU, it is difficult to stop it without losing the other worlds and restarting the whole app.
- A security exploit in a single world can potentially gain access to the entire app, including more sensitive data in the userspace.

Describe the solution you'd like

Our goal is to switch to a process-based architecture similar to the one used in web browsers like Chrome or Firefox.

Instead of a single process, there will be one main "broker" process, which has privileged access to system resources.

The actual simulation (and possibly rendering) will be handled by additional processes. Opening a new world/session will spin up a separate process that will be responsible for simulating and calculating that world and communicating with the broker process through an IPC system.
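To make the broker/world split above more concrete, here is a minimal sketch, not the planned Resonite implementation. It is written in Python purely for brevity, and the names `WORLD_MAIN`, `WorldProcess`, `request` and the JSON-lines protocol over stdin/stdout are all assumptions made for illustration. Each world runs in its own OS process and the broker talks to it over pipes.

```python
import json
import subprocess
import sys

# Hypothetical world-process entry point: reads one JSON command per line
# on stdin and answers each "step" with one JSON line on stdout.
WORLD_MAIN = r"""
import json, sys
name = sys.argv[1]
tick = 0
for line in sys.stdin:
    msg = json.loads(line)
    if msg["cmd"] == "close":
        break
    if msg["cmd"] == "step":
        tick += 1
        reply = {"world": name, "tick": tick}
    else:
        reply = {"world": name, "error": "unknown command"}
    print(json.dumps(reply), flush=True)
"""


class WorldProcess:
    """Broker-side handle for one world running in its own OS process."""

    def __init__(self, name):
        self.name = name
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORLD_MAIN, name],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def request(self, cmd):
        """Send one command and block until the world replies."""
        self.proc.stdin.write(json.dumps({"cmd": cmd}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        """Ask the world to shut down and return its exit code."""
        self.proc.stdin.write(json.dumps({"cmd": "close"}) + "\n")
        self.proc.stdin.close()  # close() also flushes the pending command
        code = self.proc.wait(timeout=5)
        self.proc.stdout.close()
        return code
```

In this sketch the broker would create one `WorldProcess` per open session, and each world keeps its own state (here just a tick counter) that no other world can touch directly.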

These processes can also be sandboxed using OS primitives, which will restrict how much can be accessed in case of an exploit in a world.

Similarly, if a particular world freezes or crashes, only that world will stop moving/updating for the user. Their userspace (e.g. the dash) will remain operational, allowing them to close the world and continue using Resonite mostly uninterrupted.

Describe alternatives you've considered

We can try sandboxing and isolating things more within a single process, but this becomes very quickly unmanageable and impossible in a lot of cases.

Additional Context

It is uncertain whether this should be implemented before or after the switch to a custom rendering engine.

Doing so before would require the Unity integration to be reworked to support this. However, it would allow us to use .NET 7+ for world simulation before we switch away from Unity, getting significant performance benefits from a more modern runtime and JIT, at the cost of some additional rework.

Doing so first might also be beneficial and easier anyway. Otherwise we'll have to implement this while switching the graphical engine at the same time, which significantly increases complexity. Doing it first would also allow us to design the new graphical engine integration around the new IPC mechanism.

This is also mostly independent of other performance-related updates, such as the data model rework.

Additionally, this might present challenges when we want to add support for platforms like Android, iOS or consoles.

gameboycjp commented 6 months ago

I think I have concerns with timings if rendering of both the dash and the world requires being synchronized, as I assume them being on separate threads would implicitly cause them not to be, and that them being synced would rely on both remaining responsive. Is rendering them separately at all in the cards? That could possibly allow for more complex behaviors like portals peeking into worlds that are not focused, in addition to localspace not screeching to a halt if the focused world does.

Frooxius commented 6 months ago

That's separate from this issue. Having this architecture would not help or make it more difficult to do these things.

5H4D0W-X commented 5 months ago

This is a very specific and not-at-all necessary request, but when this is implemented, could we get interfaces for when a world has stopped responding? It would be useful to have a customizable dialog box appear with some info (like what error might have caused the world to stop), similar to how Windows does it.

Frooxius commented 5 months ago

I don't know how customizable we'll make this. There will be a mechanism for this, but also this is one of those things that you want to keep stable and secure, so the customization options might be limited/non-existent for this specifically.

Gotrethen commented 3 months ago

I like this idea. I think it will make Resonite a lot more flexible and efficient.

CalebOWolf commented 3 months ago

I'm very curious about this, given the recent advances in GPUs and CPUs. To my knowledge, Resonite uses just the CPU, and not multiple cores or threads, or even the GPU for that matter, whereas the majority of VR games (heck, even other platforms) tend to use both fairly evenly. On a personal note, I'm running a stable CPU and GPU build (the PSU not so much, but I'll work on that myself with others' input). Still, even with a Ryzen 5 2600 6c/12t processor and, as said on the Discord, an RX 7600 XT with 16 GB of VRAM, why hasn't this hardware been put to use? I'm speaking not just for my own usage but for others who I presume are running modern hardware. This feels like an inefficient way to use a processor, or even the GPU. So this would be a much appreciated upgrade and will make Resonite better. Whether I'm wrong or right, I'd love feedback on what I've said so far!

Gotrethen commented 3 months ago

I'm curious too about Resonite switching to a process-based architecture, and I agree about the concerns about the game's current CPU utilization. However, the decision to move to a new architecture involves various factors beyond just hardware capabilities. Let's go through some points to consider:

Current State of Resonite:

While it's true that Resonite primarily uses the CPU, relying on a single thread, it's crucial to be aware of the current game's design and technical limitations. There could be reasons behind this choice, such as:
- Engine limitations: The game engine may not be properly optimized for multithreading or GPU utilization.
- Gameplay design: The core gameplay mechanics may not benefit significantly from parallel processing.
- Development constraints: Time and resource limitations during development may have hindered a multithreaded implementation.

Challenges of Switching Architectures:

- Technical complexity: Changing a game's core architecture is a major undertaking, requiring significant development effort.
- Testing and compatibility: Extensive testing is needed to ensure the new architecture functions correctly across various hardware configurations.
- Cost-benefit analysis: The development cost and time investment need to be weighed against the expected performance benefits and wider user impact.

Gotrethen commented 3 months ago

Once we're aware of the problem space, we can start working toward a more process-driven implementation.

Frooxius commented 3 months ago

It is not actually true that Resonite doesn't use multi-threading - a good number of things in Resonite are multi-threaded as of right now (e.g. asset load & updating, both static and procedural (this includes stuff like text & UI rendering), most of dynamic bone calculations, rendering, audio). There are more things that could be multi-threaded and will be in the future, but it's important to realize that multi-threading isn't a binary on/off switch.

Similarly, it's not correct at all that Resonite does not use the GPU. The GPU is used for rendering the graphics; this is not done on the CPU, which only issues the rendering commands to the GPU. The GPU usage will vary depending on the complexity of the scene and whether the CPU ends up being bottlenecked (which can happen for a number of reasons). In scenes with complex lighting and geometry, the GPU will end up getting utilized more heavily.

However, both of these are mostly tangential to this particular issue, so we shouldn't go too deep into them here. This rework won't introduce additional multi-threading (although it will allow us to use a runtime that handles it better) and won't particularly change how the GPU is utilized, so it is better to leave these topics out of this issue.

The mechanism through which this will introduce performance benefits is mainly a significantly better JIT compiler, which produces more efficient machine code from existing code (without any changes on our end), and a significantly better Garbage Collector (GC), because the current one eats a chunk of CPU cycles.

Additional optimizations will be done on top of this in the future, but those are tackled through separate issues, such as these: https://github.com/Yellow-Dog-Man/Resonite-Issues/issues/705 https://github.com/Yellow-Dog-Man/Resonite-Issues/issues/702

Gotrethen commented 3 months ago

Here is my resummary/input/prioritization on the matter:

Implement a new runtime environment that leverages a significantly better JIT compiler and garbage collector to improve application performance without modifying existing code or significantly altering multi-threading or GPU utilization.
Introduce a new runtime system optimized for efficient code execution and memory management to alleviate performance bottlenecks caused by the current JIT compiler and garbage collector.
Prioritize improving application performance by utilizing a new runtime environment equipped with a superior JIT compiler and garbage collector, focusing on code efficiency and memory management optimization without major changes to multi-threading or GPU usage.

Essentially, the existing infrastructure is reusable. The difference lies in how that infrastructure is utilized, and that's where the JIT and garbage collection come in.

Frooxius commented 3 months ago

I don't understand why these three are separate points. They are essentially the same thing, just written differently.

The better JIT and GC will be utilized implicitly by implementing this process-based architecture; there are no additional tasks we need to take care of there.

We only need to prioritize this issue for that to happen.

Gotrethen commented 3 months ago

Yes, they are seemingly three slightly varied viewpoints, but they have more similarities than differences I guess. And yeah, I agree that the JIT and GC are probably the priorities here for the process-based architectural shift.

Frooxius commented 3 months ago

That's not quite what I'm saying. You're listing them as three different tasks, but there's in fact just one.

It does not make sense to prioritize those separately, because they are not separate issues.

Gotrethen commented 3 months ago

Ok. I guess you're right. Carry on then.

Gotrethen commented 3 months ago

Here are my thoughts on how one would maximize the chance of success for such an application paradigm transition:

Planning and Analysis:

Clearly define goals and benefits: Articulate what you aim to achieve with the transition (scalability, performance, etc.). This guides decision-making throughout the process.
Thoroughly assess your existing CBA: Identify suitable components for conversion, potential challenges, and dependencies.
Choose the right framework: Consider project needs, complexity, scalability, and your own experience level.
Start small and iterate: Don't attempt a big-bang approach. Begin with small, well-defined components and gradually evolve.

Implementation and Monitoring:

Leverage framework documentation and resources: Utilize tutorials, samples, and community support to navigate implementation details.
Focus on loose coupling and clear communication: Design processes with minimal dependencies and well-defined message exchange patterns.
Implement robust error handling and monitoring: Ensure processes can handle failures gracefully and provide insights for troubleshooting.
Continuously monitor performance and resource usage: Identify bottlenecks and adapt your approach as needed.

Additional Guidelines:

Prioritize data consistency and security: Implement mechanisms to maintain data integrity across processes and protect sensitive information.
Test thoroughly at each stage: Utilize unit testing, integration testing, and performance testing to ensure functionality and stability.
Seek expert guidance: If needed, involve experienced software architects or consultants familiar with PBAs and your chosen framework.
Measure progress and document learnings: Track your progress, evaluate outcomes, and document lessons learned for future reference.

Gotrethen commented 3 months ago

C# takes care of a lot of this stuff for us already, so arguably a lot of what is written there is fairly redundant. And if other system components are needed, they can be attached as well. But decoupling seems to be the main challenge here.

ProbablePrime commented 3 months ago

Thanks for your thoughts! We'll take care of it once the team starts work on this item.

Once we start work we'll update this item and mark it as in progress.

Gotrethen commented 3 months ago

Classes still play an important role in the PBA, but they would likely need to adapt. Here are some key strategies for a smooth paradigm transition:

1. Refactor for Process Boundaries:

Identify classes representing independent tasks or functionalities with clear inputs and outputs.
Refactor them into separate processes, encapsulating logic and data within process boundaries.

2. Reduce Hidden Information (But Not Entirely):

Don't eliminate data hiding completely, but minimize it where possible.
Focus on reducing opacity for elements crucial to external interaction and PBA collaboration.

3. Prioritize Transparency for Collaboration:

Define well-structured interfaces for communication between processes using message queues, APIs, or other IPC mechanisms.
Consider the Facade pattern to expose necessary functionality without revealing internal complexity.

4. Embrace Composition over Inheritance:

When creating new classes for PBA purposes, favor composition of smaller, well-defined modules with controlled interfaces.
This reduces hidden information inherited from base classes and promotes loose coupling.

5. Leverage Design Patterns and Best Practices:

Explore Domain-Driven Design (DDD) for modeling classes based on real-world domains and responsibilities, fostering clear boundaries and reducing hidden information naturally.
Utilize patterns like Command Query Responsibility Segregation (CQRS) to separate data access and manipulation, enhancing scalability and PBA integration.

6. Start Small and Iterate:

Don't attempt a massive overhaul. Begin with small, well-defined components and gradually evolve them into processes, learning from each iteration.
This mitigates risks and allows you to refine your approach based on specific challenges and your application's needs.

7. Maintain Testability and Documentation:

Ensure your classes remain testable, even with some hidden information.
Provide clear documentation describing their purpose, exposed interfaces, and any critical internal details for future maintainability.
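As one possible reading of point 3 above (well-structured interfaces between processes), the sketch below defines a small set of explicit message types with a shared envelope. It is only an illustration in Python: the message names `OpenWorld` and `WorldStatus`, their fields and the JSON envelope are hypothetical and not taken from Resonite.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OpenWorld:
    """Broker -> world process: start simulating a session."""
    session_id: str


@dataclass(frozen=True)
class WorldStatus:
    """World process -> broker: periodic health report."""
    session_id: str
    responsive: bool


# Registry of the message types both sides agree on (hypothetical names).
MESSAGE_TYPES = {cls.__name__: cls for cls in (OpenWorld, WorldStatus)}


def encode(msg) -> str:
    """Wrap a message in an envelope that names its type."""
    return json.dumps({"type": type(msg).__name__, "body": asdict(msg)})


def decode(raw: str):
    """Rebuild a typed message, rejecting anything not in the registry."""
    envelope = json.loads(raw)
    cls = MESSAGE_TYPES.get(envelope.get("type"))
    if cls is None:
        raise ValueError(f"unknown message type: {envelope.get('type')!r}")
    return cls(**envelope["body"])
```

Keeping the set of messages explicit like this means the receiving side never has to interpret arbitrary data from the other process, which is also useful from a security point of view.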

Gotrethen commented 3 months ago

> Thanks for your thoughts! We'll take care of it once the team starts work on this item.
>
> Once we start work we'll update this item and mark it as in progress.

Sure, thanks for your input as well. I'm done browsing this issue right now, so I'll come back to it later.

Gotrethen commented 3 months ago

Well, I've done my own research on this issue. As Prime mentioned, the team will take over once they start work on this.

Yellow-Dog-Man / Resonite-Issues