Automated Testing - Githubissues

ruipin commented 5 years ago

There has been some discussions about some sort of automated testing of RPCS3.

One option on the table is to have Cell integration tests, which are built from templates, get run with the null renderer, and the console output is then compared against a golden reference. Additionally, something similar to Dolphin's FifoCL (https://fifoci.dolphin-emu.org/) for testing the RSX is a possibility.

I've decided to create a tracking issue for the things that would be needed to get this sort of automated testing up-and-running.

[ ] Proper CLI support
- [x] #6110 Allow RPCS3 to run headless through the CLI (no UI)
- [x] CLI parameter to boot an ELF directly (or an RSX capture)
- [ ] CLI parameter to exit on stop (either ELF closes itself, or crash/fatal error)
- [x] CLI parameter to pass custom config yaml path (in order to be able to easily change this in a test-by-test basis)
- [ ] CLI parameter to redirect log files
[ ] Frame dumping capability (all frames, or specific frames) and corresponding CLI parameter(s) (@kd-11 will be adding the infrastructure for this in the RSX side, will need to be connected later)
[ ] Improve RSX Captures
- [x] Need to be portable/reproducible across different emulator versions and/or renderers (@kd-11 says that should already be the case)
- [ ] Compression support
[ ] Some sort of RPC language that we can use to control RPCS3 from another application, able to change configurations, run ELFs, set breakpoints, take captures, etc

As for the testing framework, we would need something capable of at least the following:

[ ] Cell
- [ ] (Optional) Build ELF files from source (using e.g. PSL1GHT)
- [ ] Run ELF files in RPCS3, collect results
- [ ] Parse results, compare with golden reference
- [ ] (Optional) Support templated test source/golden reference files
[ ] RSX
- [ ] Run RSX captures, collect frame dumps
- [ ] Compare frame dumps with golden reference images (or simply previous dumps?), highlight differences

Some good-to-haves would be:

[ ] Automatic bisecting given a specific failing test. Possibly using a supplied folder of pre-built RPCS3 executables; a version range, with auto-download from the archive; or maybe even just a commit range
[ ] Allow users to easily run a set of triaging tests (e.g. to help debug HW issues)

With this implemented, we would have enough to run these tests in our local machines. We could then discuss how to go from there.

If anyone wishes to play around with this, @hcorion has setup PSL1GHT in travis:


just change the install commands here: https://github.com/RPCS3/rpcs3/blob/master/.travis.yml#L21
from
install: "docker pull rpcs3/rpcs3-travis-trusty:1.0"
to 
install: "docker pull rpcs3/rpcs3-travis-trusty:2.0"````

bevanweiss commented 4 years ago

(NOTE: please read below with salt... I've only been looking at rpcs3 for a week now... so perhaps I completely misunderstand lots about it)

Is this something that we need to consider at a few different 'levels'? I think Integration Tests are important, where we call rpcs3.exe with a command line, get it to load up some ELFs/SELFs/ISOs/PKGs etc, and make sure it loads and runs, displays a set of images, runs at a certain FPS, takes a simulated input to do something.. etc... but I think the issue with such a testing process is that setting up this whole 'Test Fixture' is very expensive, so it would be difficult to establish it and ensure consistency across 'lots' of tests.

My thoughts are that we might need to have a bit more introspection in the testing, i.e. Unit Testing. So for example we try to break up the sections, certain down to PPU / SPU / RSX, but further if we can. And then we test up a Test Fixture which instantiates each of these (in isolation if possible), and mock the other associated entries (i.e. the RSX needs to be fed data from PPU / SPU... so we need to mock up enough of these to push data to the RSX instance in question).

I'm unsure of whether people have particular Test Frameworks that they have dealt with before, but I think it would be worth considering something like CMakeTest or GoogleTest as a Test Framework.. any other ideas / preferences around such? (or even statements of why it might not be feasible to use such a framework with rpcs3).

So I'm thinking something like https://fifo.ci/about/, but where instead of use restricting the Mocking just for data->RSX, we can have another set of Mocking also be data->PPU, or data->SPUs. The thoughts here being that we could start by prioritising test cases for known failures (which hopefully wouldn't need incredibly deep understanding of RPCS3 / PS3), and then slowly build up known issue points as time progresses.

kd-11 commented 4 years ago

About automated testing - we can easily use RSX captures to recreate scenarios for RSX to run. This isn't the hard part. One of the limitations is that for something like OpenGL, you need a window to bind the GPU output to. For vulkan I have a software path in place that works around this that we can use. That being said, unlike CPUs, GPUs are not really guaranteed to work the same way, the result has to be analyzed visually. If you capture screenshots of the same scene on different GPUs, some colors will technically not be a 1:1 match. This rules out memory compare as a reliable way of executing these tests, unless a constant/static target is chosen, but given how fast GPU designs get updated this is a tad unreliable. Mocks are not going to be useful, the sheer amount of data you need to set up is asinine just to do something like render a single triangle. I have used gmock for a little over half a decade now so I can maintain the tests if someone is willing to write them, but I'm not sure they make sense for most of the project code as rpcs3 code is externally driven by applications written by others. Sequences of events become unpredictable because of that, but at least for basic functionality it is not a bad idea. Another factor to consider is that writing UTs is not trivial. In my experience it takes longer to write a UT than to fix a bug in the code, which negatively affects dev throughput on closing tickets. Ofc for corporate projects, you get paid either way so its worth eating the time loss if your project demands UTs for every fix. Whichever way I look at it, there are pros and cons, but it is good to have an open discussion about it and see where we can compromise and end up with something. Hooking up the renderer backends to the headless mode is on my roadmap, once CI/CD is ready for PPU/SPU I can look to enable it. Most of the work was done a few years ago, I just need to enable the damn thing.

bevanweiss commented 4 years ago

@kd-11 have you tested the RSX Capture / Open lately? I've had a go at it from the current master, and it appears 'broken'. The capture reports success, but on Open it:

Reports 0.5fps in title bar... which is a little confusing, but no big issue.
Displays a black render window, with 'Caching shaders...' for a while, then nothing further.
Logs messages say ·E 0:47:27.923028 {RSX [0x10213000]} RSX: Display queue was discarded while not empty! ·W 0:47:27.933315 {RSX Replay} sys_rsx: sys_rsx_context_attribute(context_id=0x55555555, package_id=0x1, a3=0x10000000, a4=0x10000000, a5=0x0, a6=0x0)
and when trying to close the render window, it throws a deadlock error and rpcs3 closes. Let me know if it works for you, if not, I can raise an issue for it.

I would have expected the Open RSX Capture to have presented a still frame of the view at the time of the RSX Capture (this was with Lightning Returns : FFXIII, so perhaps it's a weird case in RSX behaviour).

I absolutely agree that we can't have only Unit Tests, as you say, rpcs3 is driven by external code (that we don't control, or know what it will do, including potentially self modifying). But, likewise, we can't test every code change against every PS3 game on every enduser platform. If we had some segmented tests, similar to what is in the autotest, but which doesn't rely on always building up the full rpcs3 instance every time, then I think that would be incredibly helpful. It wouldn't test for everything... but it could be capable of running tests a lot faster than the current autotest, and at a 'lower level' (but still providing the ability to just call rpcs3 fully if needed).

MSuih commented 4 years ago

Reports 0.5fps in title bar... which is a little confusing, but no big issue.

The screen has to be refreshed every now and then, otherwise the contents could get dirty and that would ruin the image. But because it's just supposed to show a static image, the rendering is limited to only displaying few frames every second.

As for the capture behavior, it is supposed to display the captured frame, yes. If it doesn't, then either there is a bug in the capture/replay system or the game is relying on SPU's or existing state from previous frame which cannot be replicated in a RSX capture.

kd-11 commented 4 years ago

Performance is low because all the memory consumed by a draw call is reset for every individual draw call. This invalidates the caches and slows things down. It doesn't need to be quick however. Set shader mode to single threaded when debugging to avoid skipping empty draw calls due to missing shaders. As for black renders, this is a sign that Cell intervention is required. If you capture the same games with full buffer readback options (write color + write depth), you will likely get the image rendered on the capture machine. Ofc it becomes useless to show a still image rendered offline when debugging as no matter what changes you make it will always be the same. Black color is just default when the data is missing. This pretty much means not all games can be captured and replayed, specifically AAA games where Cell rendering is common, this will never work.

kd-11 commented 4 years ago

I haven't experienced issue 3 in a few years and issue 4 mysteriously appeared randomly after some management code was updated outside RSX. Either way, behavior is going to change between captures as they are essentially mini-programs themselves, each of them will have different issues.

RPCS3 / rpcs3

Automated Testing #5312