berarma / ffbtools

Set of tools for FFB testing and debugging on GNU/Linux
GNU General Public License v3.0
56 stars 10 forks source link

Add spam protection option #17

Closed mcoffin closed 3 years ago

mcoffin commented 4 years ago

This pull request adds a --spam-protection option to ffbwrap, that causes the ioctl to ignore any requests to repeatedly update the same effect ID with the same parameters.

Games like Assetto Corsa Competizione do this while the game is paused, spamming updates when the first one would suffice.

The result is that the queue on the kernel side grows too large, and events begin dropping. This tweak can fix that for those titles.

Related issues:

FWIW, with this tweak, I was able to play for hours with my G920 with no issues, whereas before I had to be ultra careful not to ever pause the game for more than a few seconds, so that events would not be dropped.

While the real issue might be on the proton side, this is still a useful tweak for this temporary stop-gap repo, as that's what this is all about.

Thanks for the time, Matt

berarma commented 4 years ago

Thanks! That's cool!

This is a known problem with Logitech wheel drivers in the kernel (HID and HID++). The game expects the device driver to know what to do with the excess messages, at least on Windows it does but not on Linux.

Do you mind discussing the changes before merging?

I've written a new driver for the Logitech HID wheel devices that adds missing functionality and also fixes this problem, but not for the G920 because it uses another driver (HID++ module). I only own a G29 wheel. Thus, I think this should be fixed in the HID++ driver, but this is the right place to start testing a solution.

What I did in new-lg4ff is using a timer to send FFB commands to the device so that I could control the rate at which they're sent. The newer commands for the same effect slot overwrite the older ones so instead of discarding new commands only when they're the same, I discard all older commands because they were updated before the wheel had time to even react to them. They're obsolete by the time the new command arrives.This fixes your issue but also a similar problem in ETS2/ATS and possible effect latencies in other games.

The struct comparison wouldn't be needed, instead we set a timer function with a preset period around what the device can handle (2ms) and send the effect commands from there.

If this solution was first implemented here and it proved to work as well for the G920 as it did for the other Logitech wheels we could then think about moving the implementation to the HID++ driver.

I would avoid doing memory handling in the IOCTL calls. It can hurt performance specially if they're being called at a high rate. There's an IOCTL call that returns the number of effect slots. We should initialize the effect history with that length size so we don't need to do alloc/free/memcpy later.

What do you think?

mcoffin commented 4 years ago

@berarma I think that's definitely another good way to solve the issue, but the issue with that implementation (buffering and sending asynchronously to the device at ~2ms/event) is that with how many events ACC sends while they game is paused, you're going to end up piling up events in memory, and it takes time to catch up, so FFB is broken for a few seconds after you un-pause.

While I agree that this fix likely belongs in the driver, this is a great place to send the tweak out first for people to solve problems while I port the fix upstream.

As far as memory management, I've never seen the current implementation allocate more than just that first time, to create the default sized (512 event) buffer, so there hasn't been any performance impact there.

With the ioctl call you mentioned though, I'll look in to using that instead of just using the highest effect ID that we've encountered, but that will only matter in cases where there are >512 effects, which I've never found to be the case across any of the following games.

Unfortunately, though, this fix does not yet fix this issue for beam.ng, so I have so more digging to do there since that seems to be a separate issue.

So to summarize...

  1. yes, this eventually belongs in the driver, but adding here will help people test and fix the issue immediately
  2. I don't quite agree with the 2ms async queue implementation, but lets talk about it if you still feel strongly about it. I'm quite in favor of the struct compare implementation, since if the game behaves quite poorly like ACC, there will be no latency hit once it starts sending "real" updates again.
  3. The memory handling here has never (for me) allocated more than once, but maybe we can use the ioctl to more intelligently guess the number of events we will need in the buffer. The over-allocation of 512 events seems to work fine with a margin of about maximum 50% usage in all titles I've tried.
mcoffin commented 4 years ago

Actually, I just spotted one way to optimize the memory allocation in the case where we do go over 512 events (which I've never seen). We can at least try to use realloc if events_history already exists and we're trying to increase it's size. That would save us one memcpy in that case.

Again, though, 99.9% of the time, I wouldn't expect to see any more than the first memory allocation.

mcoffin commented 4 years ago

@berarma I added an FFBTOOLS_EFFECT_HISTORY_EXPONENTIAL_GROWTH=yes option to allow for exponential growth, and refactored the reallocation path for the rare case of out-growing the default buffer size to use reallocarray to save us a memcpy step on that rare code path. See: e242f0a.

berarma commented 4 years ago

@berarma I think that's definitely another good way to solve the issue, but the issue with that implementation (buffering and sending asynchronously to the device at ~2ms/event) is that with how many events ACC sends while they game is paused, you're going to end up piling up events in memory, and it takes time to catch up, so FFB is broken for a few seconds after you un-pause.

My solution isn't creating another queue, instead, I avoid the existing kernel queue by throtling the requests. This is a more generic way of fixing this problem so that it doesn't only work when the same effect is sent over and over again but also when the effect is really updated very often.

I store the last command for every effect slot so I know the state of every slot and only send the last command at the configured rate. If the game is sending 10 times the commands that the device can handle, 9 out of 10 commands will be dropped but that's no problem because we're sending the last one.

As far as memory management, I've never seen the current implementation allocate more than just that first time, to create the default sized (512 event) buffer, so there hasn't been any performance impact there.

I didn't read the code correctly. You're right, but you don't need to allocate that much memory. I think the G920 uses 64 slots but it would be a good thing making the code generic enough to work on any device.

With the ioctl call you mentioned though, I'll look in to using that instead of just using the highest effect ID that we've encountered, but that will only matter in cases where there are >512 effects, which I've never found to be the case across any of the following games.

It would improve memory use (although by a little bit) and make the allocation code simpler.

Unfortunately, though, this fix does not yet fix this issue for beam.ng, so I have so more digging to do there since that seems to be a separate issue.

You're most probably coming across the buffering issue that I've explained above. That's why I think we should aim for the more generic solution I've implemented in new-lg4ff. That would fix all buffering issues.

So to summarize...

1. yes, this eventually belongs in the driver, but adding here will help people test and fix the issue immediately

Yes.

2. I don't quite agree with the 2ms async queue implementation, but lets talk about it if you still feel strongly about it. I'm quite in favor of the struct compare implementation, since if the game behaves quite poorly like ACC, there will be no latency hit once it starts sending "real" updates again.

I've explained above. I must say it's not an hipothetic solution, it has been tested and it's working in new-lg4ff. I even check the size of the kenel queue to avoid queueing messages because it's better loosing old updates than growing the queue. Native games like ETS2/ATS can send an insane amount of effect updates. It has to be throttled or the queue starts sending delayed commands to the device, or even missing them when the queue is full.

3. The memory handling here has never (for me) allocated more than once, but maybe we _can_ use the `ioctl` to more intelligently guess the number of events we will need in the buffer. The over-allocation of 512 events seems to work fine with a margin of about maximum 50% usage in all titles I've tried.

Not really a problem but it would be convenient since we can know exactly how much memory we're going to need.

berarma commented 3 years ago

Hi.

I'd like to implement something like this but there's some issues that I think should be addressed in this PR.

First, it's only valid for a certain case where the game sends a lot of identical updates to a playing effect. The same issue happens even when the commands aren't the same but it doesn't fix those. Also, it can break cases where the command is the same but it's still needed because the previous one has finished playing.

And a small issue: it only accounts for effects being removed by a RMFF command while they could also be removed when reopenning the device node.

The way I would go to be more generic is delaying the commands for a parametrized time (like 2ms), this way we can overwrite older commands with the newer ones before sending them to the device driver. This would require setting a timer to send the pending commands that are stored in a buffer.

Anyone?

mcoffin commented 3 years ago

First, it's only valid for a certain case where the game sends a lot of identical updates to a playing effect. The same issue happens even when the commands aren't the same but it doesn't fix those. Also, it can break cases where the command is the same but it's still needed because the previous one has finished playing.

Correct, I mostly included this to fix FFB for Assetto Corsa Competizione. I hadn't thought of the case where the effect is being repeated. I agree that there are some things to sort out given that case.

And a small issue: it only accounts for effects being removed by a RMFF command while they could also be removed when reopenning the device node.

Correct. hadn't even thought of that. Will be easy to account for, though.

The way I would go to be more generic is delaying the commands for a parametrized time (like 2ms), this way we can overwrite older commands with the newer ones before sending them to the device driver. This would require setting a timer to send the pending commands that are stored in a buffer.

Agreed that that's a much better way to solve the issue. I've seen problems where removing duplicates isn't even enough to stop overloading the driver, so this seems like the better problem. The only problem I see with this approach is the (small but possible) performance hit from having an async thread doing the "flushing" of the buffer if you will. And it would be not great at all if we had to have any locking going on, though I doubt that would be the case, atomic timestamps should be enough for the cross communication there.

Without an async timer, we'd be screwed if we skipped an important last instruction, but then didn't receive any other updates for a while, so to make a truly "correct" solution we'd have to bite the bullet and go async with the timer, yes?

berarma commented 3 years ago

Sorry for the delay. I have implemented this on PR #19 and tested that it doesn't break FFB. It would need someone to test that it really fixes the issue.

I've tried with/without locking and both ways it seems to work correctly. Locking will seldom happen since the involved code is pretty fast and doesn't run that often. When it happens it shouldn't affect performance for the same reason and because spinlocks are very light.

I don't think this will have a significant performance penalty and since it may greatly reduce the number of commands sent to the device it can improve performance in cases of extremely frequent effect updates.

mcoffin commented 3 years ago

@berarma

It would need someone to test that it really fixes the issue.

I'll try to get to this tonight (review code + test with ACC).

it can improve performance in cases of extremely frequent effect updates.

ACC does updates to a single effect at 333Hz, so we'll see how the performance goes with that one as it would be a pretty ideal case. Thanks for the work, and I'll try to get back to you soon, though it seems we've both stumbled across being pretty busy with other stuff :)

berarma commented 3 years ago

ACC does updates to a single effect at 333Hz, so we'll see how the performance goes with that one as it would be a pretty ideal case.

I've set the timer so the frequency is approximately at 333Hz. In this case there wouldn't be much advantage, it would have to be a much bigger frequency or more effects involved.

Thanks for the work, and I'll try to get back to you soon, though it seems we've both stumbled across being pretty busy with other stuff :)

Yes, very much so. Thanks for your help and interest.

mcoffin commented 3 years ago

Closing in favor of #19 and #20