Closed thomaseizinger closed 5 months ago
- I was under the impression that a discarded candidate would never be nominated? That seems to be happening at the moment.
Yes, that seems to be what's happening.
- I think re-adding a candidate after it has been discarded should "activate" it again.
Agree. It's a scenario I haven't anticipated, but I don't think there should be any problem removing the discarded state and re-run the following code to make it appear in candidate pairs again.
- We use indices to candidates in a lot of places at the moment, hence we cannot remove discarded candidates from the list. Would you be okay with adding a dependency like
slab
to have stable indices and we able to actually remove candidates instead of making them stateful? I have a feeling this state management is a lost battle and will trigger more bugs.
I don't see how Slab
will help us here. For better or worse, IceAgent
is built around indexes and whether that index points to an empty slab slot or a Candidate
with a discarded state flag, I think will end up the same roughly the same.
I think an alternative impl would use LinkedList<Candidate>
and assign an ID or even use Arc<Candidate>
so that a CandidatePair
can keep a direct reference. However I chose this path due to what I hoped would be performant (doing direct index lookups).
I don't think this bug will be that hard to fix, and it affects the way we use str0m too. I'll have a look now.
I don't see how
Slab
will help us here. For better or worse,IceAgent
is built around indexes and whether that index points to an empty slab slot or aCandidate
with a discarded state flag, I think will end up the same roughly the same.
I thought the issue with removing candidates from the list is that the remaining indices would be invalidated because the others are moved forward in memory.
With a slab (afaik), we can add and remove items without invalidating other indices. Invalidating other things like candidate pairs when a candidate is discarded is orthogonal to that and has to happen already today.
The difference with a slab would be that we can actually remove this candidates and don't have to filter them.
The difference with a slab would be that we can actually remove this candidates and don't have to filter them.
Sure, but that seems to be more of a convenience issue, i.e. list.iter().filter(|c| !c.discarded)
vs slab.iter()
, which is already solved in the current impl.
The difference with a slab would be that we can actually remove this candidates and don't have to filter them.
Sure, but that seems to be more of a convenience issue, i.e.
list.iter().filter(|c| !c.discarded)
vsslab.iter()
, which is already solved in the current impl.
It is easy to forget to do this filtering though as we can see from this bug so I am not sure labelling it a convenience issue is accurate.
Isn't it also technically a memory-leak to never free up candidates internally dispite the user calling invalidate_candidate
?
Here's another exciting aspect of manually discarding candidates.
As it stands, we do nothing with A, and I wonder if that is good or not?
invalidate_candidate
is our own API, and the spec compliant way is ice_restart
, so I don't want too much code gymnastics to solve these issues.
It is easy to forget to do this filtering though as we can see from this bug so I am not sure labelling it a convenience issue is accurate.
Well. It's an internal concern.
Isn't it also technically a memory-leak to never free up candidates internally dispite the user calling
invalidate_candidate
?
Slab is fixed space, so we wouldn't free any memory by removing a slab entry. ice_restart
is the official API that would normally clean out the entire state. In some respects these are all compound problems on invalidate_candidate()
– not saying it isn't fixable, but there's only so much code I want to add to solve a non-standard API.
Isn't it also technically a memory-leak to never free up candidates internally dispite the user calling
invalidate_candidate
?Slab is fixed space, so we wouldn't free any memory by removing a slab entry.
ice_restart
is the official API that would normally clean out the entire state. In some respects these are all compound problems oninvalidate_candidate()
– not saying it isn't fixable, but there's only so much code I want to add to solve a non-standard API.
Okay, fair point :)
Hey @thomaseizinger you were invalidating remotes right?
I just realised that invalidating a remote doesn't necessarily shut down the traffic from that IP:
2024-04-28T08:04:11.993143Z TRACE str0m::ice_::agent: Discard pairs for remote candidate index: 0
2024-04-28T08:04:11.993174Z TRACE L: str0m::ice_::agent: Handle receive: StunMessage { method: Binding, class: Request, attrs: Attributes { username: "1nYb:SRio", message_integrity: [21, 57, 70, 19, 126, 38, 111, 148, 69, 253, 122, 191, 214, 27, 90, 18, 203, 96, 124, 186], fingerprint: 842275725, priority: 1862270719, ice_controlled: 15672487014400784862 }, integrity_len: 56 }
2024-04-28T08:04:11.993270Z TRACE L: str0m::ice_::agent: Message accepted
2024-04-28T08:04:11.993291Z INFO L: str0m::ice_::agent: Created peer reflexive remote candidate from STUN request: Candidate(prflx=2.2.2.2:1000/udp prio=1862270719)
2024-04-28T08:04:11.993313Z DEBUG L: str0m::ice_::agent: Created new pair for STUN request: CandidatePair(0-1 prio=7998391838664818175 state=Waiting attempts=0 unanswered=0 remote=0 last=None nom=None)
This would happen if there's a "straggler" UDP packet coming from a remote that we invalidated. Since the message is accepted (the STUN credentials are correct), and there isn't a current remote candidate for it, we auto-create a peer reflexive.
Does that have any impact on your app logic?
I don't think so. Do we test peer-reflexive candidates? I've seen it in the logs when roaming and I thought it waits until an actual candidate gets added.
Perhaps a peer-reflexive candidate should not be created if there is a discarded candidate?
@thomaseizinger yeah. That's what I implemented in #508 – if there is a matching discarded, we don't add a peer reflexive
@thomaseizinger yeah. That's what I implemented in #508 – if there is a matching discarded, we don't add a peer reflexive
Just saw that :)
So does #508 also address not nominating discarded candidates? Or do we need another filter somewhere?
So does #508 also address not nominating discarded candidates? Or do we need another filter somewhere?
I think it might address it for the referenced case where the prev discarded candidate gets added again. But for defense-in-depth reasons, we might want to add another check somewhere that any candidate we nominate is definitely not discarded.
I think it might address it for the referenced case where the prev discarded candidate gets added again. But for defense-in-depth reasons, we might want to add another check somewhere that any candidate we nominate is definitely not discarded.
We can add a check, but I think it's rather redundant. It's a pair we nominate, not a candidate (although looking at the code, I think that detail is a bit confused, at least in the logging). And there are three places where pairs can form (add_remote_candidate
and add_local_candidate
and when making a dynamic peer-reflexive
on a STUN request). All those three seem to be guarded against using discarded.
But I don't mind a double check.
We can add a check, but I think it's rather redundant.
Yes, it would be more of a defense-in-depth thing in case we ever add another place where pairs are formed.
PR welcome!
We running into some state mismatch bugs with discarded candidates. Original issue is: https://github.com/firezone/firezone/issues/4736.
What seems to happen is that, after a candidate is discarded, it cannot be added again because it is considered redundant although the old candidate is marked as discarded and thus excluded from certain "searches" for candidates.
For example, in our code, we try to find the nominated candidate after it has been nominated which currently results in a panic because there we filter for discarded candidates.
I think the root cause for this (as touched on in https://github.com/algesten/str0m/pull/497) is that
Candidate
is stateful. Couple of questions:slab
to have stable indices and we able to actually remove candidates instead of making them stateful? I have a feeling this state management is a lost battle and will trigger more bugs.