caio / foca

mirror of https://caio.co/de/foca/
https://caio.co/de/foca/
Mozilla Public License 2.0
147 stars 8 forks source link

Should I worry about "Member not found" warning log? #10

Closed jeromegn closed 2 years ago

jeromegn commented 2 years ago

We're using an fat identity with a bump.

pub struct Actor {
    id: ActorId,
    name: ActorName,
    addr: SocketAddr,
    group: String,

    // An extra field to allow fast rejoin
    bump: u16,
}

and our Identity implementation:

impl Identity for Actor {
    fn has_same_prefix(&self, other: &Self) -> bool {
        // sometimes the ID can be nil, when we connect to a member,
        // we don't know its ID, we only know its address
        // the ID should be updated later on, at least I hope
        if other.id.is_nil() || self.id.is_nil() {
            self.addr.eq(&other.addr)
        } else {
            self.id.eq(&other.id)
        }
    }

    fn renew(&self) -> Option<Self> {
        Some(Self {
            id: self.id,
            name: self.name.clone(),
            addr: self.addr,
            group: self.group.clone(),
            bump: self.bump.wrapping_add(1),
        })
    }
}

We're handling members like:

match notification {
        Notification::MemberUp(actor) => {
            let added = { members.write().add_member(&actor) };
            info!("Member Up {actor:?} (added: {added})");
            if added {
                // actually added a member
                // notify of new cluster size
                if let Ok(size) = (members.read().0.len() as u32).try_into() {
                    foca_tx.send(FocaInput::ClusterSize(size)).ok();
                }
            }
            tokio::spawn(write_sql(write_sql_sender, move |pool| async move {
                if let Err(e) = upsert_actor_name(&pool, actor.id(), actor.name().as_str()).await {
                    warn!("could not upsert actor name: {e}");
                }
            }));
        }
        Notification::MemberDown(actor) => {
            let removed = { members.write().remove_member(&actor) };
            info!("Member Down {actor:?} (removed: {removed})");
            if removed {
                // actually removed a member
                // notify of new cluster size
                if let Ok(size) = (members.read().0.len() as u32).try_into() {
                    foca_tx.send(FocaInput::ClusterSize(size)).ok();
                }
            }
        }
// ...
}

I started listing foca members via iter_members() to try and debug this.

Some logs I see when restarting:

2022-07-22T16:14:38.007685Z [INFO] Current Actor ID: 385107b4-3423-4b2e-b4e8-637cfac5779e
2022-07-22T16:14:41.036757Z [INFO] Member Up Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951) (added: true)
2022-07-22T16:14:41.036810Z [INFO] Current node is considered ACTIVE
2022-07-22T16:14:41.036832Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
2022-07-22T16:14:42.103568Z [INFO] Member Up Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452) (added: true)
2022-07-22T16:14:42.103698Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
2022-07-22T16:14:42.103736Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
2022-07-22T16:14:43.011694Z [INFO] Member Up Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964) (added: true)
2022-07-22T16:14:43.011765Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
2022-07-22T16:14:43.011779Z [INFO] foca knows about: Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964)
2022-07-22T16:14:43.011787Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
2022-07-22T16:14:47.527714Z [INFO] Member Up Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 39791) (added: true)
2022-07-22T16:14:47.527816Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
2022-07-22T16:14:47.527869Z [INFO] foca knows about: Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964)
2022-07-22T16:14:47.527879Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
2022-07-22T16:14:47.527887Z [INFO] foca knows about: Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 39791)
2022-07-22T16:14:50.424720Z [INFO] Member Up Id(ActorName("957f"), ActorId(9a925261-675a-4fa7-8478-ccf378b01d90), <ip>:7878, bump: 6158) (added: true)
2022-07-22T16:14:50.424831Z [INFO] foca knows about: Id(ActorName("ab03"), ActorId(9c84ee68-d7db-427a-9a08-b8c7d501e856), <ip>:7878, bump: 17951)
2022-07-22T16:14:50.424902Z [INFO] foca knows about: Id(ActorName("957f"), ActorId(9a925261-675a-4fa7-8478-ccf378b01d90), <ip>:7878, bump: 6158)
2022-07-22T16:14:50.424913Z [INFO] foca knows about: Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452)
2022-07-22T16:14:50.424922Z [INFO] foca knows about: Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 39791)
2022-07-22T16:14:50.424930Z [INFO] foca knows about: Id(ActorName("09cd"), ActorId(b129db5e-650e-40c3-9046-852ac72ed1ef), <ip>:7878, bump: 57964)
2022-07-22T16:14:53.231006Z [INFO] Member Up Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 57499) (added: false)
2022-07-22T16:15:11.043867Z [INFO] Member Up Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 36078) (added: false)
2022-07-22T16:15:31.041785Z [INFO] Member Down Id(ActorName("2f55"), ActorId(67c1bd17-a13b-4c72-8e81-63c6a5b82977), <ip>:7878, bump: 46452) (removed: false)
2022-07-22T16:15:41.110738Z [INFO] Member Down Id(ActorName("1337"), ActorId(69fddfaf-b618-452e-829b-897e54c91889), <ip>:7878, bump: 36078) (removed: false)
2022-07-22T16:16:06.048567Z [WARN] foca: Member not found

Should we be worried about the Member not found log message or are these normal? They don't seem to appear repeatedly. This could be a race between 2 notifications?

caio commented 2 years ago

Thanks a lot for the very detailed report!

There's nothing to worry about: this happens when the instance suspects another to be down and then learns that it wasn't down after all.

It was supposed to be a debug trace (then the span would contain useful metadata), sorry about the noise! I'll ship a patch release for this and close the issue when done.