I was looking into an issue where we can only receive one message at a time when setting up a policy to listen for DCGM notifications. I noticed some code that looks potentially buggy to me, but it's likely I'm just missing something.
The code in question is here. I have also copied it below:
    publisher := newPublisher()
    _ = publisher.add()
    _ = publisher.add()

    // broadcast
    go publisher.broadcast()

    go func() {
        for {
            select {
            case dbe := <-callbacks["dbe"]:
                publisher.send(dbe)
            case pcie := <-callbacks["pcie"]:
                publisher.send(pcie)
            case maxrtpg := <-callbacks["maxrtpg"]:
                publisher.send(maxrtpg)
            case thermal := <-callbacks["thermal"]:
                publisher.send(thermal)
            case power := <-callbacks["power"]:
                publisher.send(power)
            case nvlink := <-callbacks["nvlink"]:
                publisher.send(nvlink)
            case xid := <-callbacks["xid"]:
                publisher.send(xid)
            }
        }
    }()

    // merge
    violation = make(chan PolicyViolation, len(channels))
    go func() {
        for _, c := range channels {
            val := <-c
            violation <- val
        }
        close(violation)
    }()
There is some missing context here, but this is the important part. What I see happening is that the channels in the callbacks map are being read in two places at the same time: (1) in the goroutine with the select statement, and (2) in the "merge" goroutine with the for/range loop. This seems odd to me. Go doesn't duplicate messages in a channel for multiple readers, so as far as I can tell these two goroutines are competing for the same messages. Furthermore, if the "select" goroutine gets a message, it sends it to the publisher, which appears to publish to two subscribers that nothing is listening to (what exactly is going on there? I'm not sure). The main issue I see is the duplicate, simultaneous reading from the same channels.
I would like to know if I am missing something here. It seems that notifications could get lost if the select statement receives them and they get sent to the publisher that has unused subscribers. Am I understanding things correctly?