Igalia / snabb

Snabb Switch: Fast open source packet processing
Apache License 2.0
47 stars 5 forks source link

intel_mp app instantiations make stats symlink counter forest #1179

Closed wingo closed 5 years ago

wingo commented 5 years ago

This commit changes the intel_mp driver to make a forest of symlinks to the shared stats counters, rather than a single link to the shared counters directory. This allows the app to indicate which queue it's on, as well as allowing ptree and top to naturally track the set of counters used by the app so they can manage associated .rrd files.

As part of this change, we remove the instance-local "pci" directory, instead using per-app "pci" directories. This allows us to more nicely record which app is on which device.

This commit also updates "snabb top" to get PCI interfaces from within apps' SHM frames.

Fixes #1178.

WDYT @takikawa @dpino ?

wingo commented 5 years ago

By the way, when you skim back in time for a network function that's live and receiving traffic, it shows some slight counter fluctuation. This is because the last-sample time keeps advancing, I think. It's a bit confusing and should be fixed eventually but it's not harmful.

eugeneia commented 5 years ago

IIUC this will require the old snabb top (new snabb ps) to add special casing to read device statistics? Currently, it is possible to list device statistics via snabb top -l pci/<addr> <pid>. Have to think about how to query non-process-bound resources...

eugeneia commented 5 years ago

On another note, I think having an intel_mp specific directory is a step back on our quest for driver consolidation.

How about the directory was called /var/run/snabb/pci, and master processes would symlink /var/run/snabb/pci/<addr> -> /var/run/snabb/<pid>/group/pci/<addr>? I think that would solve both of my concerns.

Edit: Actually, if the directory name is required for the new snabb top to deduce the pci/app relationship, I would even be fine if its /var/run/snabb/intel-mp/<addr> as long as its linked to .../<pid>/group/pci/<addr>.

takikawa commented 5 years ago

It looks like get_rxstats / get_txstats won't quite work with these changes since the shm frame is moved from a pci/... folder to an app-specific one. The other thing is that the app-specific shm frame doesn't have working entries for shared counters since the create method doesn't return an appropriate object for reading. PR #1180 might be a way to fix this.

wingo commented 5 years ago

@eugeneia regarding "snabb ps", I don't know that there is any significant change -- you get the statistics under the app/ directory of the Snabb instance, and also in the global intel-mp directory, but not in the per-Snabb-instance pci/ directory.

wingo commented 5 years ago

I have no idea why these tests are failing :((( Deep flakiness :(((

takikawa commented 5 years ago

@wingo Re: the latest test failures, my theory after poking around a bit is that one problem is with the return counter.open(target(name)) line in my commit 79d45461171f4ee34b4d0ed064e1f8941b7cd87c.

The issue with that call is that the target may not actually be created yet, if the snabb apps get stopped and restarted (as happens in the snabbnfv selftest).

For example, say we restart the stat collecting NIC app. When it closes, it will delete the shm frame for the stats and relinquish its stat collection status. When it comes back up, it may see that the lock is still held for the NIC (because of another NIC app) and won't start back up as stat collector.

In that case, the self.shm counters can't actually open the symlinked targets (since they were deleted and not recreated) and the calls fail.

Basically, the interaction between stat collection and restarting apps seems messy right now. I think we could change the line to only open the counter if it shm.exists() (and otherwise return an object that will error when used or something), or maybe make get_rxstats read the counter a different way.

wingo commented 5 years ago

Latest commit avoids eagerly opening the counters. Let's see what happens!

wingo commented 5 years ago

yes!!! @takikawa wdyt?