FDio / govpp

Go toolset for the VPP.
Apache License 2.0
193 stars 82 forks source link

Trace refactored #116

Closed VladoLavor closed 6 months ago

VladoLavor commented 1 year ago

The way how the API trace is initialized was changed - the trace is no longer initialized (and immediately disabled) by the connection. Instead, the user creates a new trace object and bounds it to a connection.

API changes:

Workflow changes:

Signed-off-by: Vladimir Lavor vlavor@cisco.com

ondrej-fabry commented 1 year ago

I have tested this locally repeatedly (-count=10000) with race checker enabled (-race) and it passed OK, so I assume this will resolve the issue with intermittent unit test failures.

However I have a few comments:

  1. The Tracer usage makes me a bit uneasy, it is referenced from the trace field of Connection, which is handled without any lock and could potentially run into race issues when the trace records are being created.
  2. The API of Tracer does seem a bit limiting. For example, what about the overwriting of records? It might be more important for the user to have the last N trace records when running into some issue. Otherwise it is needed to continually call Clear or re-create the tracer, while user cannot reliably know when the records storage is full.
  3. The Tracer always stores all of the requests occurring on the connection, which can easily pollute with records uninteresting to user (e.g. keepalive pings, other non-important channels, ..), since the filtering of records for a specific channel happens when user retrieves the records and not when they are being stored
  4. The Record is missing some info about the requests/replies which might be important to the user, to be specific:
    • request context - used for multiplexing multiple parallel requests on connection, it actually consists of these attributes: sequenceNumber + channelId + isMultiRequest
    • message ID - identifies the message (might be different on each VPP run)
    • data length - how much data is being sent/received
    • data - the actual data sent/received (might be useful for debugging a bug in encoding/decoding)

@sknat could you take a look and do a quick review as well?

VladoLavor commented 1 year ago

@ondrej-fabry here are some of my thoughts:

  1. Do you mean the issue that should be fixed by this patch? The trace manages all locks by itself.
  2. I agree with this point. The initial tracer was not implemented with all the bells and whistles and its API could be much more versatile.
  3. Valdi point. The per-channel filter was based on the specific scenario where all records were needed at first and later sorted by channel. But the sorting can be done easily by the user if needed. Filtering should be made before messages are stored.
  4. +1
sknat commented 9 months ago

Following up on the discussion we had last week - sorry for the really long review cycle - and after having played with it quite a bit I guess my position so far align: 1) I think I can also see edge cases happening where you would register a tracer with API messages in flight i.e. receive a message for which we didn't Add() the WaitGroup. 2) Also aligned here, although it is still unclear to me whether we should aim at having a tracer implementation with bells and whistles in this repository or if we should just export a channel and let users implement their own traces. 3) Filtering might not be required in all cases, although I am under the impression the current implementation is rather specific (i.e. asserting a certain number of messages in advance).

That said I think we can move ahead with this patch as is, and evolve it when adding other usecases.