envoyproxy / envoy-mobile

Client HTTP and networking library based on the Envoy project for iOS, Android, and more.
https://envoymobile.io
Apache License 2.0
562 stars 84 forks source link

Cronvoy: RequestFinishInfo -> Metrics #1520

Closed carloseltuerto closed 2 years ago

carloseltuerto commented 3 years ago

When a request concludes, the C++ Cronet Engine invokes a "native" Java method to expose some metrics. To implement this, I guess that adding a bunch of "x-envoy-metric-xyz" in the response header could be considered, as long as there is a response header in all circumstances (onSucceeded, onError, onCancel).

As far as I can tell, previous redirect metrics are discarded (otherwise, for example, dnsStartMs and dnsEndMs would not make sense.)

    /**
     * Called by the native code on the network thread to report metrics. Happens before
     * onSucceeded, onError and onCanceled.
     */
    @CalledByNative
    private void onMetricsCollected(long requestStartMs, long dnsStartMs, long dnsEndMs,
            long connectStartMs, long connectEndMs, long sslStartMs, long sslEndMs,
            long sendingStartMs, long sendingEndMs, long pushStartMs, long pushEndMs,
            long responseStartMs, long requestEndMs, boolean socketReused, long sentByteCount,
            long receivedByteCount) {
        synchronized (mUrlRequestAdapterLock) {
            if (mMetrics != null) {
                throw new IllegalStateException("Metrics collection should only happen once.");
            }
            mMetrics = new CronetMetrics(requestStartMs, dnsStartMs, dnsEndMs, connectStartMs,
                    connectEndMs, sslStartMs, sslEndMs, sendingStartMs, sendingEndMs, pushStartMs,
                    pushEndMs, responseStartMs, requestEndMs, socketReused, sentByteCount,
                    receivedByteCount);
        }
        // Metrics are reported to RequestFinishedListener when the final UrlRequest.Callback has
        // been invoked.
    }

And as a reference, here is the C++ header file: https://source.chromium.org/chromium/chromium/src/+/master:net/base/load_timing_info.h And here is what every metric looks like:

  /**
   * Metrics collected for a single request. Most of these metrics are timestamps for events during
   * the lifetime of the request, which can be used to build a detailed timeline for investigating
   * performance.
   *
   * <p>Events happen in this order:
   *
   * <ol>
   *   <li>{@link #getRequestStart request start}
   *   <li>{@link #getDnsStart DNS start}
   *   <li>{@link #getDnsEnd DNS end}
   *   <li>{@link #getConnectStart connect start}
   *   <li>{@link #getSslStart SSL start}
   *   <li>{@link #getSslEnd SSL end}
   *   <li>{@link #getConnectEnd connect end}
   *   <li>{@link #getSendingStart sending start}
   *   <li>{@link #getSendingEnd sending end}
   *   <li>{@link #getResponseStart response start}
   *   <li>{@link #getRequestEnd request end}
   * </ol>
   *
   * Start times are reported as the time when a request started blocking on event, not when the
   * event actually occurred, with the exception of push start and end. If a metric is not
   * meaningful or not available, including cases when a request finished before reaching that
   * stage, start and end times will be {@code null}. If no time was spent blocking on an event,
   * start and end will be the same time.
   *
   * <p>If the system clock is adjusted during the request, some of the {@link java.util.Date}
   * values might not match it. Timestamps are recorded using a clock that is guaranteed not to run
   * backwards. All timestamps are correct relative to the system clock at the time of request
   * start, and taking the difference between two timestamps will give the correct difference
   * between the events. In order to preserve this property, timestamps for events other than
   * request start are not guaranteed to match the system clock at the times they represent.
   *
   * <p>Most timing metrics are taken from <a
   * href="https://cs.chromium.org/chromium/src/net/base/load_timing_info.h">LoadTimingInfo</a>,
   * which holds the information for <a href="http://w3c.github.io/navigation-timing/"></a> and <a
   * href="https://www.w3.org/TR/resource-timing/"></a>.
   */
  public abstract static class Metrics {
    /**
     * Returns time when the request started.
     *
     * @return {@link java.util.Date} representing when the native request actually started. This
     *     timestamp will match the system clock at the time it represents.
     */
    @Nullable public abstract Date getRequestStart();

    /**
     * Returns time when DNS lookup started. This and {@link #getDnsEnd} will return non-null values
     * regardless of whether the result came from a DNS server or the local cache.
     *
     * @return {@link java.util.Date} representing when DNS lookup started. {@code null} if the
     *     socket was reused (see {@link #getSocketReused}).
     */
    @Nullable public abstract Date getDnsStart();

    /**
     * Returns time when DNS lookup finished. This and {@link #getDnsStart} will return non-null
     * values regardless of whether the result came from a DNS server or the local cache.
     *
     * @return {@link java.util.Date} representing when DNS lookup finished. {@code null} if the
     *     socket was reused (see {@link #getSocketReused}).
     */
    @Nullable public abstract Date getDnsEnd();

    /**
     * Returns time when connection establishment started.
     *
     * @return {@link java.util.Date} representing when connection establishment started, typically
     *     when DNS resolution finishes. {@code null} if the socket was reused (see {@link
     *     #getSocketReused}).
     */
    @Nullable public abstract Date getConnectStart();

    /**
     * Returns time when connection establishment finished.
     *
     * @return {@link java.util.Date} representing when connection establishment finished, after TCP
     *     connection is established and, if using HTTPS, SSL handshake is completed. For QUIC
     *     0-RTT, this represents the time of handshake confirmation and might happen later than
     *     {@link #getSendingStart}. {@code null} if the socket was reused (see {@link
     *     #getSocketReused}).
     */
    @Nullable public abstract Date getConnectEnd();

    /**
     * Returns time when SSL handshake started. For QUIC, this will be the same time as {@link
     * #getConnectStart}.
     *
     * @return {@link java.util.Date} representing when SSL handshake started. {@code null} if SSL
     *     is not used or if the socket was reused (see {@link #getSocketReused}).
     */
    @Nullable public abstract Date getSslStart();

    /**
     * Returns time when SSL handshake finished. For QUIC, this will be the same time as {@link
     * #getConnectEnd}.
     *
     * @return {@link java.util.Date} representing when SSL handshake finished. {@code null} if SSL
     *     is not used or if the socket was reused (see {@link #getSocketReused}).
     */
    @Nullable public abstract Date getSslEnd();

    /**
     * Returns time when sending the request started.
     *
     * @return {@link java.util.Date} representing when sending HTTP request headers started.
     */
    @Nullable public abstract Date getSendingStart();

    /**
     * Returns time when sending the request finished.
     *
     * @return {@link java.util.Date} representing when sending HTTP request body finished. (Sending
     *     request body happens after sending request headers.)
     */
    @Nullable public abstract Date getSendingEnd();

    /**
     * Returns time when first byte of HTTP/2 server push was received.
     *
     * @return {@link java.util.Date} representing when the first byte of an HTTP/2 server push was
     *     received. {@code null} if server push is not used.
     */
    @Nullable public abstract Date getPushStart();

    /**
     * Returns time when last byte of HTTP/2 server push was received.
     *
     * @return {@link java.util.Date} representing when the last byte of an HTTP/2 server push was
     *     received. {@code null} if server push is not used.
     */
    @Nullable public abstract Date getPushEnd();

    /**
     * Returns time when the end of the response headers was received.
     *
     * @return {@link java.util.Date} representing when the end of the response headers was
     *     received.
     */
    @Nullable public abstract Date getResponseStart();

    /**
     * Returns time when the request finished.
     *
     * @return {@link java.util.Date} representing when the request finished.
     */
    @Nullable public abstract Date getRequestEnd();

    /**
     * Returns whether the socket was reused from a previous request. In HTTP/2 or QUIC, if streams
     * are multiplexed in a single connection, returns {@code true} for all streams after the first.
     *
     * @return whether this request reused a socket from a previous request. When {@code true}, DNS,
     *     connection, and SSL times will be {@code null}.
     */
    public abstract boolean getSocketReused();

    /**
     * Returns milliseconds between request initiation and first byte of response headers, or {@code
     * null} if not collected.
     */
    @Nullable public abstract Long getTtfbMs();

    /**
     * Returns milliseconds between request initiation and finish, including a failure or
     * cancellation, or {@code null} if not collected.
     */
    @Nullable public abstract Long getTotalTimeMs();

    /**
     * Returns total bytes sent over the network transport layer, or {@code null} if not collected.
     */
    @Nullable public abstract Long getSentByteCount();

    /**
     * Returns total bytes received over the network transport layer, or {@code null} if not
     * collected. Number of bytes does not include any previous redirects.
     */
    @Nullable public abstract Long getReceivedByteCount();
  }
alyssawilk commented 2 years ago

@carloseltuerto throwing this one over to you - I think we can mark fixed once the cronet tests are passing

carloseltuerto commented 2 years ago

It turns out that EnvoyMobile receives exactly the number of expected bytes from the Socket.

I enabled the traces, and this line appears:

[2022-01-05 20:21:30.550][1971874][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:24] [C0] read returns: 86

And this value is the one expected by the CronetUrlRequestTest.

However, the test fails with this value: 210

carloseltuerto commented 2 years ago

That was a bug in the Java code - the returned value is indeed 86 :-)

carloseltuerto commented 2 years ago

Unfortunately, the values being returned are not timestamps. There is some kind or corruption / bad translation. Examples of values found in a single instance of EnvoyFinalStreamIntel:

dnsStart:     140000050487350
dnsEnd:       4
connectStart: 140190016975808
connectEnd:   140190016975792
requestStart: 3691316927787970592
carloseltuerto commented 2 years ago

After some more digging, here are some observations:

alyssawilk commented 2 years ago

have a PR out to fix the memset. are we sure (say from logs) that we're sending the full request before we get a response? For error responses that's not always true.