divviup / janus

Experimental implementation of the Distributed Aggregation Protocol (DAP) specification.
Mozilla Public License 2.0
53 stars 15 forks source link

HTTP Status Code Divergence between DAP Specification and Janus Release 0.7 #3483

Closed MaeMilc closed 1 week ago

MaeMilc commented 1 week ago

When perusing the DAP specification of the Janus version under the release/0.7 branch, one comes across the following paragraph towards the end of section 4.4.2 regarding the possible responses a Leader can send to a Client that submitted a report earlier:

The Leader responds to well-formed requests with HTTP status code 201 Created. Malformed requests are handled as described in Section 3.2. Clients SHOULD NOT upload the same measurement value in more than one report if the Leader responds with HTTP status code 201 Created.

Anyhow, the Leader seems to intentionally respond with HTTP status code 200 OK in the corresponding HTTP handler function in Janus, despite this deviating from the aforementioned DAP specification on which the release/0.7 Janus branch is based. The code underlying the HTTP handler function referred to earlier is pasted below for reference:

/// API handler for the "/tasks/.../reports" PUT endpoint.
async fn upload<C: Clock>(
    conn: &mut Conn,
    (State(aggregator), BodyBytes(body)): (State<Arc<Aggregator<C>>>, BodyBytes),
) -> Result<Status, ArcError> {
    validate_content_type(conn, Report::MEDIA_TYPE).map_err(Arc::new)?;

    let task_id = parse_task_id(conn).map_err(Arc::new)?;
    conn.cancel_on_disconnect(aggregator.handle_upload(&task_id, &body))
        .await
        .ok_or(Arc::new(Error::ClientDisconnected))??;

    // Handle CORS, if the request header is present.
    if let Some(origin) = conn.request_headers().get(KnownHeaderName::Origin) {
        // Unconditionally allow CORS requests from all origins.
        let origin = origin.clone();
        conn.response_headers_mut()
            .insert(KnownHeaderName::AccessControlAllowOrigin, origin);
    }

    Ok(Status::Ok)
}

On a slightly different note, the fact that receiving HTTP status code 200 OK is tolerated in a different Divvi Up repository containing an implementation of a DAP Client in TypeScript might also imply that this behavior differing from the DAP specification is intentional. Again, the relevant HTTP handler referred to earlier is pasted below for reference:

/**
     Sends a pregenerated {@linkcode Report} to the leader aggregator.

     @param report The {@linkcode Report} to send.

     @throws {@linkcode DAPError} if the response is not Ok.
   */
  async sendReport(report: Report) {
    const body = report.encode();
    const leader = this.#leader;
    const id = this.#id.toString();
    const response = await this.#fetch(
      new URL(`tasks/${id}/reports`, leader.url).toString(),
      {
        method: "PUT",
        headers: { "Content-Type": CONTENT_TYPES.REPORT },
        body,
      },
    );

    if (!response.ok) {
      throw await DAPError.fromResponse(response, "report upload failed");
    }
  }

Ultimately, I don't know if the behavior of Janus outlined in this issue was meant to be as such or not. In case it was not and I am not misinterpreting things, I hope that this issue might be a useful starting point for harmonizing this operational aspect of the Janus release 0.7 with its underlying DAP specification.

branlwyd commented 1 week ago

I can confirm this behavior is off-spec. It looks like DAP-03 (quite old by now) used 200 OK in this scenario; DAP-04 switched to 201 Created. This was evidently never picked up by Janus.

I think we will certainly fix this in current main (in-development implementation of DAP-13). We need to decide what to do here for older versions, such as the DAP-09 / 0.7.x release track; the behavior is clearly off-spec but fixing it may break existing clients who depend on the incorrect behavior.

While we discuss, I would recommend as a workaround interpreting any HTTP 2xx status code as "success".

branlwyd commented 1 week ago

I surveyed the following clients to determine their behavior w.r.t. the upload response status code:

branlwyd commented 1 week ago

Ah, one additional client:

Unfortunately, due to this, I think the 0.7 branch should not be fixed. Instead, I will file errata. The suggested workaround is to accept any response in the 2xx range as success.

branlwyd commented 1 week ago

I am closing this as there is no additional action to be taken at the moment; however, feel free to add any followup discussion on this issue.