OPENDAP / dap4-specification


Proposal to support optional checksumming #1

Open DennisHeimbigner opened 2 years ago

DennisHeimbigner commented 2 years ago

The spec allows optional checksumming of variable data, but as far as I can tell we never provided a way for a client to tell a server whether checksumming should be used on a request.

So I propose changing Volume 2, section 5.1 (Query String Parameters) to add a new query parameter, "dap4.checksum".

The possible cases are defined as follows:

  1. "dap4.checksum=true" -- add checksums
  2. "dap4.checksum=false" -- suppress checksums
  3. "dap4.checksum=" -- same as "dap4.checksum=true"
  4. "dap4.checksum" -- same as "dap4.checksum=true"
  5. missing (i.e. no "dap4.checksum" specified) -- use the default.
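
For illustration, the five cases above could be resolved on the server side roughly as follows. This is only a sketch under stated assumptions, not part of the proposal: the function name `checksum_requested` is hypothetical, the server default is passed in by the caller, values are matched case-sensitively, the last occurrence wins if the parameter is repeated, and any other value is rejected. None of those choices is specified in the proposal.

```python
from urllib.parse import parse_qs

PARAM = "dap4.checksum"


def checksum_requested(query: str, default: bool) -> bool:
    """Resolve the five proposed dap4.checksum cases for a raw query string.

    `default` is the server's behavior when the parameter is absent (case 5).
    """
    # keep_blank_values=True retains both "dap4.checksum=" (case 3)
    # and a bare "dap4.checksum" (case 4) as an empty string.
    values = parse_qs(query, keep_blank_values=True).get(PARAM)
    if values is None:
        return default  # case 5: parameter missing, use the default

    value = values[-1]  # assumption: the last occurrence wins
    if value in ("", "true"):
        return True  # cases 1, 3 and 4
    if value == "false":
        return False  # case 2
    # Assumption: the proposal does not say what to do with other values.
    raise ValueError(f"unsupported {PARAM} value: {value!r}")
```

Passing the default explicitly keeps the question of what case 5 should mean, which is the subject of the discussion below, out of the parsing logic.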

Nathan noted:

"...I think we should also allow for servers that ALWAYS include checksums, i.e. even if the client does not request the checksums, the presence of the checksums in the response should not break the client's handling of the response..."

I assume that there is no hope for fixing these servers. So I can see three ways of handling this:

  1. A short-term solution is to make the default be "dap4.checksum=true" when no dap4.checksum query is present (case 5 above). This will work until clients start including explicit dap4.checksum query parameters. At that point, a request to such a server will fail if the user specifies that checksums are off.

  2. We can identify the relevant servers and stick a hack in our libraries to force "dap4.checksum=true" when requests are sent to those servers. The DAP2 code we have already does things like this for non-compliant servers like Columbia. It's ugly, but it works. [Nathan, can you create a list of such servers?]

  3. The client always computes the checksum as it reads the variable's data off the socket stream. Then it needs to see if there is an extra four bytes of data at the end of the variable's data. I do not know if this is unambiguously possible. And of course, it is a performance hit.
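
As a rough sketch of what option 3 would involve: the client computes a CRC-32 over the variable's bytes and compares it with the four bytes that follow. The function name `trailer_matches_crc32`, the in-memory `stream` buffer, and the big-endian byte order are all assumptions made for illustration; the thread does not specify them.

```python
import zlib


def trailer_matches_crc32(stream: bytes, data_len: int,
                          byteorder: str = "big") -> bool:
    """Return True if the 4 bytes after the variable's data equal its CRC-32.

    `stream` holds the variable's data followed by whatever comes next;
    `data_len` is the data size the client already knows from the DMR.
    """
    data = stream[:data_len]
    trailer = stream[data_len:data_len + 4]
    if len(trailer) < 4:
        return False  # not enough bytes left for a checksum
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return int.from_bytes(trailer, byteorder) == zlib.crc32(data)
```

A match is strong evidence but not proof, since the next four bytes could belong to the following variable and equal the CRC by coincidence. That is the ambiguity raised above, and the CRC computation over every variable is the performance cost.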

jgallagher59701 commented 2 years ago

As a slight change to option 3, the client is lenient: it can accept checksums but simply ignores them. Since the checksum value appears in the DMR, I think that's pretty straightforward. Then the compatibility issues with old servers that always include checksums go away, and there's no overhead for clients that don't care about the checksums (no cost associated with the CRC computation).

NB: The notion for checksums is not data integrity for the transmission. It's for determining if the data values changed over time (which they do, in some cases, as datasets are reprocessed).

ndp-opendap commented 2 years ago

☝️ That's what I was originally trying to express.

And, to be clear, I am in favor of Dennis' idea of adding a client-submitted query parameter to instruct the server to produce the checksums.

jgallagher59701 commented 2 years ago

I, also, am in favor of Dennis' suggestion to use the query parameter.