arkivverket / noark5-tjenestegrensesnitt-standard

Misleading "large file upload" examples and specification #313

Closed ivaylomitrev closed 1 year ago

ivaylomitrev commented 1 year ago

        Project  NOARK 5 Tjenestegrensesnitt
       Category  Noark 5.5.0 TG version 1.0
       Severity  comment / objection
   Message type  omitted / needs clarification
 User reference  user@example.com
  Document part  #6

Description

The "large file upload" section in Chapter 6 seems to have incorrect calculations making for ambiguous/misleading interpretation of the specification.

Namely, the "first chunk" upload lists the following headers in the request:

Content-Length: 524288
Content-Type: image/jpeg
Content-Range: bytes 0-524287/2000000

and the following Range header in the response:

Range: bytes 0-524287

Subsequently, the "last chunk" upload lists the following headers in the request:

Content-Length: 427136
Content-Type: image/jpeg
Content-Range: bytes 1572864-2000000/2000000

These clash with each other, and I am unsure how to read the requirements. According to the Google Drive documentation (which this specification identifies as inspiration), the range is inclusive and the upper value may be at most "X-Upload-Content-Length - 1". That means the Content-Length of 524288 in the "first chunk" is correct and the Content-Length of 427136 in the "last chunk" is also correct, but the Content-Range of the last chunk should be "1572864-1999999/2000000" (instead of "1572864-2000000/2000000"). See "Resume an interrupted upload" in the Google Drive documentation.
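
The arithmetic can be checked with a short Python sketch. The helper name `content_range_headers` is hypothetical and not taken from the specification; it only assumes inclusive byte ranges, as described in the Google Drive documentation.

```python
def content_range_headers(total_size, chunk_size):
    """Yield (Content-Length, Content-Range) for each chunk of an upload.

    Assumes inclusive byte ranges, so the last valid byte index is
    total_size - 1.
    """
    for first in range(0, total_size, chunk_size):
        last = min(first + chunk_size, total_size) - 1  # inclusive upper bound
        yield last - first + 1, f"bytes {first}-{last}/{total_size}"


# Values from the Chapter 6 example: 2000000 bytes in 524288-byte chunks.
headers = list(content_range_headers(2_000_000, 524_288))
print(headers[0])   # (524288, 'bytes 0-524287/2000000')
print(headers[-1])  # (427136, 'bytes 1572864-1999999/2000000')
```

With an upper value of 2000000 the last chunk would claim 427137 bytes, one more than its Content-Length.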

Another point of confusion is the Range header returned by the server. According to the specification, the upper value of the returned range is used as the starting value of the Content-Range header in the next transmission. This, however, clashes with the implied requirement that both the lower and the upper value of the range are inclusive. It also clashes with the Google Drive implementation of resumable upload, which clearly states that one must ensure "... that object data you're about to upload begins at the byte following the upper value in the Range header" (emphasis mine).
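
A minimal sketch of the "byte following the upper value" rule, in Python. The function name `next_start_offset` is hypothetical, and the header format `bytes <first>-<last>` is assumed from the examples quoted above rather than from any normative text.

```python
import re


def next_start_offset(range_header):
    """Return the first byte to send next, given a server Range header.

    Assumes the form 'bytes <first>-<last>' with an inclusive upper bound.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)", range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range header: {range_header!r}")
    last_received = int(match.group(2))
    return last_received + 1  # the byte following the upper value


print(next_start_offset("bytes 0-524287"))  # 524288
```

Reusing the upper value itself as the next start would resend byte 524287.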

Requested change

  1. Change the Content-Range in the "last chunk" upload in chapter 6 to 1572864-1999999/2000000 (from 1572864-2000000/2000000)
  2. Specify that the upper value of the returned Range header plus one must be used as the starting value of the Content-Range header in the next transmission (exactly as Google Drive does)

petterreinholdtsen commented 1 year ago

[ivaylomitrev]

The "large file upload" section in Chapter 6 seems to have incorrect calculations making for ambiguous/misleading interpretation of the specification.

I agree, I believe you are right.

  1. Change the Content-Range in the "last chunk" upload in chapter 6 to 1572864-1999999/2000000 (from 1572864-2000000/2000000)
  2. Specify that the upper value of the returned Range header plus one must be used as the starting value of the Content-Range header in the next transmission (exactly as Google Drive does)

I'll prepare a patch for this, unless you beat me to it. :)

-- Happy hacking Petter Reinholdtsen

ivaylomitrev commented 1 year ago

Thanks! I was just waiting for a confirmation about my research. I opened https://github.com/arkivverket/noark5-tjenestegrensesnitt-standard/pull/314 as a result.

petterreinholdtsen commented 1 year ago

[Ivaylo Mitrev]

Thanks! I was just waiting for a confirmation about my research. I opened https://github.com/arkivverket/noark5-tjenestegrensesnitt-standard/pull/314 as a result.

Very good. Just to make sure we understand the specification the same way, and as a draft for a better example in the specification, here is an example of use of headers, by uploading 29 bytes using the "large file" procedure, ten bytes per request.

In a real upload, ten bytes would not be uploaded like this, as it would be handled as one small upload.

The marker -> indicates the request content/headers, and <- indicates the response content/headers.

POST http://site/href/of/fil/rel
-> Content-Length: 0
   X-Upload-Content-Length: 29
<- Location: http://site/block1url

PUT http://site/block1url
-> Content-Length: 10
   Content-Range: bytes 0-9/29
   <10 bytes>
<- Location: http://site/block2url
   Range: bytes 0-9

PUT http://site/block2url
-> Content-Length: 10
   Content-Range: bytes 10-19/20
   <10 bytes>
<- Location: http://site/block3url
   Range: bytes 0-19

PUT http://site/block3url
-> Content-Length: 9
   Content-Range: bytes 20-28/29
   <9 bytes>
<- { json 'dokumentobjekt' content }

Does this reflect how you understand the specification too?

-- Happy hacking Petter Reinholdtsen

ivaylomitrev commented 1 year ago

That is how I understand it, yes.

Two minor things:

  1. I believe you have a typo in your second request. You have specified a bytes 10-19/20 Content-Range when it should be bytes 10-19/29.
  2. Header X-Upload-Content-Type is missing in your first request.
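
With both corrections applied, the request headers of the 29-byte example could be generated roughly as follows. This is only a sketch: `plan_upload` is a hypothetical helper, nothing is sent over the network, and the `text/plain` content type is an arbitrary placeholder.

```python
def plan_upload(data, content_type, chunk_size):
    """Return (method, headers) for each request of a chunked upload.

    Only builds request headers; server responses are not modelled.
    """
    total = len(data)
    # The initial POST announces the total size and type of the upload.
    steps = [("POST", {
        "Content-Length": "0",
        "X-Upload-Content-Type": content_type,
        "X-Upload-Content-Length": str(total),
    })]
    for first in range(0, total, chunk_size):
        chunk = data[first:first + chunk_size]
        last = first + len(chunk) - 1  # inclusive upper bound
        steps.append(("PUT", {
            "Content-Length": str(len(chunk)),
            "Content-Range": f"bytes {first}-{last}/{total}",
        }))
    return steps


steps = plan_upload(b"x" * 29, "text/plain", 10)
for method, headers in steps:
    print(method, headers)
```

The three PUT requests carry the ranges 0-9/29, 10-19/29 and 20-28/29, each starting one byte after the previous upper value.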