TheRosettaFoundation / SOLAS-Match

Self-managed translation project interface
www.TheRosettaFoundation.org
GNU Lesser General Public License v3.0

Task file not updated after re-upload #1253

Closed Paulina-Rosetta closed 7 years ago

Paulina-Rosetta commented 7 years ago

I have re-uploaded the translation of this task several times, but when I download the file again I still get the old version: https://trommons.org/task/19108/view/

Everything seems to be working fine but the file doesn't update. Help!

alanbarrett commented 7 years ago

https://trommons.org/org/19/task/19108/complete/ ... Paulina (using Firefox) gets version-1, whereas I get version-3 as expected (using Chrome). Paulina also gets version-3 using Chrome. She cleared Cookies and now gets the correct version (using Firefox).

At first, when going through this with Paulina, I thought this was a Firefox caching bug. Now I see that Firefox might be doing exactly what our caching instructions ask of it. In any case it makes sense to turn off caching for file downloads, since a file is likely to be downloaded by only a single user, and I would assume this will fix the bug. Our caching instructions may also be inconsistent.

We do the following:

    $headerArray['Content-type'] = $mime;
    $headerArray['Content-Disposition'] = "attachment; filename=\"".trim($path_parts["basename"],'"')."\"";
    $headerArray['Content-length'] = $fsize;
    $headerArray['X-Frame-Options'] = "ALLOWALL";
    $headerArray['Pragma'] = "public";
    $headerArray['Cache-control'] = "private";
    //See http://goo.gl/3fdIVm
    $headerArray['X-Sendfile'] = realpath($absoluteFilePath);

Relevant links...

http://stackoverflow.com/questions/1920781/what-does-the-http-header-pragma-public-mean
According to the standard, Pragma is implementation dependent (section 14.32), except for no-cache because of its wide use. Cache-Control (section 14.9) is the proper way to control caching. This is what the standard says for Cache-Control: public:
Indicates that the response MAY be cached by any cache, even if it would normally be non-cacheable or cacheable only within a non-shared cache.

http://stackoverflow.com/questions/12908766/what-is-cache-control-private
Cache-Control: private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache, such as a proxy server.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options
There are three possible directives for X-Frame-Options:

    X-Frame-Options: DENY
    X-Frame-Options: SAMEORIGIN
    X-Frame-Options: ALLOW-FROM https://example.com/

http://stackoverflow.com/questions/80186/using-x-sendfile-with-apache-php
http://blog.jasny.net/articles/how-i-php-x-sendfile/

Moodle (for 0 lifetime) does something like this...

    header('Content-Disposition: attachment; filename="'.$filename.'"');
    header('Cache-Control: private, must-revalidate, pre-check=0, post-check=0, max-age=0, no-transform');
    header('Expires: '. gmdate('D, d M Y H:i:s', 0) .' GMT');
    header('Pragma: no-cache');
    header('Content-Type: '.$mimetype);
    header('Last-Modified: '. gmdate('D, d M Y H:i:s', time()) .' GMT');
    header('Accept-Ranges: none');
    header("$CFG->xsendfile: $filepath");

We will need to decide on the exact solution.

Alan.

alanbarrett commented 7 years ago

Some points after quite a bit of investigation...

1) There was a problem with the environment at the Rosetta Foundation office. I am not sure why, because for me Firefox V50.0 (as well as Chrome) always re-downloads the file. Maybe there is a public cache at the office which held the file? But then it is not evident why clearing the cookies etc. worked (I assume the Firefox there is up to date).

2) In any case the "Pragma" header has "public" which is not well supported.

3) "Cache-Control" header has "private" even though no "max-age" is specified.

4) Note, none of the browsers seem to be using the "ETag" mechanism to revalidate.

5) So the next question is to work out whether there is an advantage to caching (and for which resources); if so, we should configure it correctly.

6) Otherwise disable caching.

Here is a capture of the response headers for https://trommons.org/task/19108/download-task-latest-file/ which downloads the file...

    Cache-Control:private
    Connection:Keep-Alive
    Content-Disposition:attachment; filename="Proyectos Hope Guatemala enviado 22.09.2016.docx"
    Content-Length:283162
    Content-Type:application/vnd.openxmlformats-officedocument.wordprocessingml.document
    Date:Sat, 19 Nov 2016 14:04:40 GMT
    ETag:"4521a-541906fa0926b;5405361d01fad"
    Keep-Alive:timeout=5, max=100
    Last-Modified:Fri, 18 Nov 2016 10:04:47 GMT
    Pragma:public
    Server:Apache/2.4.7 (Ubuntu)
    Set-Cookie:slim_session= ...
    X-Frame-Options:ALLOWALL
    X-Powered-By:PHP/5.5.9-1ubuntu4.20
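Points 2 and 3 can be checked mechanically against a captured header set like this one. The following is only a sketch: auditCachingHeaders() is a hypothetical helper, not something in SOLAS-Match, and the warning texts are made up for illustration.

```php
<?php
// Hypothetical helper (not part of SOLAS-Match): flags the caching issues
// described in points 2 and 3 for one set of response headers.
function auditCachingHeaders(array $headers)
{
    // Header names are case-insensitive, so normalise the keys first.
    $h = array_change_key_case($headers, CASE_LOWER);
    $warnings = array();

    // Point 2: "no-cache" is the only Pragma value with widely agreed behaviour.
    if (isset($h['pragma']) && strtolower(trim($h['pragma'])) !== 'no-cache') {
        $warnings[] = 'Pragma value "' . $h['pragma'] . '" is implementation dependent';
    }

    // Point 3: look for an explicit freshness policy in Cache-Control.
    $cacheControl = isset($h['cache-control']) ? strtolower($h['cache-control']) : '';
    $explicit = false;
    foreach (array_map('trim', explode(',', $cacheControl)) as $directive) {
        if ($directive === 'no-cache' || $directive === 'no-store'
            || strpos($directive, 'max-age=') === 0) {
            $explicit = true;
        }
    }
    if (!$explicit) {
        $warnings[] = 'Cache-Control does not state a freshness policy';
    }

    return $warnings;
}

// The two cache-related headers from the capture above.
$captured = array('Cache-Control' => 'private', 'Pragma' => 'public');
print_r(auditCachingHeaders($captured));
```

Both warnings fire for the captured response. When a response carries Last-Modified but no explicit freshness information, HTTP caches are allowed to estimate a lifetime heuristically; that is one plausible explanation for the stale download, though nothing in this thread confirms it.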

@aquilax any comments?

Alan.

alanbarrett commented 7 years ago

I am going to use...

    Cache-Control: no-cache, must-revalidate, no-transform
    Pragma: no-cache

We should not cache with any expiry time because, in theory, the files or images could change at any time. "no-cache" still allows the browser to use the ETag to revalidate with the server that a file has not changed, so avoiding a re-download.
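As a rough sketch of how the chosen values could sit in a header array like the one quoted earlier in this thread: buildDownloadHeaders() is a hypothetical name, the exact change committed to SOLAS-Match is not shown here, and only a subset of the headers is included.

```php
<?php
// Sketch only: buildDownloadHeaders() is a hypothetical helper, not the
// project's actual function. It shows where the new cache values would go.
function buildDownloadHeaders($absoluteFilePath, $mime)
{
    $pathParts = pathinfo($absoluteFilePath);

    $headerArray = array();
    $headerArray['Content-type'] = $mime;
    $headerArray['Content-Disposition'] =
        'attachment; filename="' . trim($pathParts['basename'], '"') . '"';

    // The browser may store the file but must revalidate it with the
    // server before every reuse, so a re-uploaded file is picked up.
    $headerArray['Cache-control'] = 'no-cache, must-revalidate, no-transform';
    $headerArray['Pragma'] = 'no-cache';

    // The original code wraps the path in realpath(); that is left out here
    // so the sketch also runs when the file does not exist.
    $headerArray['X-Sendfile'] = $absoluteFilePath;

    return $headerArray;
}

print_r(buildDownloadHeaders('/tmp/example.docx', 'application/octet-stream'));
```

The path and MIME type in the call are placeholders.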

Alan.

aquilax commented 7 years ago

Sorry for the late reply @alanbarrett

I don't think we can rely on ETag for dynamically generated content. Safest bet IMO for these files will be no caching at all, as you suggested.

alanbarrett commented 7 years ago

@aquilax , Yes ETag should not be used for dynamic content (and it is not). However it is valid to use it for files (project, task and image files).

I now see that my comment that "ETag" was not being used was incorrect, because I had caching turned off when using Chrome's developer tools! It is being used correctly to validate that the browser has the most up-to-date version of a file, using the "If-Range" header and a request for one byte. If that match fails, the server sends the whole file instead of the one byte.
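The If-Range behaviour described here can be sketched as a small decision function. This only illustrates the HTTP rule; statusForRangeRequest() is a hypothetical name, and nothing in this thread says which component actually makes the decision.

```php
<?php
// Hypothetical illustration of the If-Range rule: returns the status code
// a server would answer with for a GET that may carry Range / If-Range.
function statusForRangeRequest($currentEtag, $hasRangeHeader, $ifRangeEtag = null)
{
    if (!$hasRangeHeader) {
        return 200; // ordinary request: send the whole file
    }
    if ($ifRangeEtag === null || $ifRangeEtag === $currentEtag) {
        return 206; // validator absent or still current: send only the requested bytes
    }
    return 200; // file changed since it was cached: ignore Range, send the whole file
}

$etag = '"9e11b-5406d2838ecda;541baea2030a0"';
var_dump(statusForRangeRequest($etag, true, $etag));           // browser copy still current
var_dump(statusForRangeRequest($etag, true, '"stale-value"')); // file was re-uploaded
```

The one-byte range keeps the check cheap: a match costs a single byte, while a mismatch costs exactly what a fresh download would have cost anyway.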

In addition, I do see that Chrome (and I assume Firefox) sometimes satisfies the request from its own cache (because the new cache settings have not yet been pulled to the Trommons live server). So it could actually be serving an out-of-date file (i.e. Paulina's bug).

Alan.

alanbarrett commented 7 years ago

@aquilax , @Paulina-Rosetta ,

This is now resolved, although browsers with un-cleared caches may still have a "historical" problem with some files.

I tested by clicking download on page: https://trommons.org/org/19/task/18888/complete/
The download causes a GET to https://trommons.org/task/18888/download-task-latest-file/

The first time a GET request is made for the file download, Chrome sees a response including...

    HTTP/1.1 200 OK
    Content-Disposition: attachment; filename="Proyectos Hope Guatemala enviado 22.09.2016.docx"
    Pragma: no-cache
    Cache-Control: no-cache, must-revalidate, no-transform
    Last-Modified: Thu, 03 Nov 2016 22:34:13 GMT
    ETag: "9e11b-5406d2838ecda;541baea2030a0"

Second time (same file), Chrome sends headers including...

    If-Range:"9e11b-5406d2838ecda;541baea2030a0"
    Range:bytes=16384-16384

and gets a response including...

    HTTP/1.1 206 Partial Content
    Content-Disposition: attachment; filename="Proyectos Hope Guatemala enviado 22.09.2016.docx"
    Pragma: no-cache
    Cache-Control: no-cache, must-revalidate, no-transform
    ETag: "9e11b-5406d2838ecda;541baea2030a0"
    Content-Length: 1
    Content-Range: bytes 16384-16384/647451

Third and subsequent times, Chrome sends headers including...

    If-Modified-Since:Thu, 03 Nov 2016 22:34:13 GMT
    If-None-Match:"9e11b-5406d2838ecda;541baea2030a0"

and gets a response including...

    HTTP/1.1 304 Not Modified
    ETag: "9e11b-5406d2838ecda;541baea2030a0"
    Content-Disposition: attachment; filename="Proyectos Hope Guatemala enviado 22.09.2016.docx"

The above shows that Chrome revalidates its cache each time!
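For illustration, the If-None-Match revalidation seen on the third and later requests boils down to a comparison like the one below. This is a simplified sketch: statusForConditionalGet() is a hypothetical name, the header is split naively on commas, and If-Modified-Since is ignored.

```php
<?php
// Hypothetical illustration: decide between 304 and 200 for a conditional GET.
// Simplification: splits If-None-Match on commas, which is fine for ETags
// like the ones in this thread but not for every legal ETag value.
function statusForConditionalGet($currentEtag, $ifNoneMatch)
{
    if ($ifNoneMatch === null || $ifNoneMatch === '') {
        return 200; // unconditional request: send the file
    }

    // If-None-Match uses weak comparison, so a leading "W/" is ignored.
    $stripWeak = function ($tag) {
        $tag = trim($tag);
        return strpos($tag, 'W/') === 0 ? substr($tag, 2) : $tag;
    };

    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = $stripWeak($candidate);
        if ($candidate === '*' || $candidate === $stripWeak($currentEtag)) {
            return 304; // browser copy is current: no body is sent
        }
    }

    return 200; // file changed: send the new version
}

$etag = '"9e11b-5406d2838ecda;541baea2030a0"';
var_dump(statusForConditionalGet($etag, $etag));           // same file as before
var_dump(statusForConditionalGet($etag, '"older-value"')); // file was re-uploaded
```

With "no-cache" in place, the browser asks this question on every download, so a re-uploaded file changes the ETag and forces a full 200 response.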

Also, just a note that it is impossible to test this fix if caching is turned off in Chrome Developer Tools (Network Headers). It is also impossible to test on the development server, because its non-matching certificate also causes Chrome to turn off caching!

Alan.