anatol / pacoloco

Caching proxy server for Arch Linux pacman
MIT License
199 stars 30 forks source link

Download code refactoring (#7) #61

Closed Focshole closed 1 year ago

Focshole commented 1 year ago

This should fix #7 and #30, possibly #58 too . I have rewritten some code parts in handleRequest and fixed some race conditions which could happen because of how mutexes were being used. Now if pacoloco shouldn't handle the request (because the file is partial and it is being downloaded), it replies with a 302 to the upstream mirror, so that it won't break the download to other clients (and then it would allow partial downloads for cached packages). Also, this PR should reduce timeout waiting by trying a HEAD request before trying to download a file which is not available upstream. It includes also some formatting fixes in the readme, as well as with some improved examples.

Focshole commented 1 year ago

I have forced the Accept-Ranges header, and if a Range is specified and there is no local file, it redirects the request upstream. If someone will be interested in implementing the partial file download, all it is required is to change this line and this line with the proper implementation.

It would be great to add extra some tests for those functionalities but I have no time to do so, sorry.

Jabbermuggel commented 1 year ago

I haven't tested this but it looks like quite an improvement over the existing code. And as you mentioned in #7, it should be relatively easy to improve this further to implement the byte range serving, so I'm basing an implementation of this on this version.

Focshole commented 1 year ago

I had removed the readme conflicts, hopefully in a proper way

Focshole commented 1 year ago

Ok, I have fixed an issue that would happen only if repositories url were not specified (it is probably impossible to happen through config but it would happen through tests)

Focshole commented 1 year ago

It seems that there is an issue when the signature is not available (see https://github.com/anatol/pacoloco/issues/63#issuecomment-1445159711 ), I cannot track it down

Jabbermuggel commented 1 year ago

As far as I can tell there have been no changes to the download functions except the return parameters, which shouldn't really change things here. I believe the error may lie in the logic of the checkUpstreamFileAvailability function. In line 245 of pacoloco.go, we have the code

        shouldBeDownloaded, err := checkUpstreamFileAvailability(repo.URL+path+"/"+fileName, ifLater)
        if shouldBeDownloaded {
            err = downloadFile(repo.URL+path+"/"+fileName, filePath, ifLater)

Thus, the downloadFile function gets called if the checkUpstreamFileAcailability function returns true. however, if we look there, we have the following code snippet:

    if err == nil {
        defer resp.Body.Close()
        switch resp.StatusCode {
        case http.StatusOK:
            return true, nil
        case http.StatusMethodNotAllowed:
            // HEAD method is not allowed, assume that the file should be downloaded
            return true, nil
        case http.StatusNotModified:
            // either pacoloco or client has the latest version, no need to redownload it
            return false, nil
        default:
            // Assume that the file should be downloaded but the mirror is not available
            return true, err
        }
    }

I think the problem here is that there is no case for http code 404 (in go http.StatusNotFound), which is the return code for nonexistent files (e.g. https://geo.mirror.pkgbuild.com/core/os/x86_64/core.db.sig from issue #63). Thus, in this case the default is applied, returning true, and the handler tries to call downloadFile on an empty URL, which then results in the empty file being created and served.

Focshole commented 1 year ago

Sorry, I messed up things while trying to update!