anatol / pacoloco

Caching proxy server for Arch Linux pacman

Cannot sync every repo's DBs #92

Closed by afonsofrancof 4 months ago

afonsofrancof commented 10 months ago

Hello. I have been getting this error for quite some time. If I point my repos to pacoloco and then point pacoloco to my mirror list, only some of the DBs get downloaded. On the other DBs I get: pacoloco.go:168: repo archlinux has no urls

docker-compose.yml

---
version: "3.8"
services:
  pacoloco:
    container_name: pacoloco
    image: ghcr.io/anatol/pacoloco
    ports:
      - "9129:9129"
    volumes:
      - /pacoloco/cache:/var/cache/pacoloco
      - /pacoloco/pacoloco.yaml:/etc/pacoloco.yaml
      - /etc/pacman.d/reflector-mirrorlist:/etc/mirrorlist
    restart: unless-stopped
    environment:
      - TZ=Europe/Lisbon

/pacoloco/pacoloco.yaml (mapped to /etc/pacoloco.yaml inside the container)

port: 9129
cache_dir: /var/cache/pacoloco
purge_files_after: 360000
download_timeout: 3600
repos:
  archlinux:
    mirrorlist: /etc/mirrorlist
user_agent: Pacoloco/1.2
prefetch:
  cron: 0/30 * * * * 
  ttl_unaccessed_in_days: 30 
  ttl_unupdated_in_days: 300 

/etc/pacman.d/reflector-mirrorlist (mapped to /etc/mirrorlist inside the container)

Server = https://mirrors.celianvdb.fr/archlinux/$repo/os/$arch
Server = https://archlinux.mailtunnel.eu/$repo/os/$arch
Server = https://mirror.theo546.fr/archlinux/$repo/os/$arch
Server = https://mirror.ubrco.de/archlinux/$repo/os/$arch
Server = https://mirror.sunred.org/archlinux/$repo/os/$arch
Server = https://mirror.cyberbits.eu/archlinux/$repo/os/$arch
Server = https://packages.oth-regensburg.de/archlinux/$repo/os/$arch
Server = https://mirror.pseudoform.org/$repo/os/$arch
Server = https://de.arch.mirror.kescher.at/$repo/os/$arch
Server = https://arch.jensgutermuth.de/$repo/os/$arch
Server = https://arch.unixpeople.org/$repo/os/$arch
Server = https://mirror.iusearchbtw.nl/$repo/os/$arch
Server = https://mirror.f4st.host/archlinux/$repo/os/$arch
Server = https://mirrors.xtom.de/archlinux/$repo/os/$arch
Server = https://ftp.halifax.rwth-aachen.de/archlinux/$repo/os/$arch
Server = https://mirrors.janbruckner.de/archlinux/$repo/os/$arch
Server = https://mirror.cmt.de/archlinux/$repo/os/$arch
Server = https://mirrors.n-ix.net/archlinux/$repo/os/$arch
Server = https://arch.phinau.de/$repo/os/$arch
Server = https://archlinux.thaller.ws/$repo/os/$arch

/etc/pacman.d/mirrorlist

Server = http://localhost:9129/repo/archlinux/$repo/os/$arch

/etc/pacman.conf (the important parts)

SigLevel    = Required DatabaseNever
LocalFileSigLevel = Optional

[core]
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/mirrorlist

[multilib]
Include = /etc/pacman.d/mirrorlist

Pacoloco logs (in this case core.db isn't being downloaded)

pacoloco.go:168: repo archlinux has no urls

downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//extra/os/x86_64/extra.db

downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//multilib/os/x86_64/multilib.db

pacoloco.go:270: serving cached file for archlinux/multilib/os/x86_64/multilib.db

As you can see, it says repo archlinux has no urls

Pacman logs

:: Synchronizing package databases...
 core.db failed to download
 extra is up to date
 multilib is up to date
 chaotic-aur is up to date
error: failed retrieving file 'core.db' from localhost:9129 : The requested URL returned error: 404
error: failed to synchronize all databases (failed to retrieve some files)

It all works if I specify individual repos for each arch repo inside the pacoloco config, like this:

repos:
  core:
    mirrorlist: /etc/mirrorlist
  extra:
    mirrorlist: /etc/mirrorlist
  multilib:
    mirrorlist: /etc/mirrorlist

and then change my /etc/pacman.conf to have this instead

[core]
Server = http://localhost:9129/repo/core/$repo/os/$arch

[extra]
Server = http://localhost:9129/repo/extra/$repo/os/$arch

[multilib]
Server = http://localhost:9129/repo/multilib/$repo/os/$arch

Thanks :)

afonsofrancof commented 10 months ago

Tried one more sync without changing my setup and now multilib isn't syncing.

downloader.go:336: repo archlinux has no urls
pacoloco.go:168: repo archlinux has no urls
downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//extra/os/x86_64/extra.db
pacoloco.go:270: serving cached file for archlinux/extra/os/x86_64/extra.db

afonsofrancof commented 10 months ago

Also, notice that it adds an extra slash after the mirror URL, before $repo:

downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//extra/os/x86_64/extra.db

It reads archlinux//, instead of archlinux/

(Not that it affects the end result)
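The doubled slash is what you get when a base URL that ends in "/" is concatenated with a path that starts with "/". A minimal sketch of a join that avoids it follows. The function name join_url is hypothetical, and this is not pacoloco's code, which is not shown in this thread.

```shell
#!/bin/sh
# join_url BASE PATH: join two URL parts with exactly one slash between them.
# Hypothetical helper for illustration; not taken from pacoloco.
join_url() {
    base=${1%/}   # drop one trailing slash from the base, if present
    path=${2#/}   # drop one leading slash from the path, if present
    printf '%s/%s\n' "$base" "$path"
}

join_url "https://mirrors.celianvdb.fr/archlinux/" "/extra/os/x86_64/extra.db"
# prints https://mirrors.celianvdb.fr/archlinux/extra/os/x86_64/extra.db
```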

afonsofrancof commented 10 months ago

With no changes to the config, I now get this. Maybe it's related?

downloader.go:102: downloading https://mirror.f4st.host/archlinux//multilib/os/x86_64/multilib.db
downloader.go:68: unable to download file archlinux/multilib/os/x86_64/multilib.db: receiving file https://mirror.f4st.host/archlinux//multilib/os/x86_64/multilib.db: Content-Length is 146416 while received body length is 1903408

anatol commented 10 months ago

Okay, it looks like several issues are reported here.

pacoloco.go:168: repo archlinux has no urls

@Focshole had this issue before. @afonsofrancof could it be some race condition between pacoloco and reflector changing the file?

it reads archlinux// , instead of archlinux /

It should be fixed in master. Could you please pull the recent changes and build pacoloco from source?

Content-Length is 146416 while received body length is 1903408

This error is weird. Do you always see this issue, or is it just sporadic?

Focshole commented 10 months ago

I had that issue when pacoloco's mirrorlist was unreadable or empty for some reason. Is that file readable from the container?
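One way to check for an unreadable or empty mirrorlist is to count the active Server lines in the file as the container sees it. This is only a diagnostic sketch. The function name count_servers is hypothetical and not part of pacoloco.

```shell
#!/bin/sh
# count_servers FILE: print the number of uncommented "Server = ..." lines.
# Hypothetical diagnostic helper; it is not part of pacoloco.
count_servers() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so ignore its exit status and keep only the printed count.
    grep -c -E '^[[:space:]]*Server[[:space:]]*=' "$1" || true
}
```

For the setup in the first post, something like `docker exec pacoloco cat /etc/mirrorlist > /tmp/seen.txt && count_servers /tmp/seen.txt` would show how many mirrors are visible inside the container. The path /tmp/seen.txt is just an example name. A result of 0 would match the "has no urls" message.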

afonsofrancof commented 10 months ago

@Focshole Yes, the file is readable since some of the repos can sync.

afonsofrancof commented 10 months ago

@afonsofrancof could it be some race condition between pacoloco and reflector changing the file?

I don't have reflector running anymore; the file has that name because I used to run it. And it would make no sense anyway, since it works if I separate the repos.

This error is weird. Do you always see this issue, or is it just sporadic?

Only sometimes, I have no idea what conditions cause it.

Orochimarufan commented 8 months ago

I think I've run into the same issue. It seems to happen when the mirrorlist file is bind-mounted directly.

Bind-mounting the parent directory instead (/etc/pacman.d:/etc/pacman.d:ro instead of /etc/pacman.d/mirrorlist:/etc/pacman.d/mirrorlist:ro) appears to have fixed it for me.

No idea why pacoloco chokes on bind-mounted files though...
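One possible explanation, offered as an assumption and not confirmed anywhere in this thread: a single-file bind mount is tied to the file's inode at the time the container starts. Tools that update a file by writing a temporary file and renaming it over the original give the path a new inode, so the container keeps seeing the old file. Mounting the parent directory avoids that. The sketch below only demonstrates the inode change on the host. The file names are made up and it does not involve pacoloco.

```shell
#!/bin/sh
# Demonstrate that replace-by-rename gives a path a new inode.
# Uses "stat -c %i" (GNU coreutils or busybox); file names are illustrative.
inode_of() {
    stat -c %i "$1"
}

dir=$(mktemp -d)
printf 'Server = https://mirrors.mit.edu/archlinux/$repo/os/$arch\n' > "$dir/mirrorlist"
before=$(inode_of "$dir/mirrorlist")

# Write the new content to a temporary file, then rename it into place.
printf 'Server = https://arch.mirror.constant.com/$repo/os/$arch\n' > "$dir/mirrorlist.new"
mv "$dir/mirrorlist.new" "$dir/mirrorlist"
after=$(inode_of "$dir/mirrorlist")

echo "inode before: $before, after: $after"
```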

chennin commented 8 months ago

I'm having this problem without using a docker bind mount. Is there a debug log option / environment variable?

Also, how do I tell which version of pacoloco I am running, other than by inspecting the Docker sha256 digest, which isn't very friendly?

Repro below.

Version:

# docker inspect --format='{{index .RepoDigests 0}}' e5bcb0215eaa
ghcr.io/anatol/pacoloco@sha256:b93e352f8c4d34494df208158a05d5487da03d847467406ab35132197e9a2e9d

docker-compose.yaml:

  pacoloco:
    image: ghcr.io/anatol/pacoloco
    container_name: prod-pacoloco
    restart: unless-stopped
    user: "122:1999" # pacoloco:pacoloco
    environment:
      TZ: "UTC"
    ports:
      - "127.0.0.1:9129:9129"
    volumes:
      - 'pacoloco:/var/cache/pacoloco'
      - '/root/etc-docker/pacoloco/pacoloco.yaml:/etc/pacoloco.yaml:ro'
      - '/root/etc-docker/pacoloco:/data:ro'
      - '/etc/passwd:/etc/passwd:ro'
      - '/etc/group:/etc/group:ro'
      - '/etc/localtime:/etc/localtime:ro'
      - '/etc/timezone:/etc/timezone:ro'
    logging:
      <<: *logging

I have nginx proxying to pacoloco. That part works when pacoloco does.

pacoloco.yaml:

purge_files_after: 1814400 # 21 days
download_timeout: 3600 # download will timeout after 3600 seconds
repos:
#  archlinux:
#    urls:
#      - http://mirror.lty.me/archlinux
#      - http://mirrors.kernel.org/archlinux
  archlinux:
    mirrorlist: /data/mirrorlist
prefetch: # optional section, add it if you want to enable prefetching
  cron: 41 5 * * *
  ttl_unaccessed_in_days: 30  # defaults to 30, set it to a higher value than the number of consecutive days you don't update your systems
  # It deletes and stop prefetch packages(and db links) when not downloaded after ttl_unaccessed_in_days days that it had been updated.
  ttl_unupdated_in_days: 300 # defaults to 300, it deletes and stop prefetch packages which hadn't been either updated upstream or requested for ttl_unupdated_in_days.
set_timestamp_to_logs: true

pacoloco can read the file:

~# docker-compose exec pacoloco sh -c "id; tail -n1 /data/mirrorlist"
uid=122(pacoloco) gid=1999(pacoloco) groups=1999(pacoloco)
Server = https://mirror.rackspace.com/archlinux/$repo/os/$arch

Logs:

prod-pacoloco          | 2023/11/24 16:26:30 repo archlinux has no urls
prod-pacoloco          | 2023/11/24 16:26:30 repo archlinux has no urls
prod-pacoloco          | 2023/11/24 16:26:30 downloading https://mirrors.mit.edu/archlinux//extra/os/x86_64/extra.db
prod-pacoloco          | 2023/11/24 16:26:30 serving cached file for archlinux/extra/os/x86_64/extra.db
prod-pacoloco          | 2023/11/24 16:26:30 unable to download file archlinux/extra/os/x86_64/extra.db.sig

Client logs:

:: Synchronizing package databases...
 core.db failed to download
 extra is up to date
 multilib.db failed to download
error: failed retrieving file 'core.db' from arch.<server>.net : The requested URL returned error: 404
error: failed retrieving file 'multilib.db' from arch.<server>.net : The requested URL returned error: 404
error: failed to synchronize all databases (unexpected error)

Reproduction

For me this repro reliably prints errors, but which one fails (core or extra) varies.

docker network create paco-test; \
docker stop pacoloco-test; \
docker rm pacoloco-test; \
mkdir -p /tmp/pactest && cd /tmp/pactest && \
{ cat >pacoloco.yaml <<EOF
repos:
  archlinux:
    mirrorlist: /data/pacoloco-mirrorlist
cache_dir: /tmp
EOF
} && \
{ cat >pacoloco-mirrorlist <<EOF
Server = https://mirrors.mit.edu/archlinux/\$repo/os/\$arch
Server = https://arch.mirror.constant.com/\$repo/os/\$arch
EOF
} && \
{ cat >arch-mirrorlist <<EOF
Server = http://pacoloco-test:9129/repo/archlinux/\$repo/os/\$arch
EOF
} && \
docker run -d --network paco-test --name pacoloco-test -v "$PWD:/data" -v "$PWD/pacoloco.yaml:/etc/pacoloco.yaml" ghcr.io/anatol/pacoloco@sha256:b93e352f8c4d34494df208158a05d5487da03d847467406ab35132197e9a2e9d && \
docker run --network paco-test --rm -it --name archlinux-test -v "$PWD/arch-mirrorlist:/etc/pacman.d/mirrorlist" archlinux:latest pacman -Syuv --noconfirm; \
echo -e  "\n^^ ARCH ^^\nvv PACOLOCO vv\n" && \
docker logs pacoloco-test

Repro cleanup:

docker stop archlinux-test pacoloco-test ; docker rm archlinux-test pacoloco-test; 
docker network rm paco-test; 
cd /tmp; rm -rf /tmp/pactest

chennin commented 7 months ago

Limiting pacoloco to 1 CPU seems to SOMETIMES solve the problem for me.

In my docker case that's docker run -d --cpuset-cpus=1 ... or

services:
  pacoloco:
    image: ghcr.io/anatol/pacoloco
    cpuset: "1"

Edited to add: It worked for a while, but now I am getting "has no urls" and a 404 on core.db, on a container that has been running with 1 cpu for a few days.

Focshole commented 1 month ago

Sorry for bumping this old thread, but I just ran into this issue too. The issue is still present, and it involves this log line:

serving cached file for archlinux/multilib/os/x86_64/multilib.db

Pacoloco serves an old .db file that somehow got left in the cache. This file has to be removed and not served anymore. I fixed it in my installation by removing the .db files from the cache. I guess it may be something I forgot to clean up after caching!

krameler commented 1 month ago

Did you change the mirrorlist between requests for the .db file, or did you see any connection errors? Otherwise, how is this related to the closed issue?

For .db files, pacoloco always connects to a mirror and does an "If-Modified-Since" check, so the file being in the cache shouldn't be a problem.
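The same kind of conditional request can be tried by hand against a mirror. The sketch below is an illustration only, not pacoloco's implementation. The helper name http_date_of and the local file name core.db are hypothetical. It formats a file's modification time as an HTTP date, which can then be sent in an If-Modified-Since header.

```shell
#!/bin/sh
# http_date_of FILE: print FILE's modification time as an HTTP date (GMT).
# Hypothetical helper; relies on the "-r FILE" option of GNU or busybox date.
http_date_of() {
    LC_ALL=C date -u -r "$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

# Example request (needs network access, so it is left commented out).
# ./core.db stands for a previously downloaded copy of the database.
# curl -s -o /dev/null -w '%{http_code}\n' \
#      -H "If-Modified-Since: $(http_date_of ./core.db)" \
#      "https://mirrors.mit.edu/archlinux/core/os/x86_64/core.db"
```

A 304 response means the mirror considers the local copy current, while a 200 means it has a newer file. curl's own `-z FILE` (`--time-cond`) option builds the same header from a file's timestamp.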

anatol commented 1 month ago

@Focshole is this PR related to your issue, by any chance? https://github.com/anatol/pacoloco/pull/109