SUSE / rmt

RPM repository mirroring tool and registration proxy for SUSE Customer Center.

Unable to mirror openSUSE Tumbleweed #1050

Closed: dirkmueller closed this issue 5 months ago

dirkmueller commented 7 months ago

As part of SUSE Hack Week 23, openSUSE Tumbleweed switched to zstd for metadata compression. However, the underlying repomd-parser cannot handle those files, as it only expects .gz. As a result, the mirroring of packages fails.

See https://github.com/ikapelyukhin/repomd-parser/issues/12 for more information.
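
To illustrate the failure mode, here is a minimal shell sketch of choosing a decompressor from the metadata file's extension instead of assuming gzip. The function name decompressor_for is hypothetical, and this is not how repomd-parser itself is implemented; it only shows the kind of dispatch that a .gz-only parser lacks.

```shell
#!/bin/bash
# Hypothetical helper (not part of RMT or repomd-parser): print the command
# that decompresses a repository metadata file to stdout, chosen by extension.
decompressor_for() {
    case "$1" in
        *.gz)  echo "gzip -dc" ;;   # classic rpm-md metadata
        *.zst) echo "zstd -dc" ;;   # what Tumbleweed switched to
        *.xml) echo "cat" ;;        # uncompressed metadata
        *)     echo "unsupported metadata compression: $1" >&2
               return 1 ;;
    esac
}
```

Assuming a downloaded file named in the variable f, something like $(decompressor_for "$f") "$f" would then stream the XML regardless of which compression the repository uses.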

dirkmueller commented 7 months ago

Rather than parsing repomd in a Ruby module that is perhaps less well maintained, have you thought about using libsolv instead? libsolv can parse this format very efficiently and wouldn't have such compatibility issues.

Running repomd2solv path/to/repo | dumpsolv and parsing the output might be significantly more reliable.

felixsch commented 7 months ago

Thanks for letting us know Dirk.

As I understand it, both libsolv and nokogiri use libxml2 as their backend library. But the performance of parsing file lists is not really an issue in RMT anyway, since most of the time is spent downloading packages.

I looked into porting RMT to libsolv and came to the conclusion that, given the current state of the libsolv Ruby bindings, the lack of documentation for libsolv itself (at least I failed to find API documentation and had to browse the source), and the generally different focus of the project (SAT dependency solving vs. brainless mirroring of files), Ivan's repomd-parser is just easier to use. Not to mention that the SWIG generation is far from optimal.

Why do you think that first parsing the metadata into .solv dumps is more reliable than parsing the XML directly via libxml2/nokogiri, given RMT's use case of just iterating over all referenced files? Maybe I'm missing the point :)

The conclusion might change if libsolv shipped with Debian repository support enabled, but I could not find any information on the maintenance status of these code paths in libsolv. Do you have any insights? Sadly, it is disabled in current openSUSE/SUSE distributions.

From a maintenance perspective, Ivan (former SUSE employee and author of RMT) is usually pretty fast to react, and if the time comes that the project is abandoned, I see no problem with forking and maintaining it within the SCC realm in the future.
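
For illustration, "iterating over all referenced files" amounts to collecting the href attribute of every location element in the rpm-md XML. Below is a rough shell sketch; the function name list_locations is hypothetical, and a regular expression is used only to keep the example short. RMT itself goes through repomd-parser and nokogiri, not through anything like this.

```shell
#!/bin/bash
# Hypothetical helper: read rpm-md XML (repomd.xml or primary.xml) on stdin
# and print the href of every <location .../> element, one per line.
# Regex matching is a shortcut for this sketch; real tooling should use an
# XML parser.
list_locations() {
    grep -o '<location[^>]*href="[^"]*"' \
        | sed 's/.*href="\([^"]*\)".*/\1/'
}
```

Assuming a local copy of repomd.xml, running list_locations < repomd.xml would print the relative paths of the metadata files, which a mirroring tool then downloads and walks in the same way to reach the packages.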

doccaz commented 7 months ago

+1 for adding zstd support to RMT. I have a customer that is using RMT to mirror Tumbleweed as a custom repository for their developers, and it stopped working last week. I just found out about this issue here. Should I open a formal case for the customer?

doccaz commented 7 months ago

In case someone else needs this, this is the script I'm using on the RMT server itself to mirror the Tumbleweed repositories until this is solved.

#!/bin/bash
# Workaround script to download Tumbleweed OSS repository
# Basically, RMT does not support the recent move to ZSTD repodata, so it doesn't download the packages.
# https://github.com/SUSE/rmt/issues/1050
# 
# Erico Mendonca <erico.mendonca@suse.com>
#

trap cleanup SIGINT SIGTERM

cleanup() {
    echo "killing all wget instances, please wait..."
    killall -9 wget
    echo "Download stopped."
    exit 1
}

MAINURL="https://download.opensuse.org/tumbleweed/repo/oss"
ARCHS="repodata i586 i686 x86_64 noarch"
REPODIR="/var/lib/rmt/public/repo/tumbleweed/repo/oss"

# stop running wgets, if any
killall wget

# note: I'm including repodata just for the sake of completion. RMT (still) downloads the zstd repodata correctly, just doesn't parse it.
for f in ${ARCHS}; do
    echo "---> Mirroring ${f}..."
    mkdir -p "${REPODIR}/${f}"
    # abort if the target directory is unavailable, so wget never writes elsewhere
    cd "${REPODIR}/${f}" || exit 1
    screen -dmS "download-${f}" wget -c -m -np -nH --cut-dirs=4 --reject '*.mirrorlist' "${MAINURL}/${f}/"
    cd ..
done

# wait for everything to finish...
while [ "$(screen -list | grep -c download-)" -gt 0 ]; do
    echo "Waiting for downloads to finish... (next try: $( date -d "+10 min"))"
    screen -list | grep download-
    sleep 600
done

# fix ownership, then clean up the index files wget left behind
echo "Changing permissions on ${REPODIR}..."
chown -R _rmt:nginx "${REPODIR}"
find "${REPODIR}" -name index.html -delete
find "${REPODIR}" -name robots.txt -delete
cd -

echo "---> Done."

Just place it into /etc/cron.daily and it should do the job. It's not the fastest way to do this (RMT is way faster), but at least it's downloading the directories in parallel.

doccaz commented 6 months ago

Any news on this issue, @dirkmueller?

ngetahun commented 6 months ago

@doccaz We're working on this at the moment: https://github.com/ikapelyukhin/repomd-parser/pull/13

ikapelyukhin commented 5 months ago

I've released repomd-parser v0.1.6 to Rubygems. Please make sure that RMT's RPM package has the zstd library as a dependency.

felixsch commented 5 months ago

This will be released in the upcoming RMT release, 2.15.