carlwgeorge / repomd

Python library for parsing dnf/yum repositories
MIT License
18 stars 14 forks

add support for xz compressed repodata #17

Open keelerad opened 4 days ago

keelerad commented 4 days ago

Have been using your module for checking for updates in RHEL and epel repos and have found it really useful

But today I got this error when running it against an EPEL 8 mirror. Below is an example that reproduces the issue:

>>> import repomd
>>> this_repo_path="https://d2lzkl7pfhq30w.cloudfront.net/pub/epel/8/Everything/x86_64/"
>>> repo = repomd.load(this_repo_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nexor/release/virtualenvs/conan-rhel8/lib64/python3.6/site-packages/repomd.py", line 39, in load
    metadata = defusedxml.lxml.fromstring(uncompressed.read())
  File "/usr/lib64/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/lib64/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/lib64/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\xfd7')

It had just successfully read the RHEL 8 AppStream and BaseOS repos, but it fails on EPEL 8.

Has there been some update to the way they package repodata that's breaking your module?

Just noticed that the EPEL 8 repo has .xz files in its repodata rather than the .gz files that the RHEL repos use, which is why your module is having problems:

-rw-r--r--. 1 root root  2438476 Jul  5 09:10 17303968ee78f94990aa4e02ef531b28251553f793c987990024f70e19a78f43-primary.xml.xz
-rw-r--r--. 1 root root  1134339 Jul  5 09:10 2c7fdb21e5a08f8afa7c7accb1fec9d38d98d483d46138802d671c57c379dac7-updateinfo.xml.bz2
-rw-r--r--. 1 root root    25704 Jul  5 09:10 783ce06abb85cc09757a9ffcb069277589dd60ee77ea194582cb6bec1f3122ea-comps-Everything.x86_64.xml.xz
-rw-r--r--. 1 root root 10762172 Jul  5 09:10 97275104e72f8cdfbc4e232d63a6edf45da8c70a22c22004779329c11a34dd90-filelists.xml.xz
-rw-r--r--. 1 root root 11200972 Jul  5 09:10 c0a2751531cc6b3554c186a77ea1b829ff4984ed52f6762e374f39170f961675-filelists.sqlite.xz
-rw-r--r--. 1 root root  1870828 Jul  5 09:10 cbfac394d5aab0de2479333cd872d4bf5bafea119ff6f9c4c904e110a7493108-other.sqlite.xz
-rw-r--r--. 1 root root  4656408 Jul  5 09:10 d6cefefbf0a58809acea4b62d660f54afbf3331239c0f4bf5e016d977ca19d4a-primary.sqlite.xz
-rw-r--r--. 1 root root     1240 Jul  5 09:10 f400b281aafab2de9e1e5ef9e53e9cb7f759af4e2806df940a9562c142144510-prestodelta.xml.xz
-rw-r--r--. 1 root root  1117344 Jul  5 09:10 fe90c5141099711570285dc0660968fba0f8bd9deaac3f140d1d3d19a476236b-other.xml.xz
-rw-r--r--. 1 root root     4521 Jul  5 09:10 repomd.xml
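
For reference, the `b'\xfd7'` reported in the traceback matches the first two bytes of the xz magic number, which agrees with the listing above. Here is a minimal sketch of identifying the format from the leading bytes of a file. The name `sniff_compression` is hypothetical and is not part of repomd, and the check needs at least the first six bytes of the file to recognise xz.

```python
# Leading "magic" bytes of the compression formats mentioned in this thread.
MAGIC = {
    b"\x1f\x8b": "gz",
    b"\xfd7zXZ\x00": "xz",
    b"BZh": "bz2",
    b"\x28\xb5\x2f\xfd": "zst",
}


def sniff_compression(head: bytes) -> str:
    """Return a short format name for the given leading bytes, or 'unknown'.

    Hypothetical helper for illustration only; not part of repomd.
    """
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"
```

The gzip module only inspects the first two bytes, which is why the error message shows just `b'\xfd7'` rather than the full six-byte xz signature.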

Just checked the EPEL 7 and EPEL 9 repos and they both use .gz for the repodata; it's only the EPEL 8 repo that seems to be using .xz.

Root cause discovered here: https://pagure.io/releng/issue/12097. They broke it 2 months back by moving from .gz to .zst format, and then, rather than reverting to .gz, they decided to switch to .xz.

carlwgeorge commented 4 days ago

You're spot on regarding the cause of the issue. When I originally wrote this library, gz repodata was the dominant standard, so that was the only compression format I wrote code to handle. I knew that wouldn't be the case forever, but never got around to adding code to handle other compression formats. At some point I'll add code to handle xz and probably even zst. I don't know when that will be. I'm long overdue to improve this library and respond to open pull requests here; I just haven't made time for it yet.
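
Until that lands, one possible workaround is to pick a decompressor from the file suffix of the repodata file. This is only a sketch using the standard-library `gzip`, `lzma`, and `bz2` modules. The function `decompress_repodata` and its parameters are hypothetical names, not repomd's API, and this is not how the library currently works. zstd is not in the standard library of the Python 3.6 shown in the traceback, so it is left unimplemented here.

```python
import bz2
import gzip
import lzma

# Standard-library decompressors keyed by file suffix.
DECOMPRESSORS = {
    ".gz": gzip.decompress,
    ".xz": lzma.decompress,
    ".bz2": bz2.decompress,
}


def decompress_repodata(href: str, payload: bytes) -> bytes:
    """Decompress a repodata payload based on the suffix of its file name.

    Hypothetical helper for illustration only; not part of repomd.
    """
    for suffix, decompress in DECOMPRESSORS.items():
        if href.endswith(suffix):
            return decompress(payload)
    if href.endswith(".zst"):
        # Would need a third-party zstd binding on Python 3.6.
        raise NotImplementedError("zstd repodata is not handled in this sketch")
    # Assumption: anything else is already uncompressed XML.
    return payload
```

For example, passing the `...-primary.xml.xz` file name from the listing together with the downloaded bytes would return the plain XML, which could then be handed to the XML parser in place of the gzip-only read.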