Open armijnhemel opened 1 year ago
@armijnhemel you have eagle eyes! Thanks for the report. I do not yet have a good, mostly universal solution for cases where multiple download URLs exist for a single package, like the one you found where both patches and sources go into a binary.
The point is that for now the model is one download URL == one record in the purldb. We can, however, track multiple purls for the related source packages, though we do not have the proper DB models and relationships yet.
Having thought a bit about this, there are some other issues as well that could interfere (not in this particular case, but in general).
First of all, there is the situation where multiple files/download URLs point to the same package. For example, let's look at GNU binutils: https://ftp.gnu.org/gnu/binutils/
For 2.30 there are four distinct downloads: a .tar.bz2, a .tar.gz, a .tar.lz and a .tar.xz. These are all equivalent and should map to the same package URL and possibly back as well.
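One way to treat such downloads as equivalent is to group URLs by file name with the archive suffix removed. This is only a sketch: the suffix list, the helper names and the assumption that the files are named binutils-2.30 plus the suffix are mine, not something purldb does today.

```python
from collections import defaultdict
from posixpath import basename
from urllib.parse import urlparse

# Suffixes treated as interchangeable packagings of one release.
# Assumption for illustration; not an exhaustive list.
ARCHIVE_SUFFIXES = (".tar.bz2", ".tar.gz", ".tar.lz", ".tar.xz")


def strip_archive_suffix(url):
    """Return the file name of `url` without its archive suffix, or None."""
    name = basename(urlparse(url).path)
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return None


def group_equivalent_downloads(urls):
    """Group URLs whose file names differ only by compression format."""
    groups = defaultdict(list)
    for url in urls:
        stem = strip_archive_suffix(url)
        if stem is not None:
            groups[stem].append(url)
    return dict(groups)


# Assumed file naming under the binutils directory mentioned above.
base = "https://ftp.gnu.org/gnu/binutils/"
urls = [base + "binutils-2.30" + suffix for suffix in ARCHIVE_SUFFIXES]
groups = group_equivalent_downloads(urls)
```

Here all four URLs end up under the single key binutils-2.30, which could then be mapped to one package URL.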
Then there is the situation where multiple components/sources are used in a certain configuration (like in the Debian example). So what I could envision is that download_url for a version would be something like this:
download_url = [
[url1, patch1, patch2],
[url2, patch1, patch2]
]
Or something like that.
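A slightly more structured version of the same idea, as a rough Python sketch. The class names SourceSet and PackageVersion and the purl value are hypothetical and only there to illustrate the shape; they are not existing purldb models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceSet:
    """One way to obtain the sources: an upstream archive plus patches."""
    archive_url: str
    patch_urls: List[str] = field(default_factory=list)

    def all_urls(self):
        """Return the archive URL followed by its patch URLs."""
        return [self.archive_url, *self.patch_urls]


@dataclass
class PackageVersion:
    """A package version with several equivalent source sets (hypothetical)."""
    purl: str
    download_url: List[SourceSet] = field(default_factory=list)


# Placeholders only: url1, url2, patch1 and patch2 stand in for real URLs,
# and the purl is made up.
version = PackageVersion(
    purl="pkg:deb/debian/example@1.0-5",
    download_url=[
        SourceSet("url1", ["patch1", "patch2"]),
        SourceSet("url2", ["patch1", "patch2"]),
    ],
)
nested = [source_set.all_urls() for source_set in version.download_url]
```

Flattening each source set with all_urls() gives back the nested list form shown above.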
Some more thoughts: Debian typically renames the original files (to something like foo_bar-1.0.orig.tar.gz if the original is called foo_bar-1.0.tar.gz). It also lowercases the files and replaces - with _.
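To recognise such renamed files, one could match on the name. The sketch below assumes the layout source_version.orig.tar.ext that dpkg-source uses for upstream tarballs, which is a bit stricter than the loose example above; the function name and the sample file names are illustrative only.

```python
import re

# Assumed layout: <source>_<upstream-version>.orig.tar.<ext>
ORIG_TARBALL = re.compile(
    r"^(?P<name>[a-z0-9][a-z0-9+.-]+)"
    r"_(?P<version>[^_]+)"
    r"\.orig\.tar\.(?P<ext>gz|bz2|xz|lzma)$"
)


def parse_orig_tarball(filename):
    """Return (source name, upstream version), or None if not an orig tarball."""
    match = ORIG_TARBALL.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("version")


# Illustrative file names, not taken from a real archive listing.
parsed = parse_orig_tarball("binutils_2.30.orig.tar.xz")
not_debian = parse_orig_tarball("binutils-2.30.tar.xz")
```

A match only says the file looks like a Debian-renamed upstream tarball; whether it should then map to the upstream purl or the Debian one is still the open question.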
A question: when encountering these (without patches or other files, just standalone), should they be mapped to the original package or to the Debian package? There is something to be said for both.
Not sure if this should go here or another repository, so feel free to move.
I just looked at deb-purls-aa.json.zst and saw this line:
The package number and the referenced source code file do not match: the file in download_url is the original file and is actually the same for multiple patch versions. The version number only becomes -4 after applying the Debian specific patches, so these should probably also be included. The patches for -4 are no longer available via the Debian FTP, but for -5 they are.
The .dsc file for -5 says:
So possibly you should not have this as a single download URL, but as a list of download URLs.
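As a rough sketch of how such a list could be derived, the file names in the Files field of a .dsc can be read and joined with the directory URL. The sample text and base URL below are made up for illustration and are not the contents of the .dsc referred to above; a signed .dsc would also need its PGP armor handled, which this ignores.

```python
def files_from_dsc(dsc_text):
    """Return the file names listed in the Files field of a .dsc file."""
    names = []
    in_files = False
    for line in dsc_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
            continue
        if in_files:
            # Continuation lines of a field start with whitespace.
            if not line.startswith((" ", "\t")):
                break
            parts = line.split()
            if len(parts) == 3:  # checksum, size, file name
                names.append(parts[2])
    return names


def download_urls(base_url, dsc_text):
    """Build one download URL per file listed in the .dsc."""
    base = base_url.rstrip("/")
    return [base + "/" + name for name in files_from_dsc(dsc_text)]


# Made-up .dsc excerpt; checksums, sizes and names are placeholders.
SAMPLE_DSC = """\
Source: example
Version: 1.0-5
Files:
 00000000000000000000000000000000 1000 example_1.0.orig.tar.gz
 11111111111111111111111111111111 200 example_1.0-5.diff.gz
"""
dsc_urls = download_urls("https://example.org/pool/main/e/example/", SAMPLE_DSC)
```

The result is one URL for the original tarball and one for the Debian diff, which is the kind of list a download_url field could hold instead of a single string.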
Also, with Debian these URLs tend to get moved (granted, after many years) to their archive. It might be good to take a closer look at https://github.com/nexB/fetchcode/issues/82