Open igagis opened 9 months ago
There doesn't seem to be much interest in this question...
I'm currently writing a freight replacement in C++. So far it looks very promising, and performance is incomparably better.
It will support the same directory structure as freight, so it will be possible to use it with an existing repo.
https://github.com/igagis/aptian
The first release is ready. So far it implements only the basic features that I need.
It uses roughly the same directory structure as freight, but simpler: it doesn't need the lib directory. I haven't tested it with a real freight repo, though.
Limitations so far:
GPG key without password protection
Performance is incomparably higher. It takes about a second to add a new package to the repo, while freight takes 10+ minutes on my server.
@igagis I've experienced performance issues with freight in the past. Performance seems to degrade rapidly (N^2?) as the number of packages in the repository increases. I think there's an underlying problem in the algorithm that freight uses, though I did not look into the details of what is happening.
I also saw performance degrade rapidly as the number of packages grew.
My understanding is that when adding a new package, freight puts it into the lib directory and then rebuilds all the Packages indexes. That is, it goes through each file in lib, extracts its control file, calculates hash sums, and appends the info to the newly created Packages file(s).
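To illustrate why the rebuild described above gets slower as the repo grows, here is a minimal Python sketch of a full index rebuild. It is not freight's actual code: `rebuild_packages`, `package_stanza`, the `pool` prefix, and the `read_control` callback (which stands in for unpacking the control file from a .deb) are all hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


def package_stanza(deb: Path,
                   read_control: Callable[[Path], Dict[str, str]],
                   pool_prefix: str = "pool") -> str:
    """Build one Packages stanza for a single .deb file."""
    # Hashing requires reading the whole file, every time it is indexed.
    data = deb.read_bytes()
    # read_control is a hypothetical stand-in for extracting the control file.
    fields = dict(read_control(deb))
    fields["Filename"] = f"{pool_prefix}/{deb.name}"
    fields["Size"] = str(len(data))
    fields["SHA256"] = hashlib.sha256(data).hexdigest()
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def rebuild_packages(lib_dir: Path,
                     read_control: Callable[[Path], Dict[str, str]]) -> str:
    """Regenerate the index from scratch by visiting every package in lib_dir."""
    stanzas = [package_stanza(deb, read_control)
               for deb in sorted(lib_dir.glob("*.deb"))]
    # Stanzas are separated by a blank line.
    return "\n".join(stanzas)
```

With this shape, adding one package costs work proportional to the total size of the repo, so adding N packages one at a time costs on the order of N^2 overall, which would match the degradation reported above if freight really works this way.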
There seems to be some caching mechanism, but it doesn't look like freight actually uses it, or it has little effect.
Anyway, I now have my own solution (aptian), which has no performance problems for my use cases.
Sorry to say it, but I have stopped using freight.
Small update.
I actually tried using aptian with an existing freight repo, and it works. Of course, one needs to run aptian init on the repo before starting to use it.
I think it is known that freight's performance is not perfect. For example, it takes about 20 minutes to add a package to my repo. Admittedly my server is quite weak, but that is still not comparable with binary tools such as repo-add for pacman, which finishes instantly on the same hardware.
I'm wondering, is it known which parts of freight are the bottlenecks? Has there been any such investigation? Could the problem be solved by rewriting some parts of freight in C++? Which parts would those be?
I read the code a bit, though not very deeply. As far as I could understand, adding a package requires regenerating the whole repo. That is, each package in lib has to be unpacked and its control file read, and then new Packages and Release files are created. To me that looks like a lot of unnecessary work for the sake of adding just one or a few new packages. Would it make more sense to parse the existing Packages file, compare it with the list of files in lib, and append only the new packages?
Maybe I misunderstand how freight and apt repos work at the moment. I'm just trying to understand whether there is room for improvement.
Thanks!
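The incremental idea asked about above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not how freight or aptian work: `indexed_filenames`, `append_new_packages`, and the `make_stanza` callback (which would extract the control fields and hashes for one .deb) are hypothetical names, and packages are matched by the base name in each `Filename:` field.

```python
from pathlib import Path
from typing import Callable, List, Set


def indexed_filenames(packages_text: str) -> Set[str]:
    """Collect the base names of all files already listed in a Packages index."""
    names = set()
    for line in packages_text.splitlines():
        if line.startswith("Filename:"):
            names.add(Path(line.split(":", 1)[1].strip()).name)
    return names


def append_new_packages(packages_file: Path,
                        lib_dir: Path,
                        make_stanza: Callable[[Path], str]) -> List[str]:
    """Append stanzas only for .deb files in lib_dir that are not indexed yet."""
    existing = packages_file.read_text() if packages_file.exists() else ""
    known = indexed_filenames(existing)
    new_debs = [deb for deb in sorted(lib_dir.glob("*.deb"))
                if deb.name not in known]
    with packages_file.open("a") as out:
        for deb in new_debs:
            # make_stanza is hypothetical; only new packages pay its cost.
            # Each stanza is followed by a blank line as a separator.
            out.write(make_stanza(deb).rstrip("\n") + "\n\n")
    return [deb.name for deb in new_debs]
```

Two caveats apply to this sketch: it does not handle packages removed from lib, and the Release file still has to be regenerated and re-signed afterwards because it lists checksums of the Packages files. That step only hashes the index files, though, not every package.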