freight-team / freight

A modern take on the Debian archive.

Performance #140

Open igagis opened 9 months ago

igagis commented 9 months ago

I think it is well known that freight's performance is far from ideal. For example, in my repo it takes about 20 minutes to add a package. Admittedly my server is quite weak, but this is still not comparable to binary tools such as repo-add for pacman, which finishes instantly on the same hardware.

I'm wondering, is it known which parts of freight are the bottlenecks? Has anyone investigated this? Could the problem be solved by rewriting some parts of freight in C++? Which parts would those be?

I read the code a bit, though not very deeply. As far as I could understand, adding a package requires regenerating the whole repo: each package in lib has to be unpacked and its control file read, and then new Packages and Release files are created. To me that looks like a lot of unnecessary work for the sake of adding just one or a few new packages. Would it make more sense to parse the existing Packages file, compare it with the list of files in lib, and append only the new packages?
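To make the incremental idea concrete, here is a minimal Python sketch. It is not how freight works, only an illustration of "compare the index with the files on disk". It relies on the `Filename:` field that each stanza of a Packages index carries (a path relative to the archive root); the function names `indexed_filenames` and `new_debs` are made up for this example.

```python
from pathlib import Path


def indexed_filenames(packages_text: str) -> set[str]:
    """Collect the Filename field of every stanza in a Packages index."""
    names = set()
    for line in packages_text.splitlines():
        if line.startswith("Filename:"):
            names.add(line.split(":", 1)[1].strip())
    return names


def new_debs(repo_root: Path, packages_text: str) -> list[str]:
    """Return .deb paths (relative to repo_root) that the index does not list yet."""
    known = indexed_filenames(packages_text)
    found = (p.relative_to(repo_root).as_posix() for p in repo_root.rglob("*.deb"))
    return sorted(f for f in found if f not in known)
```

Only the files returned by `new_debs` would then need their control data extracted and hashed. The Release file would still have to be regenerated and re-signed, because it contains checksums of the Packages files.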

Maybe I misunderstand how freight and apt repos work at the moment. I'm just trying to understand whether there is room for improvement.

Thanks!

igagis commented 8 months ago

I see there is not much interest in this question...

At the moment I'm writing a freight replacement in C++. So far it looks very promising; performance is incomparably better. It will support the same directory structure as freight, so it will be possible to use it with an existing repo.

igagis commented 8 months ago

https://github.com/igagis/aptian

The first release is ready. So far it implements only the basic features that I need.

It uses roughly the same directory structure as freight, but simpler: it doesn't need the lib directory. I haven't tested it with a real freight repo yet, though.

So far limitations:

Performance is incomparably higher. It takes about a second to add a new package to the repo, while freight takes 10+ minutes on my server.

mattock commented 8 months ago

@igagis I've experienced performance issues with freight in the past. It seems that performance degrades rapidly (N^2?) as the number of packages in the repository increases. I think there's an underlying problem in the algorithm that freight uses. I did not look into what is happening in detail, though.
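A back-of-the-envelope estimate of where an N^2-like pattern could come from, assuming (unverified) that every add reprocesses all packages already in the repository at a roughly constant cost $c$ per package:

```latex
% Cost of a single add when k packages are present: linear in k
T_{\text{add}}(k) = c\,k
% Cumulative cost of growing the repository to n packages, one add at a time
T_{\text{total}}(n) = \sum_{k=1}^{n} c\,k = \frac{c\,n(n+1)}{2} = O(n^2)
```

Under that assumption each individual add is only linear in the repository size, but the total time spent growing the repository is quadratic, which would match the impression of rapid degradation.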

igagis commented 8 months ago

I also experienced rapid degradation of performance as the number of packages grew.

My understanding is that when adding a new package, freight puts it into the lib directory and then rebuilds all the Packages indexes. That is, it goes through each file in lib, extracts its control file, calculates hash sums, and appends the info to the newly created Packages file(s).

It seems that there is some caching mechanism there, but it doesn't look like freight actually uses it, or it gives little effect.
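For illustration, this is roughly what an effective cache for the hashing step could look like. It is a sketch under my own assumptions, not freight's actual cache: the hash is reused as long as the file's size and modification time are unchanged, and the names `sha256_of` and `cached_sha256` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_sha256(path: Path, cache: dict) -> str:
    """Reuse a stored hash unless the file's size or mtime has changed."""
    st = path.stat()
    key = str(path)
    entry = cache.get(key)
    if entry and entry["size"] == st.st_size and entry["mtime_ns"] == st.st_mtime_ns:
        return entry["sha256"]  # cache hit: the file is not read at all
    value = sha256_of(path)
    cache[key] = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha256": value}
    return value
```

The `cache` dict contains only strings and integers, so it could be saved to disk between runs. With a cache like this, a rebuild would only need to read the packages that were added or changed since the last run.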

Anyway, I now have my own solution (aptian), which has no performance problems for my use cases.

Sorry to say it, but I have stopped using freight.

igagis commented 7 months ago

Small update.

I actually tried using aptian with an existing freight repo, and it works. Of course, one needs to run aptian init on the repo before starting to use it.