Metacello / metacello

Metacello is a package management system for Smalltalk

Metacello re-fetches baselines even if they were fetched before #539

Open syrel opened 3 years ago

syrel commented 3 years ago

Hello 👋

Recently, we decided to refactor the baselines in a lower part of our decently sized project. The refactoring split baselines that declare a lot of packages into multiple baselines, each of which clearly specifies the dependencies of a smaller set of packages by referencing other baselines from other repositories. This resulted in significantly increased loading times, in particular in the fetch step that creates a linear list of loading directives. Upon closer inspection, it turned out that dependent baselines are analysed over and over again, even if Metacello has supposedly visited them already.

Project structure

To simplify the debugging process we recreated our project structure in a playground organization https://github.com/bugginrack.

In that project we have a bunch of libraries (https://github.com/bugginrack/MyLibrary) with the following baseline dependencies:

Dependencies-MyLybraryD

On top of that there is a framework (https://github.com/bugginrack/MyFramework):

Dependencies-MyFramework

Next, we have a few projects (https://github.com/bugginrack/MyProject), some of which depend on each other (A, B and C):

Dependencies-MyProject

Code size independence

Our thesis is that the size of the code base does not have a significant influence on fetching performance. To test this, we loaded the same baseline structure with a large (generated) code base (video) and without any code (video).

It took ~46s to finish the fetching phase for a project with a significant amount of code:

FetchingEnd-GitHub-WithCode

and about the same, ~47s, for a project without code:

FetchingEnd-GitHub-WithoutCode

Connection independence

To show that the problem is also independent of the connection, we ran the same experiment while loading the code locally. With code (video):

FetchingEnd-Local-WithCode

Without code (video):

FetchingEnd-Local-WithoutCode

The problem

The issue is that doubling the number of same-level projects doubles the time it takes to fetch, while increasing the dependency depth increases the fetching time exponentially. For our real system, the loading times exceeded 2 hours.
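To illustrate why depth hurts so much more than breadth, here is a toy model in Python. It is not Metacello's actual algorithm, and the functions `naive_visits` and `total_visits` are hypothetical. It assumes that every baseline depends on the same `width` baselines one level below it, and that nothing is remembered between visits:

```python
def naive_visits(width: int, depth: int) -> int:
    """Count baseline analyses for one baseline when nothing is cached.

    Toy model (an assumption, not Metacello's code): the baseline depends
    on `width` baselines, each of which again depends on `width` baselines,
    for `depth` levels in total.
    """
    if depth == 0:
        return 1  # a leaf baseline: only itself is analysed
    # The baseline itself, plus a full re-analysis of every dependency.
    return 1 + width * naive_visits(width, depth - 1)


def total_visits(projects: int, width: int, depth: int) -> int:
    """Count analyses for `projects` same-level projects with equal dependencies."""
    return projects * naive_visits(width, depth)


if __name__ == "__main__":
    for depth in range(1, 6):
        print(depth, total_visits(projects=2, width=2, depth=depth))
```

In this model the number of distinct baselines below a project grows only linearly with depth (`width` per level), while the number of analyses is multiplied by `width` for every extra level. Doubling `projects` just doubles the total.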

Solution

The intermediate solution is, of course, to uglify, flatten and merge the baselines, reducing the number of interconnections to a minimum.

Q: Would it be possible to improve Metacello baseline fetching to skip already processed baselines?
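A minimal sketch of what we mean by skipping, written in Python rather than against Metacello's real classes. The `linearize` function and the baseline names in `deps` are hypothetical, and the sketch assumes a baseline can be identified by its name alone, ignoring versions, groups and load conflicts that a real implementation would have to consider:

```python
from typing import Dict, List, Set


def linearize(root: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return a load order (dependencies first), analysing each baseline once."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(name: str) -> None:
        if name in visited:
            return  # already processed: skip instead of analysing again
        visited.add(name)  # mark before recursing so cycles also terminate
        for dep in deps.get(name, []):
            visit(dep)
        order.append(name)  # all dependencies are already in the list

    visit(root)
    return order


# Hypothetical structure: two frameworks share the same libraries.
deps = {
    "Project": ["FrameworkA", "FrameworkB"],
    "FrameworkA": ["LibraryA", "LibraryB"],
    "FrameworkB": ["LibraryA", "LibraryB"],
    "LibraryA": ["Core"],
    "LibraryB": ["Core"],
}

if __name__ == "__main__":
    print(linearize("Project", deps))
```

With the visited set, each baseline is analysed once, so the work is proportional to the number of baselines and dependency edges rather than to the number of paths through the graph.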

Thank you!