fxpio / composer-asset-plugin

NPM/Bower Dependency Manager for Composer
MIT License
893 stars 156 forks source link

Improve performance of Github repositories #226

Closed francoispluchino closed 8 years ago

francoispluchino commented 8 years ago

To drastically improve performance with Github repositories, use the native Git if installed by default.

There is already an option (no-api) allowing to use GIT instead of API Github, but we must reference each package manually. So, we must allow to disable this option globally for all dependencies in project and/or for each asset package.

Regarding performance, we can improve:

Therefore we are also improving the repositories using GIT.

See last comments of the issue #93 by @schmunk42 and me.

francoispluchino commented 8 years ago

Small precision, with this method we will get the same performance as Bower.

schmunk42 commented 8 years ago

Related: https://github.com/composer/composer/pull/4961

francoispluchino commented 8 years ago

Yes, I thought to do a PR directly in Composer. But there is an existing PR in conflict for the moment.

schmunk42 commented 8 years ago

From a reference in the reference above (GitHub Staff Member):

Apparently, most of the initial clones are shallow, meaning that not the whole history is fetched, but just the top commit. But then subsequent fetches don't use the --depth=1 option. Ironically, this practice can be much more expensive than full fetches/clones, especially over the long term. It is usually preferable to pay the price of a full clone once, then incrementally fetch into the repository, because then Git is better able to negotiate the minimum set of changes that have to be transferred to bring the clone up to date.

Looks like it might not even be the very best choice to use shallow clones...


But, if we could create a first basic implementation (disabled by default, to maintain full bc), like adding "something" to this if

like so (dirty code)

if (
  (isset($this->repoConfig['no-api']) && $this->repoConfig['no-api']) || 
  (isset($this->pluginConfig['no-api']) && $this->pluginConfig['no-api'])
  ) {

Having such an option would allow users to test the feature and report if they run into problem with large repos/clones. Those could be handled by more detailed filtering later.

schmunk42 commented 8 years ago

One more: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193838934

Our advice would be for CocoaPods to stop using any kind of shallow feature from Git altogether. Users should perform a full clone of the repository, and then fetch into it as usual. Simply performing that change should significantly soften the load on our fileservers.

francoispluchino commented 8 years ago

Yes, of course, in all cases, we must activate the option no-api in all VCS repositories.

However, Bower is already using the shallow clone and this is not problematic, see https://github.com/bower/bower/blob/f4cb047e9df55f03e9971ce4f4341c41d99339e3/lib/core/resolvers/GitRemoteResolver.js#L121.

luislobo commented 8 years ago

where should I set the no-api option?

francoispluchino commented 8 years ago

Actually, you must override the definition of each VCS Repository:

{
    "extra": {
        "asset-repositories": [
            {
                "type": "bower-vcs",
                "url": "https://github.com/vendor/exemple-asset-name.git",
                "no-api": true
            }
        ]
    }
}

This issue proposes to add a system to configure this option without having to override the definition of VCS repositories.

francoispluchino commented 8 years ago

@schmunk42 Ok for keep disabled this option by default.

schmunk42 commented 8 years ago

Yes, at least for patch releases.

btw: Could a plugin register command line options? I remember a discussion about this, but can't find it right now...

francoispluchino commented 8 years ago

Extend an existing command? Not yet... but add new commands, yes (see Plugin Capabilities).

schmunk42 commented 8 years ago

I am just thinking loud here... so there should be one option already working, which is adding no-api to every (most) bower/npm repos, but this is very cumbersome.

We've already talked about adding a parameter to the extras section, which would use no-api for all repos or per regex.

The 3rd option could be composer update-no-api which is not very nice, but could be used no a command basis. composer update --bower-asset-no-api would be nicer, but not possible atm.

francoispluchino commented 8 years ago

@schmunk42

We've already talked about adding a parameter to the extras section, which would use no-api for all repos or per regex.

This is the aim of this issue. The plugin automatically creates the VCS repository for each Bower/NPM package. So, we just have to pass this parameter before construction of VCS repository.

Add in the extra config of Composer:

**For example:***

{
    "extra": {
        "asset-github-no-api": true,
        "asset-github-no-api-packages": {
            "bower-asset/package-name": false
        }
    }
}

or

{
    "extra": {
        "asset-github-no-api": {
            "default": true,
            "packages": {
                "bower-asset/package-name": false
            }
        }
    }
}

With the shortcut, if no package must be configured:

{
    "extra": {
        "asset-github-no-api": true
    }
}

If you have a best config? Personally, I have a preference for the second option.

schmunk42 commented 8 years ago

LGTM!

francoispluchino commented 8 years ago

The shallow clone must be added in Composer, and we expect a return of this PR about it.

Regarding the no-api of Github Driver, it is added by fa1b3a5348584e83476c5bc073439f37cc4243a5. Finally, to keep the same formatting with the other options, the format is:

{
    "extra": {
        "asset-vcs-driver-options": {
            "github-no-api": true
        }
    }
}

or

{
    "extra": {
        "asset-vcs-driver-options": {
            "github-no-api": {
                "default": true,
                "packages": {
                    "bower-asset/example-asset1": false
                }
            }
        }
    }
}
cebe commented 8 years ago

I had not time to dig into this yet so, sorry if this is obvious :) Why do we not enable it by default if it makes things faster? what is the drawback?

schmunk42 commented 8 years ago

I just gave this a try by requiring dev-master of your plugin.

The first form looks very good to me, but the second one still makes API calls, when watching detailed output with composer update -vvv.

Benchmarks from https://github.com/phundament/app (repos already cloned)

"github-no-api": true ~27 sec

tobias in ~/Downloads/app-master λ time composer update   
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update                              
Package fabpot/php-cs-fixer is abandoned, you should avoid using it. Use friendsofphp/php-cs-fixer instead.
Generating autoload files
composer update  5.42s user 1.66s system 27% cpu 25.601 total
tobias in ~/Downloads/app-master λ 
tobias in ~/Downloads/app-master λ time composer update
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update                              
Package fabpot/php-cs-fixer is abandoned, you should avoid using it. Use friendsofphp/php-cs-fixer instead.
Generating autoload files
composer update  5.38s user 1.63s system 24% cpu 28.531 total
tobias in ~/Downloads/app-master λ time composer update
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update                              
Package fabpot/php-cs-fixer is abandoned, you should avoid using it. Use friendsofphp/php-cs-fixer instead.
Generating autoload files
composer update  5.34s user 1.64s system 26% cpu 26.813 total
tobias in ~/Downloads/app-master λ time composer update
Loading composer repositories with package information
Updating dependencies (including require-dev)

"github-no-api": false ~58 sec

tobias in ~/Downloads/app-master λ time composer update
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update                              
Package fabpot/php-cs-fixer is abandoned, you should avoid using it. Use friendsofphp/php-cs-fixer instead.
Writing lock file
Generating autoload files
composer update  5.19s user 0.26s system 9% cpu 59.032 total
tobias in ~/Downloads/app-master λ time composer update
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update                              
Package fabpot/php-cs-fixer is abandoned, you should avoid using it. Use friendsofphp/php-cs-fixer instead.
Generating autoload files
composer update  5.18s user 0.28s system 9% cpu 57.399 total
tobias in ~/Downloads/app-master λ time composer update
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update                              
Package fabpot/php-cs-fixer is abandoned, you should avoid using it. Use friendsofphp/php-cs-fixer instead.
Generating autoload files
composer update  5.18s user 0.25s system 9% cpu 59.096 total

AWESOME 🎉 ❤️

francoispluchino commented 8 years ago

It slightly breaks compatibility:

Nothing in itself problematic, but it is better to force the user to verify its configuration before activating this option.

schmunk42 commented 8 years ago

@cebe

I had not time to dig into this yet so, sorry if this is obvious :) Why do we not enable it by default if it makes things faster? what is the drawback?

It's merely about testing different setups ... I'd suggest to issue a notice first, like "Notice: You can enable github-no-api to improve speed ....." and make it a default in 1.2 or 2.0 of this plugin.

francoispluchino commented 8 years ago

@schmunk42 The next stable version is 1.2.0. So, it will be for the 1.3 or 2.0 version.

schmunk42 commented 8 years ago

More benchmarks ... project like above with these additional plugins (with a lot of bower assets)

        "2amigos/yii2-ckeditor-widget": "^1.0",
        "2amigos/yii2-date-picker-widget": "^1.0",
        "2amigos/yii2-file-input-widget": "^1.0",
        "2amigos/yii2-file-upload-widget": "^1.0",
        "2amigos/yii2-tinymce-widget": "^1.1",
        "2amigos/yii2-leaflet-extension": "^1.0",
        "2amigos/yii2-editable-widget": "^0.1.4",
        "2amigos/yii2-selectize-widget": "^1.0",
        "2amigos/yii2-leaflet-geosearch-plugin": "^1.0",
        "2amigos/yii2-leaflet-geocoder-plugin": "^1.0",
        "2amigos/yii2-gallery-widget": "^1.0",
        "2amigos/yii2-chartjs-widget": "^2.0",
        "2amigos/yii2-highcharts-widget": "^1.0",
        "2amigos/yii2-date-time-picker-widget": "^1.0",
        "2amigos/yii2-type-ahead-widget": "^1.1",
        "2amigos/yii2-grid-view-library": "^0.1.1"

"github-no-api": true

composer update -v  7.94s user 3.19s system 21% cpu 52.160 total

"github-no-api": false

composer update -v  9.38s user 0.39s system 5% cpu 2:43.37 total

Needs just 1/3 of the time 👍 the more assets, the faster it is (relatively) if cache is filled

francoispluchino commented 8 years ago

It could be even faster with shallow clone, but it is on hold. For example, the clone of Twitter Bootstrap has a large size: 179 MiB, with shallow clone: 13 MiB. And for Angular2: ~600 MiB vs 6 MiB...

cebe commented 8 years ago

It could be even faster with shallow clone, but it is on hold. For example, the clone of Twitter Bootstrap has a large size: 179 MiB, with shallow clone: 13 MiB. And for Angular2: ~600 MiB vs 6 MiB...

this can be solved by using simply --prefer-dist. The main issue is not the download but the resolving of dependencies step.

francoispluchino commented 8 years ago

But with Github, git-archive cannot be used to download bower.json or package.json file. So, we must call the API for that, or clone the Git repository. The shallow clone with depth=1 get the last commit for all branches/tags, and so, we can read the json files (it's the technique used by Bower).

schmunk42 commented 8 years ago

The main issue is not the download but the resolving of dependencies step.

@cebe The problem is the huge amount of GitHub API calls, to get all the version information. From my benchmarks the dependency resolution (after all versions are known) is not the problem.

From what I've seen, I can just say: If you work with bower assets, use clones - no matter if they are 600 MiB in size, you need to take this penalty only once. Get version information in a useable way is only possible with native git.

You can still run install --prefer-dist in build or production, but it makes not really sense with update in our (Yii2) scenario.

cebe commented 8 years ago

seems I do not understand enough about the details of this, sorry :)