fxpio / composer-asset-plugin

NPM/Bower Dependency Manager for Composer
MIT License

Add an option to disable asset git/GitHub API requests #251

Closed schmunk42 closed 6 years ago

schmunk42 commented 8 years ago

Continued from https://github.com/yiisoft/yii2/issues/8688#issuecomment-242215164

Proposal: Add an option, like --skip-asset-info-update, or an ENV variable FXP_SKIP_ASSET_INFO_UPDATE, to disable the git and/or GitHub API requests that query for new versions/tags, since these can take a while.

The option should only have an effect when local information about the repository already exists. If not, the plugin should download the information.
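A minimal sketch of what that check could look like inside the plugin (the helper methods hasLocalAssetInfo/loadLocalAssetInfo are hypothetical; only the env variable name comes from the proposal):

    // Hypothetical sketch: honor FXP_SKIP_ASSET_INFO_UPDATE only when
    // local information about the repository already exists.
    if (false !== getenv('FXP_SKIP_ASSET_INFO_UPDATE') && $this->hasLocalAssetInfo()) {
        return $this->loadLocalAssetInfo(); // skip the git/GitHub API requests
    }
    // otherwise fall through and download the information as usual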

francoispluchino commented 8 years ago

Adding a command option is not possible, so only an env variable is possible.

schmunk42 commented 8 years ago

Ah sorry, I mixed that up with new commands, but I think it does not make sense (or maybe it's not even possible) to have something like composer update-skip-asset-info.

francoispluchino commented 8 years ago

That is possible.

schmunk42 commented 7 years ago

So I looked a bit into this, and if you skip updating the remote git server, the performance gain is about ~30-50% (!), i.e. by commenting out this line.

Updates still worked as long as the cache was filled.

Another huge improvement would be to also load the bower.json files from the cache; this removes another 0.5-1 seconds per bower package from the total time taken for an update:

[89.2MB/9.38s] Downloading https://bower.herokuapp.com/packages/select2
[89.2MB/10.05s] Writing /Users/tobias/.composer/cache/repo/https---bower.herokuapp.com-packages/select2-ca22abe1ad1684dd62ec875a0b5c25a533cf287a-package.json into cache

Looks to me like the file is always downloaded and then written to the cache.
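A hedged sketch of the cache-first lookup this suggests (Composer's Cache::read() returns false on a miss; $this->rfs, $cacheKey, $host and $url stand in for the plugin's actual wiring):

    // Sketch: consult the repo cache before downloading the package definition.
    $json = $this->cache->read($cacheKey); // e.g. the select2-...-package.json entry above
    if (false === $json) {
        $json = $this->rfs->getContents($host, $url, false);
        $this->cache->write($cacheKey, $json);
    }
    $data = json_decode($json, true);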

Maybe this could also require some changes in Composer itself. Do you think it's worth creating an issue like "Feature: Allow updates from cache only"?

Last but not least it would be awesome if Git updates could be run in parallel, like downloads with prestissimo. I think this would make the above changes obsolete anyway.

schmunk42 commented 7 years ago

This could be combined with functionality that checks an expiry setting, like: "Warning: Asset package definitions have not been updated for 7 days".

FXP_SKIP_ASSET_INFO_UPDATE_DAYS=7
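A rough sketch of such an expiry check (the $cachedInfoFile path, the $io writer, and the warning wording are illustrative):

    // Illustrative: warn when the cached asset info is older than the limit.
    $maxDays = (int) (getenv('FXP_SKIP_ASSET_INFO_UPDATE_DAYS') ?: 7);
    $ageDays = (int) ((time() - filemtime($cachedInfoFile)) / 86400);
    if ($ageDays > $maxDays) {
        $io->writeError(sprintf('Warning: Asset package definitions have not been updated for %d days', $ageDays));
    }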

While I'd also like a separate command, the ENV variable is the way to set a default.

@francoispluchino Any advice on where to start, or do you have plans for this feature? It would be a great improvement.


related: https://github.com/yiisoft/yii2/issues/13064

francoispluchino commented 7 years ago

The bower.json and package.json files are cached. Only the files in the main branch are not cached.

Regarding the update-skip-asset-info command, maybe prefer a namespace like fxp:update (shorter). It would only configure the variable FXP_SKIP_ASSET_INFO_UPDATE_DAYS (via an option, 7 by default), and use the same code and options as the update command.

You can make a pull request for this feature!

schmunk42 commented 7 years ago

I like the syntax fxp:update; I've been using it in phd for a while :) The only thing is that you don't actually do an fxp update, rather the opposite, so something like update-fast or fxp:update-quick.

schmunk42 commented 7 years ago

So, @handcode and I dug into this a bit ...

To really get a huge performance gain, we'd need to address two things.

One is this: https://github.com/fxpio/composer-asset-plugin/blob/master/Repository/BowerRepository.php#L58 Here the asset plugin tells Composer where it can find the git repo for an extension.

We hacked it like so:

    // Fake a local URL: map the remote URL to Composer's VCS cache path
    // (e.g. https://github.com/twbs/bootstrap.git becomes
    // /path/to/user/.composer/cache/vcs/https---github.com-twbs-bootstrap.git).
    $local = $data['url'];
    $local = str_replace('/', '-', $local);
    $local = str_replace('https:', '/path/to/user/.composer/cache/vcs/https-', $local);
    // If a cached clone exists, point Composer at it instead of the remote.
    if (is_dir($local)) {
        $data['url'] = $local;
    }

This skips the update of the git repository, if a local clone exists, by setting the URL to the local path.


The other one is the URL of the repo itself, see https://github.com/fxpio/composer-asset-plugin/blob/master/Repository/BowerRepository.php#L34

return 'file:///path/to/bower-definition.json';

This one is harder to fake, because we don't know the hash of the repo. The files we need would be in ~/.composer/cache/repo/https---bower.herokuapp.com-packages.

[update] It would also be great if we could just delegate this to prestissimo; that might also be sufficient.


So, both points mentioned above would essentially skip the update of asset repositories. I ran some benchmarks with a really huge project: 336 packages (55 of them from bower).

Would love to hear some feedback about this.

francoispluchino commented 7 years ago

@schmunk42 Do not forget the NPM repository. Otherwise, your idea is interesting.

Regarding the manipulation of the plugin options, I have already moved the plugin configuration (formerly extra.asset-*) into the config.fxp-asset.* section, because the config section doesn't change the hash of the composer.lock file, which is not the case for the extra section.

Furthermore, I will add support for the global configuration (#273), as well as support for environment variables to override the configuration, using the format FXP_ASSET__<VARIABLE_NAME_WITH_UPPERCASES_AND_UNDERSCORES> (#274).

In this way you will be able to propose your solution to disable the analysis of the asset dependencies.
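To illustrate the mapping (using pattern-skip-version as an example key; the exact set of supported keys is defined by #274):

    # config.fxp-asset.pattern-skip-version would become:
    FXP_ASSET__PATTERN_SKIP_VERSION="(-build)"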

schmunk42 commented 7 years ago

Do not forget the NPM repository.

I haven't tested this, but it looked to me like this would also be the way npm works with fxp, simply because Composer works that way.

francoispluchino commented 7 years ago

Your last comment changes the behavior directly in the BowerRepository class, so it's independent of Composer, and the change applies only to Bower in this plugin.

schmunk42 commented 7 years ago

So, I played around a bit with npm-assets ...

One thing I noticed is that even though npm hands out a lot more information than bower, it takes only about 0.2 secs to get that information, compared to 0.6 secs for bower (which is only the repo info). I am speaking of Downloading https://bower.herokuapp.com/packages/bootstrap vs. Downloading https://registry.npmjs.org/bootstrap. These are pretty rough numbers, but they add up if you have a lot of asset packages.

The rest of the process is the same; pruning and updating the repo takes the most time, usually ~1-3 secs per asset repository (depending on its size):

Executing command (/Users/tobias/.composer/cache/vcs/https---github.com-twbs-bootstrap.git/): git remote set-url origin 'https://github.com/twbs/bootstrap.git' && git remote update --prune origin

This could also be almost completely eliminated by skipping that check using a local repo. Another option would be to optimize that step, but that would also involve Composer itself.

CC: @tonydspaniard @cebe @samdark @mikehaertl

schmunk42 commented 7 years ago

Some more findings ...

The line in Composer which runs the update is here. If this line did not hard-code origin (or if it were configurable), we could do something like this:

[Screenshot: excerpt from the git-remote documentation]

Taken from https://git-scm.com/docs/git-remote: just configure origin so that it is not updated, or do that manually.
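For reference, the knob from those docs: when no remote is named explicitly, git remote update skips any remote whose remote.<name>.skipDefaultUpdate configuration is set to true, e.g.:

    git config remote.origin.skipDefaultUpdate true
    git remote update --prune    # origin is now skipped (no explicit remote given)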

I know that this somehow defeats the purpose of "update", but on the other hand, you do not have to update & purge your local mirror every time; other package managers warn you if this hasn't been done for some time (e.g. two weeks on MacPorts).

I hardly dare to CC @Seldaek, but I'd really like to hear his opinion about it.

francoispluchino commented 7 years ago

@schmunk42 We can avoid making the 5-6 queries by using the cache and checking the update date on the first query (only for the GitHub driver).

Currently, each repository addition performs at least 5 requests (more if there are several pages of tags and/or branches). Example for jQuery with Bower:

Downloading https://api.github.com/repos/jquery/jquery-dist
Downloading https://api.github.com/repos/jquery/jquery-dist/contents/bower.json?ref=master
Downloading https://api.github.com/repos/jquery/jquery-dist/commits/master
Downloading https://api.github.com/repos/jquery/jquery-dist/tags?per_page=100
Downloading https://api.github.com/repositories/28825109/tags?per_page=100&page=2
Downloading https://api.github.com/repos/jquery/jquery-dist/git/refs/heads?per_page=100

The result of:

Downloading https://api.github.com/repos/jquery/jquery-dist

contains the created_at, updated_at, and pushed_at fields. It's therefore possible to keep the update date and check whether it has changed or not. Everything happens in the GitHub driver, so it's compatible with both Bower and NPM.

With this cache, each repository addition performs a minimum of 1 request instead of 5.

Of course, this optimization can come in addition to your proposal.
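A minimal sketch of that idea (the cache key names are made up; getApiUrl()/getContents() are assumed to behave like the driver's existing helpers):

    // Sketch: keep pushed_at from the first API response and reuse the
    // cached tags/branches as long as it has not changed.
    $repoData = json_decode($this->getContents($this->getApiUrl()), true);
    $pushedAt = $repoData['pushed_at'];

    if ($pushedAt === $this->cache->read('pushed-at.txt')) {
        // Upstream unchanged: 1 request, tags come from the cache.
        $tags = json_decode($this->cache->read('tags.json'), true);
    } else {
        // Upstream changed: perform the remaining requests, refresh the cache.
        $tags = json_decode($this->getContents($this->getApiUrl().'/tags?per_page=100'), true);
        $this->cache->write('tags.json', json_encode($tags));
        $this->cache->write('pushed-at.txt', $pushedAt);
    }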

schmunk42 commented 7 years ago

All my tests already have the github-no-api option enabled; that was actually the first performance improvement, because updating takes so much longer when using the API, especially for repos with a huge number of tags (e.g. Angular has thousands of tags, resulting in dozens of requests).

IMHO decent performance can only be achieved by disabling the API requests entirely and reading the tags from local git clones. I think bower, npm, yarn, ... do the same, right? (not 100% sure)

dxops commented 7 years ago

Does it impact composer outdated? For now the command downloads every tag for every bower dependency, which makes it unusable.

francoispluchino commented 7 years ago

As already mentioned in issue #226, Bower uses a shallow clone to download the history truncated to the first level (--depth=1) for all branches (e.g. for Angular 1.x, the clone size is 8 MiB versus 575 MiB, and the resolution of deltas is much faster), or uses git ls-remote to retrieve the list of tags and branches if the shallow clone is not supported by the server or the protocol (http).

Bower can use a 'slow clone' (full clone) or a 'fast clone' (shallow clone), see the code here. The 'slow clone' is used if the resolution uses a commit id (sha).
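In git terms, the two modes roughly correspond to (standard git commands):

    git clone --depth 1 <url>            # 'fast clone': shallow, truncated history
    git clone <url>                      # 'slow clone': full history, needed to resolve a commit id
    git ls-remote --tags --heads <url>   # fallback: list tags/branches without cloning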

Bower verifies whether the server supports shallow cloning. This is done according to the rules found in the following links:

Summary of the rules: (extracted from the source)

The above should cover most cases, including BitBucket.

IMHO, this optimization should be integrated directly into Composer, because it would benefit all VCS repositories.

samdark commented 7 years ago

That could be a huge improvement in install times for PHP packages as well.

francoispluchino commented 7 years ago

@schmunk42 Could your proposal be available for the 1.3.0 release?

schmunk42 commented 7 years ago

I played around a bit with shallow clones, but the operation mentioned in https://github.com/fxpio/composer-asset-plugin/issues/251#issuecomment-280875259 doesn't seem faster on a shallow clone. For sure the initial download is faster, but you usually take that penalty only once.

I can't really dig it up right now, but I remember a discussion on CocoaPods suggesting that shallow clones might even be slower under some circumstances. Anyhow, I don't think they will bring much improvement overall.

What costs so much time is that fxp creates VCS repositories, which Composer fully scans on every update. If you have 20 asset repos, you need about 10 seconds for downloading the repo info (bower) and ~30 secs for updating the repos (syncMirror); the rest of the update, like downloading info from Packagist and running the SAT solver, usually takes less than 10 seconds.

The former two tasks are usually not needed on every update, maybe once a day would be totally fine (or on demand). The (cache) info would be shared across projects on your host.

But when you need to run update, let's say, 5 times because you need to find a matching set of extensions, this becomes really annoying, even more so with API requests.

francoispluchino commented 7 years ago

I was re-examining this problem, trying to remember why the discussion had stopped on this subject. The shallow clone may become slower if, for example, you have to download each branch and tag to retrieve the .json file. The analysis and downloading could then take longer than a full clone (there may be other cases besides, but I have never met them).

We cannot do this with Composer because the SAT solver retrieves the full list of dependencies with all versions, even if the version of the package doesn't match; therefore lazy loading is not used. And if we implemented the shallow clone, we would have to clone each analyzed branch and tag to retrieve the .json file of each version. In that case, it is better to download the archive directly.

So this option is not possible without a modification of the SAT solver (and they do not want to touch it).

To return to your proposal to "disable" the search for asset updates, which in the case you described is actually interesting: do you have a PR?

schmunk42 commented 7 years ago

To return to your proposal to "disable" the search for asset updates, which in the case you described is actually interesting: do you have a PR?

Not yet :) One approach would be to skip syncMirror in Composer; could this be solved in a plugin?

francoispluchino commented 6 years ago

Added by #289