Closed: schmunk42 closed this issue 6 years ago
Adding a command option is not possible, so only an env variable is possible.
Ah sorry, I mixed that up with new commands, but I think it does not make sense (or maybe it's not even possible) to have something like composer update-skip-asset-info.
This, however, is possible.
So I looked a bit into this, and if you skip updating the remote git server, the performance gain is about 30-50% (!), i.e. commenting out this line.
Updates still worked if the cache is filled.
Another huge improvement would be to also load the bower.json files from cache; this removes another 0.5-1 seconds per bower package from the total time taken for an update:
[89.2MB/9.38s] Downloading https://bower.herokuapp.com/packages/select2
[89.2MB/10.05s] Writing /Users/tobias/.composer/cache/repo/https---bower.herokuapp.com-packages/select2-ca22abe1ad1684dd62ec875a0b5c25a533cf287a-package.json into cache
Looks to me like the file is always downloaded and then written to the cache.
Maybe this would also require some changes in composer itself. Do you think it's worth creating an issue like "Feature: Allow updates from cache only"?
Last but not least, it would be awesome if Git updates could run in parallel, like downloads with prestissimo. I think this would make the above changes obsolete anyway.
This could be combined with functionality that checks an expiry setting, like: "Warning: Asset package definitions have not been updated for 7 days".
FXP_SKIP_ASSET_INFO_UPDATE_DAYS=7
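If the proposal were implemented, usage might look like this sketch (the variable name is the one proposed above; the skip-by-age behavior is hypothetical):

```shell
# Hypothetical: only refresh asset package definitions when the cached
# ones are older than 7 days (variable not implemented yet).
FXP_SKIP_ASSET_INFO_UPDATE_DAYS=7 composer update
```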
While I'd also like a separate command, the ENV variable is the way to set a default.
@francoispluchino Any advice where to start, or do you have plans for this feature? It would be a great improvement.
The bower.json and package.json files are cached. Only the files from the main branch are not cached.
Regarding the update-skip-asset-info command: maybe prefer a namespace like fxp:update (shorter). It would only configure the variable FXP_SKIP_ASSET_INFO_UPDATE_DAYS (via an option, 7 by default) and use the same code and options as the update command.
You can make a pull request for this feature!
I like the fxp:update syntax, I've been using it in phd for a while :) The only thing is that you don't actually do an fxp update, rather the opposite, so something like update-fast or fxp:update-quick.
So, @handcode and I dug a bit on this ...
To really get a huge performance gain, we'd need to address two things.
One is this: https://github.com/fxpio/composer-asset-plugin/blob/master/Repository/BowerRepository.php#L58. Here the asset plugin tells Composer where it can find the git repo for an extension.
We hacked it like so:
// Hack: if Composer's VCS cache already contains a mirror of this repo,
// point the URL at the local cache path instead of the remote.
// The replacements reproduce Composer's cache directory naming scheme,
// e.g. https://github.com/twbs/bootstrap.git
//   -> /path/to/user/.composer/cache/vcs/https---github.com-twbs-bootstrap.git
$local = $data['url'];
$local = str_replace('/', '-', $local);
$local = str_replace('https:', '/path/to/user/.composer/cache/vcs/https-', $local);
if (is_dir($local)) {
    $data['url'] = $local;
}
This skips the update of the git repository by pointing the URL at the local mirror, if it exists.
The other one is the URL of the repo itself, see https://github.com/fxpio/composer-asset-plugin/blob/master/Repository/BowerRepository.php#L34
return 'file:///path/to/bower-definition.json';
This one is harder to fake, because we don't know the hash of the repo. The files we need would be in ~/.composer/cache/repo/https---bower.herokuapp.com-packages.
[update] It would also be great if we could just delegate this to prestissimo; that might also be sufficient.
So, both points mentioned above would essentially skip the update of asset repositories. Some benchmarks with a really huge project: 336 packages (55 of them are from bower):
Would love to hear some feedback about this.
@schmunk42 Do not forget NPM Repository. Otherwise, your idea is interesting.
Regarding the manipulation of the plugin options, I have already moved the plugin configuration (formerly extra.asset-*) into the config.fxp-asset.* section, because the config section does not change the hash of composer.lock, unlike the extra section.
Furthermore, I will add support for the global configuration (#273), as well as support for environment variables to override the configuration, with the format FXP_ASSET__<VARIABLE_NAME_WITH_UPPERCASES_AND_UNDERSCORES> (#274).
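As a sketch of the planned naming scheme (#274), a config entry could map to an environment variable like this (the concrete option shown is only an illustration):

```shell
# Hypothetical mapping: config.fxp-asset.pattern-skip-version
# would become the environment variable below under the #274 scheme.
FXP_ASSET__PATTERN_SKIP_VERSION="(-build)" composer update
```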
This way, you will be able to propose your solution to disable the analysis of the asset dependencies.
Do not forget NPM Repository.
I haven't tested this, but it looks to me like this would also be the way npm works with fxp, simply because composer works that way.
Your last comment changes the behavior directly in the BowerRepository class, so it's independent of Composer, and the change only applies to Bower in this plugin.
So, I played around a bit with npm-assets ...
One thing I noticed is that although npm hands out a lot more information than bower, it takes only about 0.2 secs to get that information, compared to 0.6 secs for bower (which is only the repo info). I am speaking of Downloading https://bower.herokuapp.com/packages/bootstrap vs. Downloading https://registry.npmjs.org/bootstrap. These are pretty rough numbers, but they add up if you have a lot of asset packages.
The rest of the process is the same; pruning and updating the repo takes the most time, usually ~1-3 secs per asset repository (depending on the size):
Executing command (/Users/tobias/.composer/cache/vcs/https---github.com-twbs-bootstrap.git/): git remote set-url origin 'https://github.com/twbs/bootstrap.git' && git remote update --prune origin
This could also be almost completely eliminated by skipping that check and using a local repo. Another option would be to optimize that step, but that would also involve composer itself.
CC: @tonydspaniard @cebe @samdark @mikehaertl
Some more findings ...
The line in composer which runs the update is here.
If this line omitted origin (or if that were configurable), we could do something like this:
Taken from https://git-scm.com/docs/git-remote: just configure that origin is not to be updated by default, or do that manually.
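For illustration, per the git-remote/git-config documentation, a mirror's origin can be excluded from non-explicit remote updates like this (cache path assumed; it only helps if Composer stops naming origin explicitly in the command above):

```shell
# Run inside the cached mirror, e.g.
#   ~/.composer/cache/vcs/https---github.com-twbs-bootstrap.git/
# Exclude origin from `git remote update` runs that name no remote:
git config remote.origin.skipDefaultUpdate true
# A plain `git remote update --prune` now leaves origin untouched,
# while an explicit `git fetch origin` still refreshes the mirror on demand.
```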
I know that this somehow defeats the purpose of "update", but on the other hand you don't have to update and purge your local mirror every time; other package managers warn you if this hasn't been done for some time (e.g. two weeks on MacPorts).
I hardly dare to CC @Seldaek, but I'd really like to hear his opinion about it.
@schmunk42 We can avoid making the 5-6 queries by using the cache and checking the update date on the first query (only for the Github driver).
Currently, each repository addition performs at least 5 requests (more if there are several pages of tags and/or branches). Example for jQuery with Bower:
Downloading https://api.github.com/repos/jquery/jquery-dist
Downloading https://api.github.com/repos/jquery/jquery-dist/contents/bower.json?ref=master
Downloading https://api.github.com/repos/jquery/jquery-dist/commits/master
Downloading https://api.github.com/repos/jquery/jquery-dist/tags?per_page=100
Downloading https://api.github.com/repositories/28825109/tags?per_page=100&page=2
Downloading https://api.github.com/repos/jquery/jquery-dist/git/refs/heads?per_page=100
The result of:
Downloading https://api.github.com/repos/jquery/jquery-dist
contains the created_at, updated_at, and pushed_at fields. It's therefore possible to keep the updated date and check whether it has changed or not. Everything happens in the GitHub driver, so it's compatible with Bower and NPM.
With this cache, each repository addition performs at least 1 request instead of 5.
Of course, this optimization can come in addition to your proposal.
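A minimal sketch of this caching idea, with the API response supplied inline instead of an actual request (sed stands in for a real JSON parser; the dates are made up):

```shell
# Sketch: keep the pushed_at value from the previous run and only perform the
# remaining 4+ requests when it changed. In the real driver, $response would be
# the body of https://api.github.com/repos/jquery/jquery-dist (sample data here).
extract_pushed_at() {
  sed -n 's/.*"pushed_at": *"\([^"]*\)".*/\1/p'
}

cached='2017-02-01T10:00:00Z'   # date stored after the last update
response='{"name":"jquery-dist","pushed_at":"2017-02-18T09:30:00Z"}'
fresh=$(printf '%s' "$response" | extract_pushed_at)

if [ "$fresh" = "$cached" ]; then
  echo "unchanged: reuse cached tags and branches"
else
  echo "changed: refresh tags and branches"   # this branch runs for the sample data
fi
```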
All my tests already have the github-no-api option enabled; that was actually the first performance improvement, because updating takes so much longer when using the API, especially with repos that have a huge number of tags (e.g. Angular has thousands of tags, resulting in dozens of requests).
IMHO, decent performance can only be achieved by disabling API requests entirely and reading the tags from local git clones. I think bower, npm, yarn, ... do the same, don't they? (not 100% sure)
Does it impact composer outdated? Right now that command downloads every tag for every bower dependency, which makes it unusable.
As already mentioned in issue #226, Bower uses a shallow clone to download the history truncated to the first level (--depth=1) for all branches (e.g. for Angular 1.x, the clone size is 8MiB versus 575MiB, and the resolution of deltas is much faster), or uses git ls-remote to retrieve the list of tags and branches if the shallow clone is not supported by the server or the protocol (http).
Bower can use a 'slow clone' (full clone) or a 'fast clone' (shallow clone), see the code here. The 'slow clone' is used if the resolution uses a commit id (sha).
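For illustration, a small local demonstration of what --depth=1 buys (repository contents and names are just for the demo):

```shell
# Build a tiny upstream repo with two commits, then shallow-clone it.
git init -q upstream
git -C upstream -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git -C upstream -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two
# A depth-1 clone carries only the latest commit per branch, which is all
# that is needed to read one bower.json per ref.
git clone -q --depth=1 "file://$PWD/upstream" shallow
git -C upstream rev-list --count HEAD   # full history: prints 2
git -C shallow rev-list --count HEAD    # truncated history: prints 1
```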
Bower verifies whether the server supports shallow cloning. This is done according to the rules found in the following links:
Summary of the rules: (extracted from the source)
- Send a HEAD or GET request to the URI (appended to the URL of the Git repo): /info/refs?service=git-upload-pack
- If the response has a Content-Type header of application/x-git-upload-pack-advertisement, the server supports shallow cloning ("smart server").
- Rather than issuing the HEAD or GET request using an HTTP client, we're letting Git and Curl do the heavy lifting: calling Git with the GIT_CURL_VERBOSE=2 env variable will provide the Git and Curl output, which includes the content type. This has the advantage that Git will take care of using stored credentials and any additional negotiation that needs to take place. The above should cover most cases, including BitBucket.
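A minimal sketch of that check, with the response headers supplied inline rather than via a real HEAD request (function name hypothetical):

```shell
# A server is a "smart server" (shallow clone supported) when the response for
# <git-url>/info/refs?service=git-upload-pack has the Content-Type
# application/x-git-upload-pack-advertisement. Reads HTTP headers on stdin.
supports_shallow_clone() {
  grep -i '^Content-Type:' | grep -qi 'application/x-git-upload-pack-advertisement'
}

# Sample headers; in practice they could come from e.g.
#   curl -sI 'https://github.com/twbs/bootstrap.git/info/refs?service=git-upload-pack'
headers='Content-Type: application/x-git-upload-pack-advertisement'
if printf '%s\n' "$headers" | supports_shallow_clone; then
  echo "smart server: shallow clone supported"
else
  echo "dumb server: fall back to git ls-remote"
fi
```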
IMHO, this optimization should be integrated directly into Composer, because it would benefit all VCS repositories.
That could be a huge improvement in install times for PHP packages as well.
@schmunk42 Could your proposal be available for the 1.3.0 release?
I played around a bit with shallow clones, but the operation mentioned in https://github.com/fxpio/composer-asset-plugin/issues/251#issuecomment-280875259 doesn't seem faster on a shallow clone. For sure the initial download is faster, but you usually have to take this penalty only once.
I can't really dig it up right now, but I remember a discussion on CocoaPods that shallow clones might be even slower under some circumstances, anyhow ... I don't think they will bring much improvement overall.
What costs so much time is that fxp creates vcs repositories, which are fully scanned on every update by composer. If you have 20 asset repos, you need about 10 seconds for downloading the repo info (bower) and ~30 secs for updating the repos (syncMirror); the rest of the update, like downloading info from packagist and running the SAT solver, usually takes less than 10 seconds.
The former two tasks are usually not needed on every update, maybe once a day would be totally fine (or on demand). The (cache) info would be shared across projects on your host.
But when you need to run update, let's say 5 times, because you need to find a matching set of extensions, this becomes really annoying - even more with API requests.
I was re-examining this problem and trying to remember why the discussion had stopped on this subject. The shallow clone may become slower if, for example, you have to download each branch and tag to retrieve the .json file. The analysis and downloading could then take longer than a full clone (there may be other cases, but I have never met them).
We cannot do this with Composer because the SAT solver retrieves the full list of dependencies with all versions, even if the version of the package doesn't match; therefore lazy loading is not used. And if we implemented the shallow clone to retrieve the .json file from each version, we would have to clone each branch and each tag analyzed. In that case, it is better to download the archive directly.
So this option is not possible without a modification of the SAT solver (and they do not want to touch it).
To come back to your proposal to "disable" the check for asset updates, which in the case you exposed is actually interesting: do you have a PR?
Not yet :) One thing would be to skip syncMirror in composer; could this be solved in a plugin?
Added by #289
Continued from https://github.com/yiisoft/yii2/issues/8688#issuecomment-242215164
Proposal: Add an option like --skip-asset-info-update or an ENV variable FXP_SKIP_ASSET_INFO_UPDATE to disable git and/or GitHub API requests which query for new versions/tags, since this might take a while. The option should only have an effect when local information about the repository already exists; if not, the plugin should download the information.