Closed jlubeck closed 8 years ago
Tried again, new error:
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
[!] /usr/bin/git clone https://github.com/CocoaPods/Specs.git master --depth=1
Cloning into 'master'...
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
Very weird...
I got the same issue, too.
Same. Cannot clone spec repo.
+1, same issue
git clone https://github.com/CocoaPods/Specs.git
takes forever
+1, been messing around with this for a while. I doubled the buffer, didn't work. Uninstalled and reinstalled pods, didn't work. Tried to clone manually, no cigar. It actually seems to be getting "something" but fails. Using verbose didn't say much, just said it had issues accessing it.
I tried accessing my other repos and it seemed to be OK, but it was definitely slower than normal.
+1
I got the same issue, too. My pod version was "0.39.0".
I tried cloning the master repo directly (by git clone git@github.com:CocoaPods/Specs.git master --depth=1 --verbose), but that also failed.
+1
+1
+1
+1. No success after increasing buffer / reinstall / manual clone
Temporary workaround which might work: https://github.com/CocoaPods/Specs/archive/master.zip haven't tested though
wget https://github.com/CocoaPods/Specs/archive/master.zip
https://github.com/CocoaPods/Specs/archive/master.zip would be the correct link, I guess?
Same error here. What should we do with the file at https://github.com/CocoaPods/Specs/archive/master.zip ?
My bad, the wget link is correct. Just edited my first link.
Not sure what to do with the file yet. Trying to see if we can manually run the commands to have pod setup work.
Yeah, it merely downloads the repo's contents. The .git/ directory is missing, so it's not recognized as a git repo.
yes.. same here.. It always tries to clone the master repo. Even when I run it with --no-repo-update
I get "Creating shallow clone of spec repo master-1 from https://github.com/CocoaPods/Specs.git"
Did anyone try this with 1.0.0 beta 4?
@MarkMolina I tried, but same result.
Try to cd into ~/.cocoapods/repos/master, then git clean -fd to clean up the working copy, git checkout -- . to ensure you're on master and then git pull manually. This took ages but worked for me.
+1
Thx, but I removed my master spec repo before I realized something was up with the Github repo ^.^
Got a temp workaround! Tested with my app and everything is working. This is really only needed if you deleted the master repo. If the master folder is still in your ~/.cocoapods/repos folder with contents then you should be ok to just use pod install --no-repo-update.
Run pod setup. This should at minimum download the .git folder to ~/.cocoapods/repos/master.
Move the .git folder from the master folder to somewhere temporary.
Download and unzip master.zip, then move its contents to ~/.cocoapods/repos/master.
Move the .git folder from wherever you put it to ~/.cocoapods/repos/master as .git.
Run pod install --no-repo-update.
And you should be good to go!
So in short, here is the basic list of commands I used:
pod setup (in a separate tab)
mv ~/.cocoapods/repos/master/.git ~/tempSpecsGitFolder
^C on pod setup tab
wget https://github.com/CocoaPods/Specs/archive/master.zip
open master.zip (unzipping it)
mv Specs-master ~/.cocoapods/repos/master
mv ~/tempSpecsGitFolder ~/.cocoapods/repos/master/.git
cd [project folder]
pod install --no-repo-update
Is this a Cocoapods or a wider GitHub issue?
@aceontech Pretty sure it is a GitHub issue, but my other repos are working fine so perhaps only certain repos on certain servers (on their backend) are affected.
I was just able to do a successful pod setup. Don't know if it's repeatable.
AlexMacBookPro:repos alex$ pod setup --verbose
Setting up CocoaPods master repo
Creating shallow clone of spec repo `master` from `https://github.com/CocoaPods/Specs.git` (branch `master`)
$ /usr/bin/git clone https://github.com/CocoaPods/Specs.git master --depth=1
Cloning into 'master'...
Checking out files: 100% (74393/74393), done.
$ /usr/bin/git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
Github is very very slow: ~ 40-50KB/s
I was able to do pod setup --verbose right now.
This is a GitHub issue rather than a cocoapods issue -- you're best off reporting it to their support rather than us, since there's nothing we can do about it.
And github refers to Cocoapods... Great...
There's nothing we can do -- github are the ones who host the repo and are responsible for serving it. The only commits in the past day have been changing files via their REST API, so the idea a bad commit got in is very unlikely. For the meantime, installing using --no-repo-update if you already have the master repo cloned is probably the best bet.
I've contacted support about it, hopefully everything should be pretty easy to fix
I was able to clone the repo manually, do we know what is special about the way CocoaPods clones it that we could use to help github ?
CocoaPods uses the git command line; it's calling git clone https://github.com/CocoaPods/Specs.git. I wonder if the problems are location specific, as these commands aren't working for me in NYC.
I've also contacted GitHub support today, no answer still. I think that the issue is not location-based since there's plenty of distance between NYC & Munich.
Issue was "coming and going" during morning hours, GitHub status page did not reveal any problems during outage.
A really good addition to CocoaPods would be the ability to change the specs repository URL, so that an in-house replica of it could be used.
Potentially mirrors could be hosted at Bitbucket or other providers?
@jcampbell05 right now, Pod::Source doesn't know how to deal with mirrors, so it wouldn't be much help
Hey all,
I'm one of the engineers on GitHub's Git infrastructure team. I'd like to start by apologizing for not responding more quickly to this thread. We've been investigating the issues that the CocoaPods community has been experiencing, and I wanted to give you an update on what we have found out so far.
The slow fetches and clones (which sometimes time out) that the CocoaPods community is experiencing are caused by automatic rate limiting on our servers, which is done to make sure that extremely high levels of load in one repository cannot impact other GitHub users. The CocoaPods/Specs repository is more or less permanently being rate limited. Why? There are several factors coming together:
CocoaPods/Specs is quite well known within our team :wink:
--depth=1 option. Ironically, this practice can be much more expensive than full fetches/clones, especially over the long term. It is usually preferable to pay the price of a full clone once, then incrementally fetch into the repository, because then Git is better able to negotiate the minimum set of changes that have to be transferred to bring the clone up to date.
Specs directory, which contains 16k+ subdirectories, causes some Git operations to be unexpectedly expensive, further driving up CPU usage.
All of these factors combine to make CocoaPods/Specs one of the top five most resource-costly repositories that we host on all of GitHub.com. And that is why it is rate-limited; otherwise it would consume even more resources and cause service interruptions for other GitHub users. The symptoms of the rate limiting for you and your users are that your repository accesses (clones, fetches, pushes) have to wait in a queue on our end, sometimes for a long time, before being processed. This causes fetches/clones to take much longer than they would otherwise, and might cause timeouts at your end. Moreover, if the load on our servers becomes too overwhelming, a fraction of the accesses might be rejected altogether.
So, what can we do about it?
First and foremost, let me reiterate our commitment to hosting Open Source projects for free, forever. Our platform doesn't have "hard limits" or monthly traffic quotas. But the same commitment we have towards CocoaPods we also have towards all the other OSS projects that share their storage hosts with your project, and that simply wouldn't be able to operate if our automatic monitoring didn't throttle access to the CocoaPods/Specs repository.
That said, we're working in the open-source Git project on patches to fix the pathological behavior you're experiencing (e.g., see http://thread.gmane.org/gmane.comp.version-control.git/288403). We think Git's handling of shallow clones can be improved, but this might take a while. If the Git client needs to be changed, it wouldn't help until the new client is in the hands of the majority of your users.
The remaining issues, however, are mostly in the hands of the CocoaPods project. I have the feeling that the easiest possible first step would be to address point 2, by changing CocoaPods to use full rather than shallow clones. I assume that the typical clone is updated many times during its lifetime, in which case the initial cost of the larger clone should easily pay off over time while significantly decreasing the load on our servers. Existing clones can be converted from shallow to deep by running
git fetch --depth=2147483647
within the repository.
I believe that the change to using non-shallow clones will start reducing the cost of fetches, which will automatically cause the rate limits imposed by our systems to be loosened, ultimately giving a much better experience to the users of CocoaPods.
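A minimal Python sketch of the shallow-to-full conversion described above. It assumes a standard non-bare clone, where Git lists the shallow boundary commits in a .git/shallow file; the helper names (is_shallow, unshallow_command, convert_to_full_clone) are made up for illustration and are not part of CocoaPods or Git.

```python
import subprocess
from pathlib import Path
from typing import List


def is_shallow(repo: Path) -> bool:
    """Return True if the clone at `repo` looks shallow.

    Assumption: a non-bare clone, where Git records the shallow boundary
    commits in .git/shallow. A full clone has no such file.
    """
    return (repo / ".git" / "shallow").is_file()


def unshallow_command(repo: Path) -> List[str]:
    """Build the fetch command quoted in the comment above."""
    return ["git", "-C", str(repo), "fetch", "--depth=2147483647"]


def convert_to_full_clone(repo: Path) -> bool:
    """Deepen `repo` if it is shallow; return True if a fetch was run."""
    if not is_shallow(repo):
        return False
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(unshallow_command(repo), check=True)
    return True
```

For the default CocoaPods location this would be called as convert_to_full_clone(Path.home() / ".cocoapods" / "repos" / "master"). Checking first means a clone that is already full is not fetched again just to deepen it.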
Longer-term, you should also consider points 1 and 4. Using GitHub as your CDN is not ideal, for anybody involved. I would urge you to consider how CocoaPods could be distributed without using Git operations, which are intrinsically hard to scale. I'm confident that you could come up with a more reliable approach for serving packages. Perhaps a method that is more similar to the approaches used by other packaging systems would work better.
I hope this information is helpful. Please let us know if you have any questions!
@mhagger would HTTP fetching be easier to scale ?
@jcampbell05: unfortunately, HTTPS vs SSH wouldn't make a noticeable difference. The expensive part is figuring out which Git objects the client already has, which ones it needs, computing deltas for those objects, and compressing the deltas. When the client has a non-shallow history, the first two steps become much cheaper and the last two steps can often be optimized away entirely.
@mhagger What I was meaning is that you can directly link to files via HTTP using the raw.githubusercontent.com domain. If we were to download some things via HTTP directly rather than git would that help ?
I've removed a post noting that I wish we could have been told about the burden earlier so we could have helped out before hitting a ceiling, however, I can imagine it's difficult on your side to keep people in the loop about things like this. Sorry, don't want to de-rail!
I've left some ideas here to help with the above but I'm not sure if they will help https://github.com/CocoaPods/CocoaPods/issues/5000
I'm very passionate about us getting a deploy command at some point (works just like bundler's bundle install --deployment).
Hi, another GitHub employee here from the Platform (i.e. API) team and Homebrew maintainer (so I feel the pain of both sides).
> If we were to download some things via HTTP directly rather than git would that help ?
It would help if you were using e.g. master.tar.gz tarballs as they can be more easily cached and served without hitting the Git layer every time. The problem from your side is that you'd need to do a ~60MB download every time so I can see this being undesirable.
As well as the shallow changes @mhagger suggested, this new preview API should help: https://developer.github.com/changes/2016-02-24-commit-reference-sha-api/. It's helped Homebrew dramatically reduce the number of no-op git fetches, which will also make things better for your users, as a no-op API HTTP call is significantly faster for you (and less expensive for GitHub) than a no-op git fetch. Feel free to @mention me directly on any pull request implementing it so I can help you ensure you're caching it nicely.
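A rough Python sketch of how such a no-op check might look. The media type (application/vnd.github.v3.sha) and the use of If-None-Match with the quoted local SHA are assumptions based on my understanding of the commit reference SHA API and should be verified against the linked announcement; the function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict, Tuple


def build_sha_check(owner: str, repo: str, ref: str,
                    local_sha: str) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a conditional 'did the ref move?' call.

    Assumption: the API answers 304 Not Modified when the quoted SHA sent
    in If-None-Match equals the current SHA of `ref`.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{ref}"
    headers = {
        "Accept": "application/vnd.github.v3.sha",  # assumed media type
        "If-None-Match": f'"{local_sha}"',
    }
    return url, headers


def remote_has_new_commits(owner: str, repo: str, ref: str,
                           local_sha: str) -> bool:
    """Return False on 304 (nothing to fetch), True on a 2xx response."""
    url, headers = build_sha_check(owner, repo, ref, local_sha)
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10):
            return True  # the remote ref points at a different commit
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return False
        raise
```

A client would run git fetch only when remote_has_new_commits(...) returns True, so that the common "nothing changed" case never reaches the Git layer.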
@mikemcquaid That looks like it will be a huge help, thank you! I'm sure @segiddins, @alloy or @orta will get in touch with their thoughts :) :rocket:
For me a three-tier approach may be best:
lockfile
(Mainly for CI)
no-ops
@mhagger
> I'm one of the engineers on GitHub's Git infrastructure team. I'd like to start by apologizing for not responding more quickly to this thread.
No worries and thanks for jumping on this at all :+1:
I knew that GitHub must spend a sizeable amount of resources on making a repo like CocoaPods/Specs available for ‘free’ to all our users before, but some of the information you’ve now given makes that even clearer.
So, in the name of all CP users, first of all, thanks for all that :clap:
With all the hugs and kisses out of the way, let’s get onto sorting this all out. I’ll try to focus on what I think is important for this discussion, but please do point it out if I overlooked important information from your message!
> Longer-term, you should also consider points 1
It’s unclear to me what it is in point 1 specifically that we should consider. Can you make that more explicit?
> and 4.
This point seems an interesting tidbit, but it’s not clear to me at all why this is the case. Do you have links for us to read-up on this?
> Using GitHub as your CDN is not ideal, for anybody involved. I would urge you to consider how CocoaPods could be distributed without using Git operations, which are intrinsically hard to scale. […]
There are a few reasons why we decided to go this route:
> Perhaps a method that is more similar to the approaches used by other packaging systems would work better.
For the ‘HR’ and funding reasons listed above, I think we’re actually being ‘smarter’ than various other packaging systems. I’m not going to name them, but I’m sure you can think of examples.
> I'm confident that you could come up with a more reliable approach for serving packages.
I’m not at all afraid that we as devs can’t come up with all sorts of solutions :wink:, but I’d like to stay away from immediately assuming that things cannot work at all with the current design and ending up building a cathedral.
I.e. I’d like us to continue this discussion, at first, from the notion of us maintaining the existing architecture. Where things are absolutely impossible, it would be great if you can include more links to docs/source that explain why things are impossible.
Maybe we could host a snapshot of the git repo as a ‘release’ and initially download that?
In addition, reading the linked-to bug report, I’m not entirely sure I understand if shallow clones are or are not able to work in any feasible way right now. Could you expand on that? E.g. the bug report thread mentions various options, such as “--deepen, --shallow-since and --shallow-exclude”; could any of these be helpful to us in any way?
@mikemcquaid
> It would help if you were using e.g. master.tar.gz tarballs as they can be more easily cached and served without hitting the Git layer every time. The problem from your side is that you'd need to do a ~60MB download every time so I can see this being undesirable.
You are referring to these, yeah?
Yeah that kinda sounds like my idea, except I’d like that to be a one time thing.
I should have stated in my earlier comment that my idea of hosting a snapshot was meant as a way for users to more easily get a full clone, which, as I understand it, would take the shallow/server-side CPU usage burden away?
> As well as the shallow changes @mhagger suggested this new, preview API should help: https://developer.github.com/changes/2016-02-24-commit-reference-sha-api/. It's helped Homebrew dramatically reduce the number of no-op git fetches
This looks very interesting, thanks for sharing!
Just to be clear, are the number of no-op git fetches currently a burden that’s leading to the rate-limiting as well?
> You are referring to these, yeah?
@alloy I am, yep.
> Yeah that kinda sounds like my idea, except I’d like that to be a one time thing.
Sure. Unfortunately that archive is the output of git archive, so it does not include any .git directory/metadata.
> Just to be clear, are the number of no-op git fetches currently a burden that’s leading to the rate-limiting as well?
That's something that's hard for me to identify exactly. I guess it's a question of how often you think users are running git fetch (or equivalent) when there's nothing new to download. My experience locally is that a no-op git fetch for this repository is extremely slow, so it's probably worth implementing just for that case, and it definitely will decrease load for GitHub rather than increase it.
> and 4.
> This point seems an interesting tidbit, but it’s not clear to me at all why this is the case. Do you have links for us to read-up on this?
@alloy: In the Git object model, each version of each directory is stored as a "tree" object. Whenever something changes under the directory, a whole new, modified copy of the tree object has to be written to the object database. The Specs directory has 16k+ entries, and is about 450kb in size (compressed). Every single commit requires a new version of this giant tree.
This superficially doesn't seem so bad, because usually only a single entry in the tree changes each time. So successive versions of the tree delta well against each other, and the repository doesn't explode in size.
The problem is that many Git operations have to traverse the tree, which means that internally the 450kb object has to be recreated from its deltas (usually through multiple steps of deltas, each of which has to be found and decompressed). And your repository has nearly 100k commits, so operations that need to traverse the whole history become extremely expensive.
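To get a feel for the scale, here is a back-of-the-envelope estimate in Python using only the two figures quoted above. It is a deliberately crude upper bound: it assumes every commit produces a new version of the Specs tree, that a full history walk materializes each version exactly once, and that "450kb" means 450,000 bytes. Real Git caches and packs objects, so actual costs will differ.

```python
# Figures quoted in the comment above.
TREE_SIZE_BYTES = 450 * 1000   # "about 450kb" per version of the Specs tree
COMMIT_COUNT = 100_000         # "nearly 100k commits"


def bytes_rebuilt_for_full_walk(tree_size: int, commits: int) -> int:
    """Bytes of tree data recreated if each commit's tree is rebuilt once."""
    return tree_size * commits


total = bytes_rebuilt_for_full_walk(TREE_SIZE_BYTES, COMMIT_COUNT)
print(f"{total / 1e9:.0f} GB")  # prints "45 GB"
```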
If, for example, this directory were sharded into subdirectories based on the first and second letters of the package name like so
a/_/A
a/f/A-Framework
a/2/A2DynamicDelegate
a/2/A2StoryboardSegueContext
a/3/A3GridTableView
...
a/u/authorizenet-sdk
a/u/autoAutoLayout
b/6/B68UIFloatLabelTextField
b/a/BABAudioPlayer
b/a/BABCropperView
b/a/BABFrameObservingInputAccessoryView
...
z/i/zipzap
z/l/zldtest
z/l/zldzhang
z/x/zxcvbn-ios
then the Specs directory and its subdirectories would only have 26ish entries, and the next level of directories would all have fewer than a few hundred entries. A modification in such a directory layout would have to rewrite three trees instead of one, but each tree is so much smaller than the current Specs tree that it would nevertheless be a big win.
Such a layout is also a big win for many other reasons. For example, when computing diffs, if two Specs trees have identical a/ subdirectories, then that can be seen without looking inside the subdirectory's tree at all (because the SHA-1s of the trees would be identical). So computing the diff between two successive versions in the sharded scheme probably only requires a few (small) trees to be opened and a few dozen SHA-1s to be compared, whereas today it requires two gigantic trees to be opened and 16k SHA-1s to be compared.
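The sharded layout listed above can be sketched as a small Python function. The rule used here (take the first two alphanumeric characters of the package name, lowercased, and pad with "_" when the name is too short) is inferred from the example paths, such as a/_/A and a/f/A-Framework; it is an assumption, not a scheme confirmed by anyone in this thread, and shard_path is a hypothetical name.

```python
def shard_path(pod_name: str) -> str:
    """Map a package name to <first>/<second>/<name>.

    Assumption inferred from the examples: non-alphanumeric characters
    are skipped when picking the two shard keys, keys are lowercased,
    and "_" fills in for a missing key.
    """
    keys = [ch.lower() for ch in pod_name if ch.isalnum()][:2]
    while len(keys) < 2:
        keys.append("_")
    return "/".join(keys + [pod_name])


for name in ["A", "A-Framework", "A2DynamicDelegate", "zxcvbn-ios"]:
    print(shard_path(name))
```

With this rule a change to one podspec touches the top-level tree, one first-level tree and one second-level tree, each far smaller than today's single 16k-entry tree.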
Thanks for your thoughtful reply, @alloy. Hope @mhagger has cleared up the question about your large trees. Regarding your other points:
> It’s unclear to me what it is in point 1 specifically that we should consider. Can you make that more explicit?
Point 1 basically refers to using GitHub as a CDN. We totally understand this is convenient for you, and we work hard around the clock to make this a viable option, but Git, by design, is not suited to act as a CDN. You're burning weeks of CPU time and gigabytes of bandwidth from our infrastructure that could be replaced with very little CPU and very little bandwidth if CocoaPods were using a more traditional design for a package management system.
> Maybe we could host a snapshot of the git repo as a ‘release’ and initially download that?
This would not be a strict improvement. If you use the tarballs that we offer for download, you will not have the Git metadata for the repository, so further fetches won't be possible. It'd be just as cheap to perform a full clone through Git -- GitHub has a special implementation on the server-side that can make serving a full clone particularly cheap as long as it is not a shallow clone. And obviously, you can continue fetching on top of the original clone.
To reiterate: the major performance issue is not on doing an initial clone of the CocoaPods repository, but in performing a shallow clone and then repeatedly fetching into it, like the CocoaPods client is currently doing.
> I’m not entirely sure I understand if shallow clones are or are not able to work in any feasible way right now. Could you expand on that? E.g. the bug report thread mentions various options, such as “--deepen, --shallow-since and --shallow-exclude”, could any of these be helpful to us in any way?
Our advice would be for CocoaPods to stop using any kind of shallow feature from Git altogether. Users should perform a full clone of the repository, and then fetch into it as usual. Simply performing that change should significantly soften the load on our fileservers.
You may be led to believe that this is inefficient (in bandwidth or disk storage), but it actually ends up being significantly cheaper than your current approach. Git is not very good at shallow data, and one pattern we've found (and that we're trying to fix upstream in Git itself) is that merging a branch and fetching that into a shallow repository can cause Git to send an unreasonable amount of objects when that merge crosses the grafted shallow-point of the repository. You can read the investigation in the Git ML here: http://thread.gmane.org/gmane.comp.version-control.git/288403
Besides dropping the shallow clones, I would still urge you to implement @mikemcquaid's suggestion regarding the preview API for no-op updates. At this point, most of the throttling comes from expensive fetches, but every small bit helps.
At the end of the day, any Git pattern will "work" in practice: we have a unique in-house monitoring system that ensures the full availability of our Git platform no matter the circumstances. But this obviously leads to issues like the current thread. If the operations you're performing are not as optimal as they could be (or are pathological, as in this case), they will be automatically throttled or cancelled on our servers, and this is a poor experience for the users of CocoaPods.
We cannot force you to change the design of your package manager, but we'd like to reiterate that Git (the version control system itself -- nothing to do with GitHub as a platform) is unsuited for what you're trying to do here. We're here to help you soften the pain, and we'll continue improving the performance of our platform and of the OSS Git client to make pathological workflows work in practice, but this is hard work. We can't assure an ideal user experience with CocoaPods' design choices. :crying_cat_face:
Note from @orta -
If you are here because your Specs repo isn't updating, run:
cd ~/.cocoapods/repos/master && git fetch --depth=2147483647
- this will convert your local repository of Podspecs to be a full clone, as opposed to a shallow copy.
What did you do?
Run pod setup
What did you expect to happen?
Clone Spec repo master
What happened instead?
It only downloads a few bytes and then throws error:
No Podfile yet
I also tried cloning the repo manually or with the GitHub desktop app, to no avail. I'm having no issues cloning any other repo on GitHub, only this one. Is it possible there is something wrong with it?
Thanks