bluehost / pluginmirror

WordPress Plugins GitHub Mirror Application
http://www.pluginmirror.com/
GNU General Public License v2.0

Discrepancy between cloned and total number of plugins in WP SVN? #2

Closed khromov closed 10 years ago

khromov commented 10 years ago

http://www.pluginmirror.com/status

WP SVN Total: 39,468 Cloned to Git: 12,337

Can someone explain the discrepancy? Why not clone all plugins?

Thanks beforehand.

tierra commented 10 years ago

It takes approximately 7 to 10 minutes on average for the Plugin Mirror to clone one single plugin using git-svn. It also sometimes takes a massive amount of RAM even for a single plugin (mostly because the WordPress.org plugins SVN server is one single repo for all 40k plugins), which means that even the Plugin Mirror, which has 32GB of RAM to work with, can still only risk cloning one plugin at a time.
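For readers unfamiliar with git-svn, here is a minimal sketch of what a single-plugin clone command might look like. The Plugin Mirror's actual invocation is not shown in this thread, so this is an assumption: `build_clone_command` is a hypothetical helper, `akismet` is only an example slug, and `--stdlayout` is the standard git-svn option for repositories that follow the trunk/branches/tags convention.

```python
from typing import List

# Root of the single shared WordPress.org plugins SVN repository.
WP_PLUGINS_SVN = "https://plugins.svn.wordpress.org"


def build_clone_command(slug: str, dest_dir: str) -> List[str]:
    """Build (but do not run) a git-svn clone command for one plugin.

    Hypothetical helper for illustration only; not the Plugin Mirror's code.
    """
    url = f"{WP_PLUGINS_SVN}/{slug}/"
    # --stdlayout tells git-svn to expect trunk/, branches/ and tags/ under the URL.
    return ["git", "svn", "clone", "--stdlayout", url, dest_dir]


# Print the command so it can be inspected before running it by hand.
print(" ".join(build_clone_command("akismet", "akismet")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the clone; expect it to be slow, for the reasons described above.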

Now try that with 40k plugins, and we're looking at nearly 8 months just to clone them all, probably even longer since some plugins occasionally cause more trouble than others and take as long as a couple of hours to clone.
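The estimate can be checked with some quick arithmetic. This is a rough sketch that assumes strictly serial cloning around the clock with no failures or retries; `serial_clone_days` is a name made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30.44  # average month length, used only for a rough conversion


def serial_clone_days(plugin_count: int, minutes_per_plugin: float) -> float:
    """Days needed to clone plugins one at a time, running non-stop."""
    return plugin_count * minutes_per_plugin / MINUTES_PER_DAY


# 7 and 10 minutes are the bounds quoted above; 8.5 is their midpoint.
for minutes in (7, 8.5, 10):
    days = serial_clone_days(40_000, minutes)
    print(f"{minutes} min/plugin: {days:.0f} days (~{days / DAYS_PER_MONTH:.1f} months)")
```

At 7 to 10 minutes per plugin this works out to roughly 6.4 to 9.1 months, with the midpoint landing just under 8 months.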

So, in the meantime, if the plugin you would like cloned to GitHub hasn't been cloned yet, just find it on http://pluginmirror.com/plugins and click the "Clone" button to queue it up early. This will put the plugin at the front of the line, and it will typically be cloned within two to four hours (though as you can tell by the graph, the "cloning" queue is backed up a bit right now, and it could take a couple of days).

The initial clone is the toughest part though. Once that's done, any new commits to the plugin are synced up to GitHub usually within two to four minutes flat.

tierra commented 10 years ago

P.S. There will likely still be a discrepancy between the two numbers even after it's done cloning all plugins, because there are already a rare few plugins that can't be cloned: they either didn't use the proper trunk/branches/tags SVN layout like they're supposed to, or used spaces in branch names, which aren't allowed in git. We've run into 12 of them out of the first 12,000 we've cloned. There are also numerous plugins that violate copyright or licensing rules and will be blacklisted from cloning to GitHub (likely a good portion of the 10k plugins WordPress.org has removed themselves). We're taking the stance right now that they'll be cloned by default, and removed when reported (and GitHub is working with us on that).

So don't be surprised if we never quite reach that total.
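To illustrate the branch-name problem mentioned above, here is a minimal sketch of a pre-check for SVN branch or tag names. It covers only a subset of git's ref-name rules and is not how the Plugin Mirror detects these cases (that code isn't shown here); `is_plausible_git_ref_name` is a hypothetical helper, and `git check-ref-format` remains the authoritative check.

```python
import re

# Characters git rejects anywhere in a ref name:
# ASCII control characters, space, DEL, and ~ ^ : ? * [ \
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def is_plausible_git_ref_name(name: str) -> bool:
    """Approximate check of a branch/tag name against git's ref-name rules.

    Hypothetical helper covering a subset of the rules only.
    """
    if not name or name == "@":
        return False
    if _FORBIDDEN.search(name):
        return False
    if ".." in name or "@{" in name or "//" in name:
        return False
    if name.startswith("/") or name.endswith("/") or name.endswith("."):
        return False
    # No slash-separated component may start with "." or end with ".lock".
    return all(
        not part.startswith(".") and not part.endswith(".lock")
        for part in name.split("/")
    )


for candidate in ("1.2.3", "version 1.2"):
    print(repr(candidate), is_plausible_git_ref_name(candidate))
```

A name like `version 1.2` fails because of the space, which is the case described above.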

khromov commented 10 years ago

Hi tierra,

I had no idea git-svn was so slow. The cloning system seems to work great.

Ah, so the plugins that WP.org has removed due to TOS violation and similar are still in their SVN?

Thanks for your answers.

tierra commented 10 years ago

Yep, any plugin "removed" from WordPress.org still keeps its entire code history in SVN; they just stop listing it in the plugin directory. I imagine this is because it's incredibly difficult (though not impossible) to remove its history from one single huge repository. If they had used individual repos for each plugin, they probably would have removed them altogether.

khromov commented 10 years ago

Thanks again for the help. Closing this as everything is resolved. :)