facebook / hhvm

A virtual machine for executing programs written in Hack.
https://hhvm.com

Re-think RepoAuth #6878

Open atdt opened 8 years ago

atdt commented 8 years ago

Why RepoAuth is hard

It's hard to reason about, because it muddles the relationship between running code and source files. For many PHP shops, the deployment model is simple: you deploy code (be it JavaScript, CSS, or PHP) by copying the source files to your production server. You update code by changing the source files. With RepoAuth, you need to have a mental model for PHP code changes that is different from how you think about changes to front-end code and other static assets.

It's also hard to synchronize large binary files over a wide-area network. Many PHP shops rely on file synchronization tools based on the rsync algorithm, which is very efficient for pushing out in-place updates to a source tree, but not for pushing out an HHBC repo, a single large binary file whose contents can shift substantially with every build. This makes code deployments slow, and extra care must be taken to avoid network congestion.

For many, migrating to RepoAuth means forfeiting the ability to quickly roll back a bad deploy or push a hot fix for a production bug. Practicing continuous delivery without this safety net requires uncommonly good continuous integration and software testing infrastructure. Alternatively, you can retain the safety net with uncommonly good deployment infrastructure: a dynamic, programmable load-balancing layer and containers that provide quick roll-backs and staggered code deploys. I look forward to a future in which these are standard, but for the moment they are quite rare.

How to make RepoAuth easier

It would be ideal to provide some kind of hybrid mode, wherein HHVM starts in interpreter mode but still performs the expensive optimizations that are normally reserved for RepoAuth mode. Once a file is translated, it is not re-read from disk (even if it is touched or modified). Instead, HHVM would respond to a signal that prompts it to reinterpret the code and pick up any changes, much like Apache's graceful reload behavior on SIGUSR1.
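To make the idea concrete, here is a minimal sketch of that signal-driven reload behavior. This is not HHVM's actual implementation; the unit cache and reload flag are hypothetical, and it only illustrates the proposal: serve translations from memory and drop them when an operator sends SIGUSR1.

```cpp
// Sketch only: run translated code from memory and re-read source files
// only when SIGUSR1 arrives, similar to Apache's graceful reload.
#include <atomic>
#include <csignal>
#include <map>
#include <string>

// Hypothetical cache of compiled units, keyed by file path.
static std::map<std::string, std::string> g_unitCache;

// Set asynchronously by the signal handler, checked between requests.
static std::atomic<bool> g_reloadRequested{false};

void onReloadSignal(int) {
  // Only set a flag here: doing real work inside a signal handler is unsafe.
  g_reloadRequested.store(true, std::memory_order_relaxed);
}

void maybeReload() {
  if (g_reloadRequested.exchange(false)) {
    // Drop cached translations; subsequent requests re-read and recompile
    // source files, picking up any changes on disk.
    g_unitCache.clear();
  }
}

int main() {
  std::signal(SIGUSR1, onReloadSignal);
  // Request loop elided: call maybeReload() at a safe point between
  // requests, then serve from g_unitCache, compiling on a cache miss.
  return 0;
}
```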

When I suggested this on #hhvm, some developers weighed in about what it would take to implement. The challenge appears to be how to gracefully unload an auth-mode repo without restarting, and how to carry over existing profile data.

@fredemmott and @Orvid suggested making this work by adding a mechanism for passing JIT profiling data between HHVM instances. If this were possible, then a reload could be performed by spawning a new HHVM instance, passing profiling data, and then using the takeover functionality. @paulbiss chimed in to say that reusing profile data is already on the roadmap for improving warmup time, and will likely be within reach before the end of the year. The question of how to pass over profiling data without doubling memory requirements came up. Paul suggested that HHVM could treadmill the old code and pointed out that there is already an option for garbage-collecting unused translations.
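As a rough illustration of that proposed sequence only, the sketch below uses made-up function names (serializeProfileData, spawnNewInstance, handOverListeningSocket, treadmillOldTranslations); none of these correspond to HHVM's real API, they just spell out the handover steps discussed above.

```cpp
// Hypothetical stand-ins for the pieces discussed above.
#include <iostream>
#include <string>

std::string serializeProfileData() {
  // Old instance: dump JIT profile counters to a blob.
  return "profile-blob";
}

void spawnNewInstance(const std::string& profile) {
  // New instance starts with the warm profile instead of a cold warmup.
  std::cout << "spawning with " << profile.size() << " bytes of profile\n";
}

void handOverListeningSocket() {
  // "Takeover": the new process inherits the listening socket and starts
  // accepting traffic while the old one drains in-flight requests.
  std::cout << "socket handed over\n";
}

void treadmillOldTranslations() {
  // Retire old translations once no request references them, so the
  // handover does not keep two full copies of the translation cache.
  std::cout << "old translations reclaimed\n";
}

int main() {
  const std::string profile = serializeProfileData();
  spawnNewInstance(profile);
  handOverListeningSocket();
  treadmillOldTranslations();
  return 0;
}
```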

atdt commented 8 years ago

Badoo published "How Badoo saved one million dollars switching to PHP7" today. Among other things, it describes how they evaluated HHVM and why they decided in favor of PHP7. The excerpt below highlights deployment as a pain point, echoing much of what I wrote above:

Deploying is difficult and slow. During deploy, you have to warm up the JIT cache. While the machine is warming up, it shouldn’t be loaded down with production traffic, because everything goes pretty slowly. The HHVM team also doesn’t recommend warming up with parallel requests. By the way, the warm-up phase of a big cluster operation doesn’t go quickly. Additionally, for big clusters consisting of a few hundred machines, you have to learn how to deploy in batches. Thus the architecture and deploy procedure involved is substantial, and it’s difficult to tell ahead of time how much time it will take. For us, it’s important for deploy to be as simple and fast as possible. Our developer culture prides itself on putting out two planned releases a day and being able to roll out many hot fixes.