falldamagestudio / UE4-GHA-BuildSystem

Build UE4 games with GitHub + GitHub Actions + Google Cloud
MIT License

Failure to create build instance #9

Closed PetterGs closed 3 years ago

PetterGs commented 3 years ago

First of all, awesome work. All small teams should use this; it's a shame it isn't more widely known.

I tried to use this repo but the GitHub Actions workflow failed at the deploy stage. The VM image is built, but I'm getting a bizarre error saying that I'm exceeding my GCP quota, even though only a single instance should be created, if I understand correctly. I'm a bit of a newb when it comes to Terraform, which makes debugging this error a challenge.

Error: Error waiting for instance to create: Quota 'CPUS' exceeded.  Limit: 29.0 in region europe-west3.

  on ../../../../submodules/UE4-BuildServices/services/builders/ue4_build_agent/main.tf line 1, in resource "google_compute_instance" "default":
   1: resource "google_compute_instance" "default" ***

Error: Error waiting for instance to create: Quota 'CPUS_ALL_REGIONS' exceeded.  Limit: 32.0 globally.

  on ../../../../submodules/UE4-BuildServices/services/builders/ue4_build_agent/main.tf line 1, in resource "google_compute_instance" "default":
   1: resource "google_compute_instance" "default" ***

Error: Terraform exited with code 1.
Error: Process completed with exit code 1.

Kalmalyzer commented 3 years ago

It looks like you are running into resource quota limits.

The quota is a blunt tool that Google uses to 1) avoid people accidentally starting up lots of machines and thereby accruing high bills and 2) avoid people deliberately starting up so many machines that the particular zone's datacenter runs out of free resources, and other customers aren't able to start up new machines.

The CPU quota counts CPUs, not machines. It looks like you have a max quota of 29 CPUs in europe-west3 and 32 CPUs globally. The example Terraform script attempts to create a 32-CPU engine builder and a 32-CPU game builder, 64 CPUs in total. Creation of the first of these fails.

I'm not sure why your quotas are that low - for me, the default quotas are at 72 for most regions (including europe-west3). You can view your quotas within Google's Cloud Console. You're looking for the 'CPUS' quota specifically.

To get around this, either reduce the number of CPUs required by each of your instances (I suggest: n1-standard-16 for your engine builder, n1-standard-8 for your game builder), or request a quota increase to ~100 CPUs, both for the region and globally. That is a manual process but so far I have seen those get accepted within 24h.
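
The arithmetic behind the failure can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the quota numbers are taken from the error messages above, and the vCPU count is read off the `n1-standard-N` machine type name (which holds for the n1-standard family). It also assumes both builders run at the same time in the same region.

```python
def vcpus(machine_type: str) -> int:
    """Return the vCPU count encoded in an n1-standard-N machine type name."""
    family, _, count = machine_type.rpartition("-")
    if family != "n1-standard":
        raise ValueError(f"unsupported machine type: {machine_type}")
    return int(count)


def fits_quota(machine_types, regional_limit, global_limit):
    """Check whether all instances together fit both CPU quotas.

    Assumes every instance is created in the same region.
    """
    total = sum(vcpus(m) for m in machine_types)
    return total <= regional_limit and total <= global_limit


# Limits from the error messages: 29 CPUs in europe-west3, 32 globally.
REGIONAL_LIMIT = 29
GLOBAL_LIMIT = 32

original = ["n1-standard-32", "n1-standard-32"]  # engine builder + game builder
suggested = ["n1-standard-16", "n1-standard-8"]  # reduced sizes suggested above

print(fits_quota(original, REGIONAL_LIMIT, GLOBAL_LIMIT))   # False: 64 CPUs needed
print(fits_quota(suggested, REGIONAL_LIMIT, GLOBAL_LIMIT))  # True: 24 CPUs needed
```

With the reduced machine types the two builders need 24 CPUs together, which fits under both the regional limit of 29 and the global limit of 32.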

PetterGs commented 3 years ago

Thank you very much. I wasn't aware that the Terraform script requested that many CPUs, my bad.

Kalmalyzer commented 3 years ago

(Please let me know how it goes!)

PetterGs commented 3 years ago

My engine build fails with the following error message:

Error: Input 'submodules' not supported when falling back to download using the GitHub REST API. To create a local Git repository instead, add Git 2.18 or higher to the PATH.

Is the version of Git on the build VM too old?

Kalmalyzer commented 3 years ago

Several things have gone wrong when building the VM image. Git didn't install properly during VM image creation. Later on, when you attempt to use the image, you get an error message.
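
As a rough check on the build VM, something like the following Python sketch would tell you whether a usable Git is on the PATH. The function names are made up for illustration; the 2.18 minimum comes from the error message above.

```python
import re
import shutil
import subprocess

MIN_GIT = (2, 18)  # minimum version named in the checkout error message


def parse_git_version(output: str):
    """Extract (major, minor) from `git --version` output, or None if unparseable."""
    match = re.search(r"git version (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def git_is_usable() -> bool:
    """True if git is on the PATH and is at least MIN_GIT."""
    if shutil.which("git") is None:
        return False  # not installed, or installed but not on the PATH
    result = subprocess.run(["git", "--version"], capture_output=True, text=True)
    version = parse_git_version(result.stdout)
    return version is not None and version >= MIN_GIT


print(parse_git_version("git version 2.30.0.windows.1"))  # (2, 30)
print(git_is_usable())
```

If `git_is_usable()` returns `False` because Git is missing altogether, the problem is the install step rather than the version.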

Here's the error when I re-run the VM image build step:

[screenshot of the error message]

Several things combine to make this difficult to diagnose:

  1. The VM image build script attempts to automatically locate the download URL for the latest version of Git. That logic assumes that binaries in the release are named according to a specific pattern. That naming pattern changed sometime between Sep 2020 and now. We should lock the Git version (#10).
  2. The step that attempts to fetch the Git installer fails since the generated URL is invalid.
  3. The Packer VM image build process does not fail when there's an error within one of the install scripts. We should ensure that the Packer build process fails whenever there are any unhandled errors in the Powershell scripts (#11).
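
The two fixes named in the list (pin the Git version, and stop on the first failing install step) can be sketched as follows. This is a Python illustration, not the actual Powershell scripts: the function names are hypothetical, the version number is only an example, and the Git for Windows URL pattern is an assumption that should be verified against the release page before use.

```python
import subprocess
import sys

GIT_VERSION = "2.30.0"  # example value only; pin whichever release you have verified


def git_installer_url(version: str) -> str:
    """Build the installer URL for one pinned version (URL pattern is an assumption)."""
    return (
        "https://github.com/git-for-windows/git/releases/download/"
        f"v{version}.windows.1/Git-{version}-64-bit.exe"
    )


def run_step(command) -> None:
    """Run one install step; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


print(git_installer_url(GIT_VERSION))

# A step that exits with an error now raises instead of being silently ignored.
failing_step = [sys.executable, "-c", "import sys; sys.exit(3)"]
try:
    run_step(failing_step)
except subprocess.CalledProcessError as error:
    print(f"install step failed with exit code {error.returncode}")
```

With a pinned version the URL no longer depends on guessing the naming of the latest release, and with `check=True` a broken download surfaces during the image build instead of later, when the image is used.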

I'll get back to you once these are resolved.

Kalmalyzer commented 3 years ago

Ok, I have fixed the problems that you ran into, plus some other things:

There's a summary on quota problems in case others hit it in the future: https://github.com/falldamagestudio/UE4-GHA-BuildSystem/blob/master/COMMON_PROBLEMS.md#hitting-resource-quota-limits-in-google-cloud-platform

What you need to do, to try again, is to replicate the new changes to the master branches in UE4-GHA-BuildSystem, UE4-GHA-Engine and UE4-GHA-Game to your forks/imports. If you have made local changes to the submodules (UE4-GHA-BuildServices, UE4-GHA-BuildAgent, UE4-GHA-BuildAgentWatchdog) then you need to replicate those changes and update your UE4-GHA-BuildSystem accordingly. Then ensure the infrastructure setup process is triggered again (either push changes to master in UE4-GHA-BuildSystem, or manually click "Run workflow" for the "Create/update infrastructure" workflow).

Let me know how this goes.

PetterGs commented 3 years ago

I got the engine built beautifully. I'm going to test building a game next.

I am working in the AEC sector and was thinking about the following architecture concerning GitHub Actions and Unreal. All C++ code should be in plugins that the main game repo uses, while blueprints are used by artists and designers.

This would borrow the microservices model to essentially separate concerns fully, in order to reduce code "spaghettification" and crunch.

The main "game" should use Plastic SCM. But all plugins should use Git. Every plugin has unit, functional, and acceptance testing. The clients of each plugin are the artists, and they only work with the compiled binaries of the plugins. What do you think of this workflow?

Kalmalyzer commented 3 years ago

The plugin approach is neat on paper. I think you will find that it is reasonable to move some - but not all - C++ logic into plugins.

The part that becomes hairy is when plugins develop co-dependencies. The "interactive simulation" aspect tends to result in interactions between core logic / character movement / collisions / physics simulation / animation / UI in ways that make the body of logic difficult to isolate into self-contained plugins. You begin to invent case-specific APIs for the plugins to enable the detailed interactions necessary for 'richness' in the simulation.

If you can manage, more power to you! But -- if you haven't done so already -- I suggest that you build a working application without plugins first.

Another aspect that is difficult is test automation. It is hard to create meaningful tests for interactive 3D simulations, and Unreal's test framework is not great either. This is not to say that you shouldn't create automated tests -- on the contrary, some form of automated testing is crucial for large-scale Unreal projects -- but Unreal as an application framework is not designed with your workflow in mind. So I would again suggest that, if you haven't done so already, you first build something of value with Unreal and then look at how to do automated testing. Otherwise you may find that the time-to-PoC will be long.

A third problem is that you intend to use Plastic SCM for assets. On one hand, that makes lots of sense - we use Plastic SCM for all our game C++ code & assets because of better scalability and a strong UI client - but on the other hand, you will need a different CI platform than GitHub Actions for building the complete application. (Or perhaps you can solve that with a funky solution within GHA, a workflow that is triggered via a repository_dispatch API event and which pulls assets from Plastic?) We are moving over to a Jenkins-based solution, but A) it's not ready for prime time, B) once it is, it will be an order of magnitude more complex than this UE4-GHA-BuildSystem setup.
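
For the repository_dispatch idea, the triggering side could look roughly like the Python sketch below. It only builds the HTTP request and does not send it. The owner, repo, event name and payload are hypothetical placeholders; the endpoint is GitHub's "create a repository dispatch event" REST call, and the receiving workflow would need an `on: repository_dispatch` trigger. How the workflow then pulls assets from Plastic is not covered here.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, event_type, payload=None):
    """Build (but do not send) a repository_dispatch request for the GitHub REST API."""
    body = {"event_type": event_type, "client_payload": payload or {}}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_dispatch_request(
    owner="example-org",             # hypothetical
    repo="example-game",             # hypothetical
    token="<personal-access-token>",  # placeholder, never hard-code a real token
    event_type="plastic-changeset",  # hypothetical event name
    payload={"changeset": "1234"},   # hypothetical payload
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Something on the Plastic side (for example a server trigger) would have to make this call whenever new changesets arrive, which is part of what makes the solution "funky".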