jrmccluskey opened 1 week ago
Not sure what happened on the Kokoro test run: the test target passed, but the test execution as a whole was killed right after the test finished.
Not sure what's happening here; the test passes, but the test session gets killed consistently. @engelke can you take a look?
@kweinmeister Still need an approving review from someone, if you could take a look.
If the complaint is with the model handler code, I don't think it's too much of a change to cut that code in favor of linking to the source instead.
Debugging the tests: the output shows it's a timeout, but the tests are successful(?)
```
collecting ... collected 1 item
e2e_test.py::test_pipeline_dataflow PASSED [100%]
-- generated xml file: /workspace/dataflow/gemma-flex-template/sponge_log.xml --
======================== 1 passed in 3642.73s (1:00:42) ========================
nox > Session py-3.10 was successful.
err: signal: killed
```
The Kokoro config is set to a max of 60 min (config), and you've configured the test to have a 5400 s (1.5 hour) timeout.
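For context, a per-test limit like that is typically declared with the pytest-timeout marker. A minimal sketch (the test body is a placeholder; only the marker matters here):

```python
import pytest


# Sketch only: pytest-timeout enforces this per-test limit, but the outer
# CI job limit (60 min on Kokoro here) can still kill the whole session
# first, even after the test itself has passed.
@pytest.mark.timeout(5400)  # 5400 s = 90 min
def test_pipeline_dataflow() -> None:
    # Placeholder: the real test builds the flex template image,
    # launches the Dataflow job, and polls it for success.
    ...
```

Either the marker needs to drop below the CI job limit or the CI limit needs to rise above the marker; otherwise the kill happens outside pytest's control and the session dies even on a passing test.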
A similar issue reported in https://github.com/GoogleCloudPlatform/python-docs-samples/issues/4609.
At a guess: the image is rebuilt each time (~20 min), the job takes time to start (~20 min), the success message isn't being received, and thus the system waits roughly another 20 minutes before it times out.
How long is this entire e2e test expected to take, and is the 1.5 h wait intentional? Something else will need to be updated for that decorator to be respected.
As far as the E2E test timeout goes: in early testing, runs were dancing around the hour mark (some slightly under, some slightly over), so it definitely needs to be over an hour. Building the container plus running the job as a flex template invocation takes substantial time, so we may also need a longer Kokoro timeout.
> Building the container + running the job as an invocation from a flex template takes substantial time.
You can bring this down significantly by not including the model and the GPU software in the flex template image. This is a scenario where having two separate images, one for the flex template launcher and one for the custom worker container, would be better. Care should be taken to build the two images with the same set of dependencies, which can be accomplished with requirements files and/or constraints files.
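A sketch of that split, with made-up file names and an assumed launcher base image (verify both against the current Dataflow flex-template docs):

```dockerfile
# Launcher image: only the Flex Template launcher and the pipeline code.
# Base image name is an assumption; check the Dataflow docs for the
# current launcher base for your Python version.
FROM gcr.io/dataflow-templates-base/python310-template-launcher-base

COPY pipeline.py requirements.txt constraints.txt /template/
RUN pip install --no-cache-dir -r /template/requirements.txt -c /template/constraints.txt
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/pipeline.py
```

And a separate Dockerfile for the worker image, installing from the *same* requirements/constraints files so the two images agree on versions:

```dockerfile
# Worker image: GPU software, the Beam SDK, and (optionally) the model
# live only here. Base image is illustrative; a real image would also
# need Python/pip installed before this step.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

COPY requirements.txt constraints.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt -c /tmp/constraints.txt
```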
Also, we can speed up launch time by downloading the model into the SDK worker container from GCS during container startup, instead of shipping it inside the container. Currently this can be done with a custom entrypoint like https://github.com/liferoad/beamllm/blob/main/containers/ollama/entrypoint.sh; eventually we will have a Beam API for that.
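The linked entrypoint approach boils down to something like this sketch (bucket path, model directory, and the Beam boot path are illustrative; gsutil and bucket read access are assumed):

```shell
#!/bin/bash
set -euo pipefail

# Sketch of a custom SDK container entrypoint: fetch the model weights
# from GCS at container startup instead of baking them into the image.
MODEL_DIR=/models/gemma
mkdir -p "$MODEL_DIR"

# Assumes gsutil is installed in the image and the worker service
# account can read the bucket.
gsutil -m cp -r "gs://YOUR_BUCKET/gemma-weights/*" "$MODEL_DIR/"

# Hand off to the regular Beam SDK harness boot entrypoint.
exec /opt/apache/beam/boot "$@"
```

This trades image size and build time for a per-worker download at startup, which is why the next comment notes the baked-in model is less prone to runtime errors.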
Including the model in the container will be less prone to runtime errors, but slower, at least in the short term.
Description
Adds a Gemma Flex Template example and an e2e test running on Dataflow. This code example is similar to #11284, but uses a PyTorch model and deploys as a flex template. The e2e test will need model weights staged to GCS, like the streaming Gemma example.
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
- `nox -s py-3.9` (see Test Environment Setup)
- `nox -s lint` (see Test Environment Setup)