cabaletta / baritone

google maps for block game
GNU Lesser General Public License v3.0
7.78k stars 1.59k forks source link

Automated tests #4716

Open ZacSharp opened 3 weeks ago

ZacSharp commented 3 weeks ago

Describe your suggestion

Any feature should come with at least one automated test, which makes sure it is not obviously broken. (i.e. I want to revive #34)

The general idea is to do something similar to GameTest but on the client / integrated server. A test would start by pasting its associated schematic, then control Baritone solely via chat commands and observe the results.

The main design choices I see are:

Unless there are comments changing that plan, I will implement a somewhat bare bones version of this. That is, a new source set depending on just minecraft with a bunch of classes with a run method to be called on a separate thread, using builtin SNBT for structures and with glue code in the launch source set.

I will also add a few fabric-only tests which require test infrastructure I cannot easily provide or which aren't worth the effort. There will be a test for chat buttons in this category (nothing else requires mouse or keyboard input) and potentially rendering tests (because comparing screenshots might be tricky). However fabric-client-gametest does not exist on 1.19.4 so this cannot be done together with the other tests.

Context

At least from what I can see, testing for Baritone generally is really poor. There are some lonely unit tests, which cannot even use anything requiring Minecraft/Baritone to be running, and that's it. Anything beyond that must be tested manually and there isn't even a list of things to look for. When I test pull requests I generally only test the new features plus really basic functionality and I assume the situation for porting to newer versions is similar (@rfresh2?), which means even obvious regressions in niche features will not be caught. (Also repeating the same test for the bazillionth time because I'm merging the same branches once again really does get tiring)

Feature / Todo checklist

Final checklist

ZacSharp commented 2 weeks ago

Using a second thread and a primitive T runOnMain(Supplier<T>) method should still allow most of the benefits of having a fully synchronized test thread but with much simpler synchronization. (and without surprises when something checks the current thread like the BlockStateInterface constructor)

rfresh2 commented 1 week ago

there is included game test api at least in mc 1.21.x

i know fabric and neoforge have nice integrations for them nowadays. see: https://github.com/FabricMC/fabric/blob/1.21.5/fabric-client-gametest-api-v1/src/client/java/net/fabricmc/fabric/api/client/gametest/v1/package-info.java

i personally use https://github.com/headlesshq/mc-runtime-test for basic mixin smoke testing in https://github.com/rfresh2/XaeroPlus

i do remember having issues getting the gametest api thingy to work correctly on every mc version and loader but its not required for my current uses so i don't use it

i set an env variable that i read in my initializer where after i can run arbitrary code: https://github.com/rfresh2/XaeroPlus/blob/683f05dba0f485d9d3f260bac9b3fb58f1f7812e/fabric/src/main/java/xaeroplus/fabric/XaeroPlusFabric.java#L53-L54

rfresh2 commented 1 week ago

in general I would say rendering tests are near impossible to do. Its not clear how to define and retrieve a passing or failing state.

i have seen there is a screenshot function in the fabric gametest api that can at least help out by uploading an image for a human to look at.

pathfinding tests like going from point A to B would probably be doable though. or things where you can assert that a block was broken or placed at a particular position.

ZacSharp commented 1 week ago

Didn't know the built-in test framework was usable for client tests (or is that just fabric specific?). It was introduced some time ago already, when they turned off unused method removal in some 1.17 release, so we can use it on 1.19.4. Using an existing framework is of course good, but I personally don't particularly fancy using the built-in one directly. EDIT: This was a misunderstanding. GameTest is server-only and cannot test the client.

Anyway, you suggested it, I'll try to have a closer look at it. Maybe we can use a thin wrapper or something like that.

I dismissed running in CI because I expected tests to just take too long for that, but that mixin test is cool. Should definitely do that.

My plan was to eventually implement test groups so for rendering (and similar) there would be a group of half-automated tests which do the setup and then ask a human to mark success or failure. Running them anyway in unsupervised runs and saving screenshots is a good idea though. Rendering might even be testable by taking screenshots against animation-free background with fixed lighting.

rfresh2 commented 1 week ago

i dont know the answers to all your questions

but i would not expect the included gametest api to be stable through all mc versions and loaders if that's what you were hoping for

tests need to run on all platforms

you should consider whether this is really a requirement or a nice-to-have. In my opinion its a nice-to-have.

loosening this requirement will make things much simpler to get started and maintain

in the vast majority of tests you'd do there really isn't a difference between mod behavior on different loaders.

ZacSharp commented 1 week ago

Having had a look at GameTest, it is server-only and not usable for us.


The fabric api module you linked above is a completely independent implementation of a client side test framework and the most interesting features it provides in my opinion is the screenshot comparison and tick sync between test and client. It also provides input simulation (interesting for chat buttons, but that's about it) and setting up worlds/servers (on the flip side, it doesn't support not managing worlds for us. No way to run two tests without rejoining the world). However it does not allow starting tests manually or individually and completely disconnects normal input handling, so I don't think we can have semi-automatic tests with it, retry a test individually, or run against an already existing server. Another notable feature I could not find is loading structures. Sinytra also doesn't seem to support it, so we don't get NeoForge support for free.

I'm unsure about this one. It is usable and provides useful features, but then again, it also looks like it could end up being in the way a lot (leaves the world after the test finished, seemingly unable to run individual tests). At least the screenshot stuff should be easily portable and I'm willing to sacrifice the heavy synchronization for platform support.


Considering time to get started: I already have a test runner so I could spend two more hours writing a dozen helper methods and implement test groups / running individual tests and then just write tests.


While running on all platforms doesn't have to be a requirement I do think it is more than nice-to-have and at least two platforms should be supported. One motivation for this issue is specifically reducing the testing gap between platforms, to avoid things like "elytra doesn't work at all on forge" going unnoticed again. (so far I always test on the tweaker dev run and the rest gets a quick look at best) I'd also not worry if it's just one platform for now with a clear path to support more in the future, but with a fapi module that's not the case.


The stability part isn't about being stable in all directions forever but rather about not having large changes in testing coincide with large changes in the thing being tested.

This is a working test I already have (the only one) ```java package baritone.test.tests; import baritone.test.Test; import net.minecraft.client.Minecraft; import net.minecraft.core.BlockPos; import java.util.function.Predicate; public class PillarTest extends Test { protected void run() { BlockPos origin = evalOnMain(mc -> mc.player.blockPosition()); send("/give @s minecraft:stone 64"); wait(2); send("#goto ~ ~10 ~"); for (int i = 0; i <= 10; ++i) { BlockPos pos = origin.above(i); waitFor(100, mc -> mc.player.blockPosition().equals(pos)); } wait(10); send("/fill ~ ~-10 ~ ~ ~-1 ~ minecraft:air"); send("/tp @s ~ ~-10 ~"); send("/clear @s stone"); send("#cancel"); wait(2); } private void send(String message) { runOnMain(mc -> { if (message.startsWith("/")) { mc.player.connection.sendCommand(message.substring(1)); } else { mc.player.connection.sendChat(message); } }); } private void wait(int ticks) { try { Thread.sleep(50 * ticks); } catch (InterruptedException e) {} } private void waitFor(int timeout, Predicate condition) { for (int i = 0; i < timeout; ++i) { wait(1); if (evalOnMain(mc -> condition.test(mc))) { return; } } throw new IllegalStateException(); } } ``` The generic parameter is a leftover and fundamental helpers would also be moved somewhere else.
ZacSharp commented 1 week ago

I think running against arbitrary servers isn't as important as I thought though. There might be some use to it to e.g. check whether some server software messes Baritone up or to half-automate local test during development, but especially the latter is far from being a top concern.

leijurv commented 1 week ago

We used to have this, see https://github.com/cabaletta/baritone/blob/66ffd1e0d9acb68b5b3b13a112522cd88b6d76db/src/main/java/baritone/utils/BaritoneAutoTest.java

It would create a world of a set seed and path to a certain destination, and it had to arrive within a certain time limit or else CI would fail.

ZacSharp commented 1 week ago

That's not quite the kind of test I'm aiming for. Basically, it is one big blind shot, while I want lots of small tests targeted at mostly one feature. (e.g. "can place into water sources with allowPlaceInFluidsFlow disabled")

Also I think I've settled on not using fabric-client-gametest. It provides some nice to have features (setting up dedicated servers, simulated inputs (actually very-nice-to-have), complete client-server synchronization, additional screenshot utilities, resetting game settings, some small nits for consistentcy / ease of use), but inherently restricts us to test on just one of our supported platforms. It also blocks simple things like manually inspecting a failed test (inputs are detached and the client just quits), selectively running tests (have to delete all other tests from the json) and its "one world, one test" design basically forces us to create sub-tests, because generating a world each for tests taking just a dozen seconds is too much. The new Minecraft dependencies needed to run our own thingy would be creating / joining new worlds, quitting Minecraft, sending commands and likely also taking screenshots and the SNBT parser.

@rfresh2 since you might be the person who will port my mess to future mc versions and expressed concerns about maintainability: Is writing our own test scaffolding fine for you or do you want me to go for fabric-client-gametest anyway?

rfresh2 commented 1 week ago

the fewer hooks into / dependencies in mojang code the better for version portability. the rate of churn they put out nowadays keeps accelerating.

which may be at odds with features needed/wanted in the test framework.

idk kinda hard to say how i will feel about the future "mess" but if you feel motivated to give it a shot go for it.

who knows, maybe it'll actually be useful for other multi-loader mod devs and you can split it out as its own library