beyond-all-reason / teiserver

Middleware server for online gaming
https://www.beyondallreason.info/
MIT License

More fix spring auth #277

Closed geekingfrog closed 4 months ago

geekingfrog commented 5 months ago

A few fixes for some old tests. mix test test/teiserver/protocols/spring/spring_auth_test.exs now passes all 16 tests, versus 7 failures out of 16 on master.

There are still some async errors around test teardown, but they don't actually fail any tests. More info in af604750fedf11730dc9f52a53db84f3f29b9f79

geekingfrog commented 5 months ago

@L-e-x-o-n if you have some time to have a look.

L-e-x-o-n commented 5 months ago

@L-e-x-o-n if you have some time to have a look.

I am getting 16 tests, 4 failures. Starting with

2024-04-22 17:16:09.818 [error] Postgrex.Protocol (#PID<0.529.0>) disconnected: ** (DBConnection.ConnectionError) owner #PID<0.2214.0> exited

Client #PID<0.2216.0> is still using a connection from owner at location:

but there is probably something broken with my local build, because the GH Actions test workflow only gets 1 auth test failure:

132) test RENAMEACCOUNT (Teiserver.SpringAuthTest)
Error:      test/teiserver/protocols/spring/spring_auth_test.exs:450
     Assertion with == failed
     code:  assert reply == "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     left:  "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\nSAIDPRIVATE Coordinator Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     right: "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     stacktrace:
       test/teiserver/protocols/spring/spring_auth_test.exs:467: (test)
geekingfrog commented 5 months ago

I am getting 16 tests, 4 failures. Starting with

Have you tried dropping the test db? psql -U teiserver_test postgres -c "drop database teiserver_test" Some other tests aren't correctly set up with the SQL sandbox, so they leave state behind. For example, I saw some tests creating a "new" user by name, but since that name already existed, the id the test expected differed from the one the server was using.

There may be more flakiness around, as shown by the GitHub action, but I'd rather focus on running the tests for just this file for now, or at least for the "fixed" files.

We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.

That seems very excessive, and it's not the default for Phoenix tests. Ensuring disconnection is really just to prevent stdout from being inundated with errors irrelevant to the tests themselves.

AdamChlupacek commented 5 months ago

I agree that shutting the whole server down is excessive; however, those error logs are proof that some code is still executing after the test finishes. I would just like some way to sandbox the whole test so we know nothing else is running, though I guess this is a topic of its own.

All the tests do pass fine on my machine otherwise.

geekingfrog commented 5 months ago

I agree that shutting the whole server down is excessive; however, those error logs are proof that some code is still executing after the test finishes. I would just like some way to sandbox the whole test so we know nothing else is running, though I guess this is a topic of its own.

Yeah, the teardown code is running after the tests because there are a few on_exit(fn) hooks. The entire test suite goes something like this:

start server
setup test1 (may be many of these as these can be nested)
run test1
teardown test1

setup test2
run test2
teardown test2

...
shutdown erlang VM
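In ExUnit terms, the teardown step corresponds to on_exit hooks, which run in a separate process after the test process has exited; that is why teardown log output shows up after the test "finishes". A minimal illustration of the ordering (not taken from the actual suite):

```elixir
defmodule LifecycleOrderTest do
  use ExUnit.Case

  setup do
    IO.puts("setup")
    # on_exit callbacks fire only once the test process has terminated,
    # so this line prints after the test body's output.
    on_exit(fn -> IO.puts("teardown (after the test process exited)") end)
    :ok
  end

  test "runs between setup and teardown" do
    IO.puts("run test")
  end
end
```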

When starting, teiserver does a bunch of DB queries; I think it seeds the DB with some data if it's not there, and warms a bunch of caches as well. I suspect this may indeed be problematic with many tests. I do not yet 100% understand how the sandbox works, and how connection sharing plays with async/sync tests. It would definitely be nice to get to the bottom of it.

AdamChlupacek commented 5 months ago

When starting, teiserver does a bunch of DB queries; I think it seeds the DB with some data if it's not there, and warms a bunch of caches as well. I suspect this may indeed be problematic with many tests. I do not yet 100% understand how the sandbox works, and how connection sharing plays with async/sync tests. It would definitely be nice to get to the bottom of it.

As far as I understand Ecto's sandboxing, it just records the transactions made against the DB and then reverses them. I did not find any other layer of sandboxing.

I think what is happening is that we completely escape the context of the test via the TCP connections made to the running teiserver, since most of these tests exercise not only how the messages are handled but also the L4 TCP layer. These tests are at the level of ranch, so no Phoenix is involved here afaik.

Overall I think we should remove the on_exit here, since it's not really contributing to the test, and the core issue of the extraneous error logs should be handled in a different PR, be it by including the on_exit as part of auth_setup or by changing how we test the functionality so it doesn't escape the context via TCP.
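For context on why processes outside the test "escape" the sandbox: Ecto's SQL sandbox is ownership-based, so a connection is checked out to the test process and any other process must be explicitly allowed to use it. A rough sketch of how that normally works (tcp_handler_pid is a stand-in for the pid of the server-side process handling the TCP client, not a real variable from this codebase):

```elixir
# In test_helper.exs the sandbox is put into manual mode, so nothing
# touches the DB until a test checks a connection out:
Ecto.Adapters.SQL.Sandbox.mode(Teiserver.Repo, :manual)

# In each test's setup, the test process becomes the connection owner:
:ok = Ecto.Adapters.SQL.Sandbox.checkout(Teiserver.Repo)

# A process spawned outside the test -- such as the ranch handler
# serving the TCP client -- must be explicitly allowed; otherwise its
# queries race the owner's exit and raise DBConnection.ConnectionError:
Ecto.Adapters.SQL.Sandbox.allow(Teiserver.Repo, self(), tcp_handler_pid)
```

That race is consistent with the "owner exited / Client is still using a connection" errors quoted earlier in the thread.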

L-e-x-o-n commented 5 months ago

I am getting 16 tests, 4 failures. Starting with

Have you tried dropping the test db? psql -U teiserver_test postgres -c "drop database teiserver_test" Some other tests aren't correctly set up with the SQL sandbox, so they leave state behind. For example, I saw some tests creating a "new" user by name, but since that name already existed, the id the test expected differed from the one the server was using.

There may be more flakiness around, as shown by the GitHub action, but I'd rather focus on running the tests for just this file for now, or at least for the "fixed" files.

We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.

That seems very excessive, and it's not the default for Phoenix tests. Ensuring disconnection is really just to prevent stdout from being inundated with errors irrelevant to the tests themselves.

Good point, I used MIX_ENV=test mix ecto.reset instead and it worked. All 16 tests pass now :)

geekingfrog commented 5 months ago

Overall I think we should remove the on_exit here, since it's not really contributing to the test, and the core issue of the extraneous error logs should be handled in a different PR, be it by including the on_exit as part of auth_setup or by changing how we test the functionality so it doesn't escape the context via TCP.

I amended the commit: 627af28526d07538f7b1a34a5f4026f4cff2dbe0. It turns out I can indeed simply put the disconnect inside an on_exit callback in the setup function. This way the test code doesn't need to care about it, and it's cleaner.
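The shape of that change, sketched from the description above (helper names such as auth_setup/0 and _send/2 are assumptions for illustration, not copied from the commit):

```elixir
# Before: every test body had to remember its own disconnect.
test "RENAMEACCOUNT", %{socket: socket} do
  # ... assertions ...
  _send(socket, "EXIT\n")
end

# After: setup registers the disconnect via on_exit, so test bodies
# no longer need a manual teardown step and the connection is closed
# even if an assertion fails mid-test.
setup do
  %{socket: socket, user: user} = auth_setup()
  on_exit(fn -> _send(socket, "EXIT\n") end)
  {:ok, socket: socket, user: user}
end
```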

geekingfrog commented 5 months ago

Is there anything more needed on this PR? I'd like to get it merged soon-ish if possible.