Robust Testing of the MVP

RebelOfDeath commented 4 months ago


As part of #10, I would like to propose some fairly robust testing before we move on to the next milestone, i.e. developing the actual plugins. We still need to test quite a bit of logic to ensure the quality of the server and the soundness of the design decisions made so far. Below is a checklist of the test suites we still have to write.

The Actual Endpoints

[x] Ensure that, given valid inputs, the proper methods are called internally with the right arguments; additionally, ensure that when the preconditions are met, the output is always valid (i.e. the postconditions are also met)
[x] Ensure that the exception handling is robust: when methods that can fail do fail, the application does not halt execution
[x] Ensure that the preventive measures against excessive session generation are enforced
[x] Ensure that the logic to combat malicious intent works as intended
[x] Ensure that WebSocket connections work as intended
[x] Ensure that the caching logic works as intended
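
To make the first two items concrete, here is a minimal sketch of the kind of unit test I have in mind. It is not the actual code from main.py: `handle_generate`, the `model.generate` signature, and the status codes are all hypothetical stand-ins, and the model is replaced with a `unittest.mock.Mock` so we can check both the call arguments and the behaviour on failure.

```python
from unittest.mock import Mock


def handle_generate(payload: dict, model) -> dict:
    """Hypothetical endpoint body: validate the input, delegate, never raise."""
    if "prefix" not in payload:
        # Precondition not met: reject without touching the model.
        return {"status": 422, "completion": None}
    try:
        completion = model.generate(
            payload["prefix"], max_tokens=payload.get("max_tokens", 32)
        )
    except Exception:
        # A failing backend must not halt the application.
        return {"status": 500, "completion": None}
    return {"status": 200, "completion": completion}


def test_valid_input_calls_model_with_expected_arguments():
    model = Mock()
    model.generate.return_value = "return 1"
    response = handle_generate({"prefix": "def f():", "max_tokens": 8}, model)
    model.generate.assert_called_once_with("def f():", max_tokens=8)
    assert response == {"status": 200, "completion": "return 1"}


def test_failing_model_does_not_propagate():
    model = Mock()
    model.generate.side_effect = RuntimeError("backend down")
    response = handle_generate({"prefix": "x"}, model)
    assert response == {"status": 500, "completion": None}


def test_invalid_input_never_reaches_model():
    model = Mock()
    response = handle_generate({}, model)
    model.generate.assert_not_called()
    assert response["status"] == 422
```

The same pattern (mock the collaborator, assert on the call, then force a `side_effect`) should carry over to the caching and WebSocket paths, with the real handlers swapped in for the stand-in.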

The Conversion Interface from Application to DB

[x] Ensure that given different configurations of user storage preferences, the methods act as intended
[x] Ensure that the methods are implemented in a fail-safe manner and do not propagate exceptions to higher levels
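
A rough sketch of how both items could be tested together, under stated assumptions: `to_db_record`, `safe_store`, the preference keys `store_context` and `store_telemetry`, and the `db.insert` method are invented for illustration and do not reflect the real conversion interface. The idea is to run every combination of preferences and to force the database call to fail.

```python
from itertools import product
from unittest.mock import Mock


def to_db_record(request: dict, prefs: dict) -> dict:
    """Hypothetical conversion: keep only the fields the user opted in to."""
    record = {"query_id": request["query_id"]}
    if prefs.get("store_context"):
        record["context"] = request.get("context")
    if prefs.get("store_telemetry"):
        record["telemetry"] = request.get("telemetry")
    return record


def safe_store(db, request: dict, prefs: dict) -> bool:
    """Hypothetical fail-safe wrapper: report failure instead of raising."""
    try:
        db.insert(to_db_record(request, prefs))
        return True
    except Exception:
        return False


REQUEST = {"query_id": "q1", "context": "def f():", "telemetry": {"ms": 12}}


def test_every_preference_combination():
    for store_context, store_telemetry in product([False, True], repeat=2):
        prefs = {"store_context": store_context, "store_telemetry": store_telemetry}
        record = to_db_record(REQUEST, prefs)
        # A field is present exactly when the matching preference is set.
        assert ("context" in record) == store_context
        assert ("telemetry" in record) == store_telemetry
        assert record["query_id"] == "q1"


def test_db_failure_is_contained():
    db = Mock()
    db.insert.side_effect = ConnectionError("db unreachable")
    assert safe_store(db, REQUEST, {"store_context": True}) is False
```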

The Sessions and Session Managers and the Clean-Up Logic

[x] Ensure that the cleanup logic for the session manager works as intended
[x] Ensure that upon adding sessions to and removing sessions from the session manager, the appropriate methods are called with the correct inputs
[x] Ensure that sending an update (as part of a VerifyRequest) causes the right updates in a session
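
For the cleanup logic, one approach that keeps the tests fast and deterministic is to inject the clock instead of sleeping. The sketch below is an assumption about the shape of the code, not the real session manager: the `SessionManager` class, its `ttl_seconds`, `clock`, and `on_expire` parameters, and the `touch` method (standing in for an update sent via a VerifyRequest) are all hypothetical.

```python
import time
from unittest.mock import Mock


class SessionManager:
    """Hypothetical manager that expires sessions idle for longer than a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic, on_expire=None):
        self._ttl = ttl_seconds
        self._clock = clock
        self._on_expire = on_expire or (lambda token: None)
        self._last_seen = {}

    def add(self, token):
        self._last_seen[token] = self._clock()

    def touch(self, token):
        # An update to a live session refreshes its idle timer.
        if token in self._last_seen:
            self._last_seen[token] = self._clock()

    def remove(self, token):
        self._last_seen.pop(token, None)

    def cleanup(self):
        now = self._clock()
        expired = [t for t, seen in self._last_seen.items() if now - seen > self._ttl]
        for token in expired:
            del self._last_seen[token]
            self._on_expire(token)
        return expired

    def __contains__(self, token):
        return token in self._last_seen


def test_cleanup_removes_only_idle_sessions():
    now = [0.0]  # fake clock, advanced by hand
    on_expire = Mock()
    manager = SessionManager(30, clock=lambda: now[0], on_expire=on_expire)

    manager.add("a")
    now[0] = 20.0
    manager.add("b")
    now[0] = 31.0
    manager.touch("b")

    assert manager.cleanup() == ["a"]
    on_expire.assert_called_once_with("a")
    assert "a" not in manager and "b" in manager

    now[0] = 62.0  # "b" was last touched at t=31, so it is now idle for 31s
    assert manager.cleanup() == ["b"]
```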

The LLM Side of Things

TBD

Ar4l commented 4 months ago

I would argue we can do integration/end-to-end tests for the endpoints, which should also test the LLMs. I've already added a test_generation method that does this for one sample, but ideally we would stress-test with a batch of generations as well.
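
A batch version could look roughly like the sketch below. It is only an outline under assumptions: `run_batch` and `stub_generate` are hypothetical names, and `stub_generate` stands in for whatever actually hits the endpoint (for example, the call made inside test_generation). Each request is wrapped so that one failure is recorded rather than aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(generate, prompts, workers=8):
    """Send all prompts concurrently and collect (status, payload) per prompt."""

    def attempt(prompt):
        try:
            return ("ok", generate(prompt))
        except Exception as exc:
            # Record the failure so the rest of the batch still runs.
            return ("error", repr(exc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the input prompts.
        return list(pool.map(attempt, prompts))


def stub_generate(prompt):
    """Stand-in for the real endpoint call; rejects empty prompts."""
    if not prompt:
        raise ValueError("empty prompt")
    return prompt + " pass"


def test_batch_survives_individual_failures():
    prompts = ["def a():", "", "def b():"] * 20
    results = run_batch(stub_generate, prompts)
    assert len(results) == len(prompts)
    statuses = [status for status, _ in results]
    assert statuses.count("error") == 20
    assert statuses.count("ok") == 40
```

With the real endpoint plugged in, the assertion would more likely be on an acceptable error rate or latency bound than on exact counts.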

RebelOfDeath commented 4 months ago

Tested the logic for the sessions in 233f189b07466bc8441007244dc338268255e2ca, along with improvements based on the feedback received from the testing phase.

RebelOfDeath commented 4 months ago

Achieved 93% line coverage in main.py as of 5a9c7163ec80048cc381f88f155fd2055acc4026.

AISE-TUDelft / coco