MannLabs / alphapept

A modular, Python-based framework for mass spectrometry. Powered by nbdev.
https://mannlabs.github.io/alphapept/
Apache License 2.0

Test Scope #40

Closed: straussmaximilian closed this issue 3 years ago

straussmaximilian commented 4 years ago

To have a maintainable package, automated tests and performance benchmarks are crucial. I have the following tests in mind, assuming a versioning scheme X.Y.Z (X: major, Y: minor, Z: patch). In terms of branching, we would have master/dev plus feature branches.

Unit Tests:

Simple function tests within nbdev. They should be run for every push on every branch. Duration ~ minutes
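A minimal sketch of what such a test cell could look like; `get_precursor_mass` is a hypothetical stand-in, not the actual alphapept API:

```python
# nbdev collects plain assert statements from the notebooks and fails the
# CI run if any of them raise. `get_precursor_mass` is a hypothetical
# stand-in for any small, pure alphapept function.

PROTON = 1.00727646688  # proton mass in Da

def get_precursor_mass(mz: float, charge: int) -> float:
    """Convert an m/z value and charge state to the neutral monoisotopic mass."""
    return mz * charge - charge * PROTON

assert abs(get_precursor_mass(500.0, 2) - 997.98544706624) < 1e-6
assert get_precursor_mass(300.0, 1) < 300.0
```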

Workflow test:

Tests that run a full pipeline (i.e., perform a search on HeLa Thermo / Bruker data). We could run them for every version, even minor versions on dev. Duration: <1 h. We would auto-create a settings template for the current version and replace the file_path entries with the respective filenames (see the sketch below).
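A sketch of the settings swap, assuming a YAML template and a hypothetical `experiment -> file_paths` key layout (the actual template and test filenames may differ):

```python
import copy
import yaml

def make_test_settings(template_path, raw_files, out_path):
    """Point the current settings template at the test raw files."""
    with open(template_path) as f:
        settings = yaml.safe_load(f)

    test_settings = copy.deepcopy(settings)
    # Hypothetical key layout; adjust to the real template structure.
    test_settings["experiment"]["file_paths"] = raw_files

    with open(out_path, "w") as f:
        yaml.safe_dump(test_settings, f)

make_test_settings(
    "settings_template.yaml",
    ["thermo_HeLa.raw", "bruker_HeLa.d"],
    "test_settings.yaml",
)
```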

Integration test:

Tests that try all possible (or at least most) combinations of settings (see the sketch below). This is something we could do for every minor version. Duration: several hours.
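A sketch of the combinatorial sweep; the setting names and the `run_pipeline` entry point are placeholders, not the actual alphapept API:

```python
import itertools

# Placeholder settings grid; the real one would be derived from the
# settings template.
grid = {
    "protease": ["trypsin", "lysc"],
    "min_frag_hits": [5, 7],
    "use_calibration": [True, False],
}

keys, values = zip(*grid.items())
for combination in itertools.product(*values):
    overrides = dict(zip(keys, combination))
    print(f"Integration test with {overrides}")
    # run_pipeline(base_settings, overrides)  # hypothetical entry point
```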

Installer Test:

I think shipping is crucial, and we should have one-click installers ready for each patch. Compiling an installer takes <10 minutes, so this could be done for every push on the dev branch.
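A sketch of an installer smoke test, assuming a PyInstaller-based build with a spec file (the actual build scripts may differ):

```python
import subprocess
import sys

def build_installer(spec_file="alphapept.spec"):
    """Compile the one-click installer and fail the CI job if the build breaks."""
    result = subprocess.run(
        ["pyinstaller", spec_file, "--noconfirm"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        sys.exit(f"Installer build failed:\n{result.stderr}")

build_installer()
```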

UI Test:

This is a very difficult but very important test for keeping a user base: it can easily happen that a new feature is implemented and then the GUI doesn't work anymore. The current settings scheme is flexible enough that the core functionality should already be covered by the Workflow test. A proper GUI test would probably involve tools like pyautogui that automatically "click" through workflows. If we want to be fancy, we could also use this to automatically create screenshotted documentation for each version. Ideally, this would run for every push on the dev branch.
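A sketch of such a click-through with pyautogui; the reference images and paths are placeholders, and the screenshots double as versioned documentation:

```python
import pyautogui

def click_and_document(button_image, screenshot_path):
    """Find a UI element by its reference image, click it, and record the result."""
    # Older pyautogui versions return None on a miss; newer ones raise instead.
    location = pyautogui.locateCenterOnScreen(button_image)
    if location is None:
        raise RuntimeError(f"UI element not found: {button_image}")
    pyautogui.click(location)
    pyautogui.screenshot(screenshot_path)  # doubles as documentation

click_and_document("start_search_button.png", "docs/screenshots/after_start.png")
```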

Performance test:

The workflow test from above will not give us a good estimate of performance. We will get execution time and protein and peptide counts, but we should also consider metrics like quantification accuracy. For this, we should use multi-species samples with known mixing ratios; these are computationally more demanding, and I would hence consider them a different kind of test. The idea would be to have a set of PRIDE datasets, like PXD010012, which we always re-run. As we could take the analysis results from the repository, we would also have a baseline to compare our results to (see the sketch below). Depending on the number of datasets, this could take considerable time; it is something we could potentially do for every minor version.
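A sketch of a quantification-accuracy metric for a two-proteome mix with known ratios; the column names are assumptions about the results table, not the actual alphapept output:

```python
import numpy as np
import pandas as pd

def ratio_errors(df: pd.DataFrame, expected: dict) -> dict:
    """Median absolute deviation of observed log2 ratios from the known mix."""
    errors = {}
    for species, expected_ratio in expected.items():
        sub = df[df["species"] == species]
        observed = np.log2(sub["intensity_b"] / sub["intensity_a"])
        errors[species] = float(np.median(np.abs(observed - np.log2(expected_ratio))))
    return errors

# e.g. ratio_errors(results, {"human": 1.0, "yeast": 2.0}) for a 1:1 / 1:2 mix
```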

List of potential performance test sets:

Implementation

For running these tests, I will use GitHub Actions self-hosted runners. This would allow us to run the tests on powerful workstations.

Ideally, we can also set up runners for each of Windows, Linux, and Mac.

At some point, one could also make the testing results more explorable, i.e., push the results to a DB and have a little dashboard app that shows performance over version/time (see the sketch below).
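A sketch of what pushing a run could look like with pymongo; the collection layout and metric values are placeholders:

```python
import datetime
from pymongo import MongoClient

def push_run_results(uri, metrics):
    """Store one test run so a dashboard can plot performance over version/time."""
    collection = MongoClient(uri)["alphapept_tests"]["runs"]
    metrics["timestamp"] = datetime.datetime.utcnow()
    collection.insert_one(metrics)

push_run_results(
    "mongodb+srv://...",  # connection string, kept out of the repo
    {"version": "0.3.1", "runtime_min": 42.0, "proteins": 5200, "peptides": 61000},
)
```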

Also, note that we can always trigger the tests manually.

Let me know if you would suggest additional tests or think the current test set should be optimized.

Styleguide

We should also add an automatic style test.
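A minimal style gate, assuming flake8 as the linter (any linter would do):

```python
import subprocess
import sys

# Run flake8 over the package; a non-zero exit code fails the CI job.
result = subprocess.run(["flake8", "alphapept", "--max-line-length", "120"])
sys.exit(result.returncode)
```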

straussmaximilian commented 4 years ago

For the Workflow test, we should set up something that runs on Linux and Mac OS X. A nice solution for Windows would be to run these tests on self-hosted cloud runners. One way to achieve this would be to use an Auto Scaling Group on AWS: https://dev.to/jimmydqv/github-self-hosted-runners-on-aws-part-2-ec2-3jhj

This, however, would not work for Mac OS X, but maybe we could use something like this: https://www.macstadium.com

swillems commented 4 years ago

Is it possible to share the test files? That would allow checking locally whether a commit actually passes the tests, to avoid spamming the runners too much...

straussmaximilian commented 4 years ago

I just uploaded the current test files here: https://datashare.biochem.mpg.de/s/sYaphoaccYDTnez

swillems commented 4 years ago

Thanks! Can you upload the settings and fasta files as well so I can fully reproduce?

straussmaximilian commented 4 years ago

> Thanks! Can you upload the settings and fasta files as well so I can fully reproduce?

@swillems I uploaded the FASTA and contaminant files to the folder, together with the settings of the test runner.

swillems commented 3 years ago

Great! They probably could/should be deleted from the source folder then? Also, a minor side note/question: why is the (old) version number of alphapept in the settings file? This seems more logical in a log file, if you ask me...

straussmaximilian commented 3 years ago

> Great! They probably could/should be deleted from the source folder then?

In the long run, I would probably make a dedicated folder and upload them so that people can also use them to test things.

> Also, a minor side note/question: why is the (old) version number of alphapept in the settings file? This seems more logical in a log file, if you ask me...

It should certainly be in a log file, but I also think it is useful within a settings file. The idea is that if we rename some settings at a later stage, we can use the stored version to enable backwards compatibility with older settings files (see the sketch below).
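A sketch of the backwards-compatibility idea; the renamed key and version numbers are made up for illustration:

```python
# Map of settings keys renamed between versions (hypothetical example).
RENAMED_KEYS = {"max_missed_cleavages": "n_missed_cleavages"}

def migrate_settings(settings: dict) -> dict:
    """Upgrade a settings dict saved by an older alphapept version."""
    if settings.get("version", "0.0.0") < "0.3.0":  # naive compare, fine for a sketch
        for old, new in RENAMED_KEYS.items():
            if old in settings:
                settings[new] = settings.pop(old)
    settings["version"] = "0.3.0"
    return settings
```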

swillems commented 3 years ago

Both make sense to me, thanks for explaining

straussmaximilian commented 3 years ago

GitHub Actions now pushes the run results to a DB so that we can visualize them: https://charts.mongodb.com/charts-alphapept-itfxv/public/dashboards/5f671dcf-bcd6-4d90-8494-8c7f724b727b

Let me know if you can think of any metrics we should display.

swillems commented 3 years ago

Not sure if it is worth considering, but in relation to test files, consistency, transparency, and reproducibility, we could consider a Git LFS bundle. iRT test files and others could even be included with a free account.

swillems commented 3 years ago

I just found out why your Bruker iRT testing takes 4 min while a local test takes <2 min: you use a human DB. Take a FASTA of the iRT peptides from Biognosys instead (see the sketch below).
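A sketch of a minimal iRT database; the two sequences shown are from the Biognosys iRT kit (the full kit contains eleven peptides):

```python
# Write a tiny FASTA containing iRT peptides instead of the full human DB.
IRT_PEPTIDES = {
    "iRT_1": "LGGNEQVTR",
    "iRT_2": "GAGSSEPVTGLDAK",
}

with open("irt.fasta", "w") as f:
    for name, seq in IRT_PEPTIDES.items():
        f.write(f">{name}\n{seq}\n")
```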