Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Speed up tests and load multi-tracer/label data in the ci tests #480

Closed: hepcat72 closed this issue 11 months ago

hepcat72 commented 2 years ago

FEATURE REQUEST

Inspiration

Tests run very slowly, and loading all types of example data in the CI YAML script provides a good example to new contributors, despite (or in addition to) the examples in the contributing doc.

Description

Checks on the execution time of tests and test setup should be added to the code introduced in PR #506:

https://github.com/Princeton-LSI-ResearchComputing/tracebase/blob/ef6db15f72bc6e738e76e6ff7bcbc59d836272a5/DataRepo/tests/tracebase_test_case.py#L24-L43

For the tests that fail:

Also:

Alternatives

none

Dependencies

This issue cannot be started until the completion of the following issue(s):

Comment

The tests should be implemented inside reportRunTime. This comment at the top of that method was removed:

TODO: When issue #480 is implemented, a test should be added here to ensure each test runs in under some reasonable threshold. There might even be a way to ensure the data setup runs quickly as well.
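The threshold check described in that TODO could be sketched roughly as follows. This is an illustrative sketch, not TraceBase's actual code: the class names, the `MAX_TEST_SECONDS` attribute, and the 20-second threshold are all assumptions.

```python
import time
import unittest


class TimedTestCase(unittest.TestCase):
    """Sketch of a base class whose setUp/tearDown time each test and
    fail when a test exceeds a threshold (names/threshold hypothetical)."""

    MAX_TEST_SECONDS = 20.0  # arbitrary threshold, per the TODO

    def setUp(self):
        # Record a monotonic start time so wall-clock adjustments can't skew it
        self._start = time.monotonic()

    def tearDown(self):
        elapsed = time.monotonic() - self._start
        if elapsed > self.MAX_TEST_SECONDS:
            # A hard failure; a warning may be preferable (discussed below)
            self.fail(
                f"{self.id()} took {elapsed:.3f}s "
                f"(> {self.MAX_TEST_SECONDS}s threshold)"
            )


class ExampleTests(TimedTestCase):
    def test_fast(self):
        self.assertTrue(True)
```

Timing setUpTestData the same way would need a classmethod wrapper, since it runs once per class rather than per test.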

11/15/2023 Note

I identified the following files in example_data that are used in the tests:

$ git grep -A 1 example_data | grep "/tests/" | grep '"' | cut -d '"' -f 2,4 | grep -E '/|\.' | perl -e 'while(<>){s/.*\"//;if(m%/$%){chomp}print}' | sort -u
DataRepo/example_data/AsaelR_13C-Valine+PI3Ki_flank-KPC_2021-12_isocorr_CN-corrected/Serum results_cor.csv
DataRepo/example_data/AsaelR_13C-Valine+PI3Ki_flank-KPC_2021-12_isocorr_CN-corrected/TraceBase Animal and Sample Table Templates_AR.xlsx
DataRepo/example_data/AsaelR_13C-Valine+PI3Ki_flank-KPC_2021-12_isocorr_CN-corrected/loading.yaml
DataRepo/example_data/consolidated_tracebase_compound_list.tsv
DataRepo/example_data/data_submission_good/accucor1.xlsx
DataRepo/example_data/data_submission_good/accucor2.xlsx
DataRepo/example_data/data_submission_good/animal_sample_table.xlsx
DataRepo/example_data/data_submission_sample_unkres_acc_good/accucor1.xlsx
DataRepo/example_data/data_submission_sample_unkres_acc_good/accucor2.xlsx
DataRepo/example_data/data_submission_sample_unkres_acc_good/animal_sample_table.xlsx
DataRepo/example_data/obob_animals_table.tsv
DataRepo/example_data/obob_fasted_ace_glycerol_3hb_citrate_eaa_fa_multiple_tracers/6eaafasted1_cor.xlsx
DataRepo/example_data/obob_fasted_ace_glycerol_3hb_citrate_eaa_fa_multiple_tracers/6eaafasted2_cor.xlsx
DataRepo/example_data/obob_fasted_ace_glycerol_3hb_citrate_eaa_fa_multiple_tracers/animal_sample_table.xlsx
DataRepo/example_data/obob_fasted_ace_glycerol_3hb_citrate_eaa_fa_multiple_tracers/bcaafasted_cor.xlsx
DataRepo/example_data/obob_fasted_ace_glycerol_3hb_citrate_eaa_fa_multiple_tracers/loading.yaml
DataRepo/example_data/obob_fasted_glc_lac_gln_ala_multiple_labels/alafasted_cor.xlsx
DataRepo/example_data/obob_fasted_glc_lac_gln_ala_multiple_labels/animal_sample_table.xlsx
DataRepo/example_data/obob_fasted_glc_lac_gln_ala_multiple_labels/glnfasted1_cor.xlsx
DataRepo/example_data/obob_fasted_glc_lac_gln_ala_multiple_labels/glnfasted2_cor.xlsx
DataRepo/example_data/obob_fasted_glc_lac_gln_ala_multiple_labels/loading.yaml
DataRepo/example_data/obob_maven_6eaas_inf.xlsx
DataRepo/example_data/obob_maven_6eaas_inf_corrected.csv
DataRepo/example_data/obob_maven_6eaas_inf_corrected_invalid_syn.csv
DataRepo/example_data/obob_maven_6eaas_inf_corrected_valid_syn.csv
DataRepo/example_data/obob_maven_6eaas_inf_new_researcher_err.xlsx
DataRepo/example_data/obob_maven_6eaas_inf_new_researcher_err2.xlsx
DataRepo/example_data/obob_maven_6eaas_inf_sample_dupe.xlsx
DataRepo/example_data/obob_maven_6eaas_serum.xlsx
DataRepo/example_data/obob_maven_c160_serum.xlsx
DataRepo/example_data/obob_samples_table.tsv
DataRepo/example_data/protocols/T3_protocol.tsv
DataRepo/example_data/protocols/loading.yaml
DataRepo/example_data/protocols/protocols.tsv
DataRepo/example_data/sample_and_animal_tables_headers.yaml
DataRepo/example_data/sample_table_headers.yaml
DataRepo/example_data/serum_lactate_timecourse_treatment.tsv
DataRepo/example_data/serum_lactate_timecourse_treatment_new_researcher.tsv
DataRepo/example_data/small_dataset/small_obob_animal_and_sample_table.xlsx
DataRepo/example_data/small_dataset/small_obob_compounds.tsv
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf_blank_sample.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf_dupes.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf_glucose.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf_glucose_2.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf_glucose_conflicting.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf_lactate.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_inf_req_prefix.xlsx
DataRepo/example_data/small_dataset/small_obob_maven_6eaas_serum.xlsx
DataRepo/example_data/small_dataset/small_obob_sample_table.tsv
DataRepo/example_data/small_dataset/small_obob_sample_table_2ndstudy.tsv
DataRepo/example_data/small_dataset/small_obob_sample_table_serum_only.tsv
DataRepo/example_data/small_dataset/small_obob_study_params.yaml
DataRepo/example_data/small_dataset/small_obob_study_prerequisites.yaml
DataRepo/example_data/small_multitracer_data/6eaafasted1_cor.xlsx
DataRepo/example_data/small_multitracer_data/animal_sample_table.xlsx
DataRepo/example_data/small_multitracer_data/bcaafasted_cor.xlsx
DataRepo/example_data/test_dataframes/loading.yaml
DataRepo/example_data/testing_data/accucor_with_multiple_labels/accucor.xlsx
DataRepo/example_data/testing_data/accucor_with_multiple_labels/accucor_bad_label.xlsx
DataRepo/example_data/testing_data/accucor_with_multiple_labels/samples.xlsx
DataRepo/example_data/testing_data/animal_sample_table_labeled_elements.xlsx
DataRepo/example_data/testing_data/animal_sample_table_labeled_elements_invalid.xlsx
DataRepo/example_data/testing_data/protocols/protocols_with_errors.tsv
DataRepo/example_data/testing_data/protocols/protocols_with_workarounds.tsv
DataRepo/example_data/testing_data/short_compound_list.tsv
DataRepo/example_data/testing_data/small_obob_animal_and_sample_table_empty_animalid_in_animalsheet.xlsx
DataRepo/example_data/testing_data/small_obob_animal_and_sample_table_empty_animalid_in_samplesheet.xlsx
DataRepo/example_data/testing_data/small_obob_animal_and_sample_table_empty_animalid_in_samplesheet_silent.xlsx
DataRepo/example_data/testing_data/small_obob_animal_and_sample_table_empty_row.xlsx
DataRepo/example_data/testing_data/small_obob_animal_and_sample_table_missing_rqd_vals.xlsx
DataRepo/example_data/testing_data/test_study_1/test_study_1_compounds_dupes.tsv
DataRepo/example_data/testing_data/tissues/tissues_with_errors.tsv
DataRepo/example_data/tissues/loading.yaml
DataRepo/example_data/tissues/tissues.tsv
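The shell pipeline above could also be done in Python, which may be easier to maintain than the perl one-liner. This is a hypothetical alternative, not code from the repo; the test directory and the quoting convention it matches are assumptions.

```python
import re
from pathlib import Path


def example_data_paths(test_dir="DataRepo/tests"):
    """Collect double-quoted example_data paths referenced in test sources.

    Hypothetical equivalent of the git grep / perl pipeline above; it only
    finds paths quoted whole, as that pipeline does.
    """
    pat = re.compile(r'"(DataRepo/example_data/[^"]+)"')
    found = set()
    for src in Path(test_dir).rglob("*.py"):
        found.update(pat.findall(src.read_text()))
    return sorted(found)
```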

And the following files are used in the GitHub test action (as listed in its YAML):

DataRepo/example_data/tissues/loading.yaml
DataRepo/example_data/consolidated_tracebase_compound_list.tsv
DataRepo/example_data/obob_sample_table.tsv
DataRepo/example_data/obob_maven_6eaas_inf.xlsx
DataRepo/example_data/obob_maven_6eaas_serum.xlsx
DataRepo/example_data/obob_maven_c160_inf.xlsx
DataRepo/example_data/obob_maven_c160_serum.xlsx

ISSUE OWNER SECTION

Assumptions

Requirements

Limitations

Affected Components

DESIGN

Interface Change description

Describe changes to usage. E.g. GUI/command-line changes

Code Change Description

Describe code changes planned for the feature. (Pseudocode encouraged)

Tests

lparsons commented 2 years ago

I believe this should be split into multiple issues, probably a few for speeding up various test suites (classes), and a few for various tests that are needed. The new tests are required, the faster tests are optional.

hepcat72 commented 2 years ago

Based on your comment, I had inferred there were tests unrelated to running time, but now that I'm editing the issue to split it up, I don't see anything to split up. The development of the smaller data should be driven by the tests that are suggested in the issue. If they were tackled separately, the resulting PR would have broken tests for the test issue.

Perhaps there's a misunderstanding, so a rewording may be needed. What this issue suggests is adding checks inside tracebase_test_case.setUp and tracebase_test_case.tearDown that fail if any test that inherits from either TracebaseTestCase or tracebaseTransactionTestCase takes longer than a given time threshold to run or set up.

It's the results of that testing that I'm suggesting should guide the data edits.

But I suppose we could split the issues up if the data-edit issue is made dependent on the test-implementation issue, and if "broken" tags were applied to those tests... but I'm not sure adding tags to inherited tests would work as intended, because the inherited tests would pick up the tags applied to the derived class. (I know this because I ran into it before when I tried being clever about test class inheritance.)

hepcat72 commented 2 years ago

IMHO, the best way to implement any feature is to write the tests first, then the actual features. I don't often do that myself, but that's the concept behind how I created this issue.

lparsons commented 2 years ago

Since tests can be run on different hardware and take a variable amount of time, it doesn't seem right for them to fail if they take a long time. It would be helpful, however, to generate a report that shows how long each test (or perhaps each test class) took to run. I would be in favor of that being the first thing to tackle. Once we have that information, we can reorganize the slowest tests and hopefully come up with a plan and/or smaller test datasets to make them faster.

The title of this issue also mentions adding multi-label/multi-tracer data to the tests. If that is not already happening, we should probably do that as well since that would help test the loading code, but that should be a separate (and higher priority) issue.

lparsons commented 2 years ago

I now see that tests for multi-label/multi-tracer data are in #526.

hepcat72 commented 2 years ago

Since tests can be run on different hardware and take a variable amount of time, it doesn't seem right for them to fail if they take a long time. It would be helpful, however to generate a report that shows how long each tests (or perhaps test class) took to run. I would be in favor of that being the first thing to tackle. Once we have that information, we can reorganize the slowest tests and hopefully come up with a plan and/or smaller test datasets to make these faster.

That's reasonable. Perhaps a warning instead of a test failure... Alternatively, there could be a check, using the base class, that ensures load files used in tests or setUpData don't exceed some arbitrary row and/or column threshold in a sheet. Just trying to think of the most useful check that would automatically keep tests fast.
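The input-size idea could look something like this for TSV inputs. The thresholds, the function name, and the TSV-only scope are assumptions (Excel sheets would need a reader such as openpyxl):

```python
import csv

# Hypothetical budgets; real values would be tuned to observed test times
MAX_ROWS = 50
MAX_COLS = 30


def assert_small_load_file(path, max_rows=MAX_ROWS, max_cols=MAX_COLS):
    """Sketch of a check that a TSV load file stays under a row/column
    budget, so oversized test inputs fail fast instead of running slowly."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    assert len(rows) <= max_rows, (
        f"{path}: {len(rows)} rows exceeds the {max_rows}-row budget"
    )
    widest = max((len(row) for row in rows), default=0)
    assert widest <= max_cols, (
        f"{path}: {widest} columns exceeds the {max_cols}-column budget"
    )
```

Unlike a runtime threshold, this check is hardware-independent, which addresses the variable-timing objection above.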

hepcat72 commented 2 years ago

You know, this issue could be split up by example dataset: for any sufficiently sized example dataset loaded in any test, create a separate issue to create a pared down version that only retains the data used in the tests that load them...

lparsons commented 2 years ago

I think that's a great idea @hepcat72. That would allow us to incrementally speed up the tests and help us get an idea of how many times things are being loaded. Once we have smaller test datasets, we can decide if any reorganization would be worth it. In my opinion, this is something to work on incrementally as time permits.

hepcat72 commented 1 year ago

Preparing to split up this issue. Worked out a command to list slow tests, which can point to the large datasets that need reduction:

$ grep "TEST TIME .ALERT . 20." /Users/rleach/Downloads/Web/logs_3590/build/10_Run\ tests.txt | sort -n -k 8
2023-08-23T14:16:03.7760792Z TEST TIME [ALERT > 20]: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_multitracer_isocorr_load_2: 27.382
2023-08-23T14:15:36.3557952Z TEST TIME [ALERT > 20]: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_multitracer_isocorr_load_1: 28.198
2023-08-23T14:23:12.6148574Z TEST TIME [ALERT > 20]: DataRepo.tests.test_models.PropertyTests.test_normalized_labeling_latest_serum: 33.707
2023-08-23T14:22:37.1427089Z TEST TIME [ALERT > 20]: PropertyTests.setUpTestData: 36.321
2023-08-23T14:14:54.8766644Z TEST TIME [ALERT > 20]: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_multilabel_isocorr_load_3: 36.547
2023-08-23T14:14:18.1507564Z TEST TIME [ALERT > 20]: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_multilabel_isocorr_load_2: 36.602
2023-08-23T14:19:04.6189755Z TEST TIME [ALERT > 20]: DataLoadingTests.setUpTestData: 37.479
2023-08-23T14:24:23.3120828Z TEST TIME [ALERT > 20]: DataRepo.tests.test_models.PropertyTests.test_normalized_labeling_latest_serum_no_peakgroup: 51.279
2023-08-23T14:21:44.0619876Z TEST TIME [ALERT > 20]: MultiTracerLabelPropertyTests.setUpTestData: 76.874
2023-08-23T14:13:41.5228334Z TEST TIME [ALERT > 20]: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_multilabel_isocorr_load_1: 77.606
2023-08-23T14:12:23.6530909Z TEST TIME [ALERT > 20]: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_labeled_elements_common_with_compound: 80.033
2023-08-23T14:17:30.8467359Z TEST TIME [ALERT > 20]: DataRepo.tests.loading.test_load_accucor_msruns.IsoCorrDataLoadingTests.test_multitracer_isocorr_load_3: 86.732
2023-08-23T14:31:22.1285420Z TEST TIME [ALERT > 20]: DataRepo.tests.test_models.StudyLoadingTests.test_singly_labeled_isocorr_study: 86.816
2023-08-23T14:29:54.4381075Z TEST TIME [ALERT > 20]: DataRepo.tests.test_models.StudyLoadingTests.test_multi_tracer_isocorr_study: 132.489
2023-08-23T14:27:41.5820646Z TEST TIME [ALERT > 20]: DataRepo.tests.test_models.StudyLoadingTests.test_multi_label_isocorr_study: 144.344

hepcat72 commented 1 year ago

Unique Datasets

lparsons commented 11 months ago

Regarding the refactoring/speeding up of tests, I would suggest working on only one test class at a time. Make smaller copies of the test data and place them in a new directory that contains only test data that has been "reduced". Put a tag on any test classes that have been refactored to use the reduced test data. That will keep each PR relatively small and contained, without breaking any other tests. Once we're done, we can clean up/remove the unused old test data and create a directory with a few "good" example datasets that a developer or new user can use to bootstrap and try out a new install. Just my two cents...
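The tagging suggested above maps to Django's test tags (`django.test.tag`, filtered at run time with `manage.py test --tag ...` or `--exclude-tag`). The sketch below uses a minimal stand-in that mimics how that decorator attaches a `tags` set, so it runs without Django; the tag name `reduced_data` and the class name are hypothetical:

```python
def tag(*tags):
    """Minimal stand-in for django.test.tag: attach a set of tag names to a
    test class (or method) so a runner can include/exclude it by tag."""
    def decorator(obj):
        obj.tags = set(tags) | getattr(obj, "tags", set())
        return obj
    return decorator


@tag("reduced_data")
class RefactoredLoadingTests:  # stand-in for a TracebaseTestCase subclass
    def test_loads_small_dataset(self):
        pass
```

With real Django, CI could then run `manage.py test --tag reduced_data` for the refactored classes during the transition; note the inheritance caveat raised earlier in this thread, since subclasses inherit the parent's tags.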

hepcat72 commented 11 months ago

I'm doing something similar, but I'm doing it by dataset, not by test. Many tests use the same dataset. You could create multiple copies of a dataset and do it by test, but there are a few downsides to that. In terms of breaking it up, I'm doing a few passes that involve reorganizing the data to split example data from test data. That reorganization involves what looks like more changes, but it's just a moving of files. I could have done better at keeping reductions separate from re-orgs, but I take your point. Perhaps I'll create a separate PR at an interim commit in the one I have in the works now. The tests won't pass, but it will lighten the PRs.

lparsons commented 11 months ago

I'm doing something similar, but I'm doing it by dataset, not by test. Many tests use the same dataset. You could create multiple copies of a dataset and do it by test, but there are a few downsides to that. In terms of breaking it up, I'm doing a few passes that involve reorganizing the data to split example data from test data. That reorganization involves what looks like more changes, but it's just a moving of files. I could have done better at keeping reductions separate from re-orgs, but I take your point. Perhaps I'll create a separate PR at an interim commit in the one I have in the works now. The tests won't pass, but it will lighten the PRs.

I still think it would be less disruptive to reorganize one test (test class) at a time, each with a PR, making sure all tests pass with each PR. The problem here is that it will be a long time (at least a lot of work) before all tests are passing again and it will be very likely to cause a conflict with other PRs. If it's too late for that approach, then just go ahead with this one, but please try to get to something we can merge into main ASAP to minimize conflicts with other development efforts.

hepcat72 commented 11 months ago

Many of these datasets are used in multiple test classes and each class uses different data from the dataset to perform tests. Personally, I think that the way I'm doing it is less work. After test_reduction_surgery3 (which I'd originally intended to be 2), I was planning on smaller changes. The re-org was the disruptive part.

lparsons commented 11 months ago

Many of these datasets are used in multiple test classes and each class uses different data from the dataset to perform tests. Personally, I think that the way I'm doing it is less work. After test_reduction_surgery3 (which I'd originally intended to be 2), I was planning on smaller changes. The re-org was the disruptive part.

I guess I'm not following, but as long as we get all tests passing and can merge into main without creating too many conflicts with other PRs, that will be good. Looking forward to the cleaner test data and, more importantly, faster tests, though I suspect it will take some time to get there.