Create additional tests for METdataio to increase code coverage

bikegeek commented 1 month ago

To be assigned to John Sharples once he has accepted the invitation to join this repository.

Describe the Task

METdataio, specifically the METdbLoad modules, requires additional tests to increase code coverage beyond its current level. Add appropriate tests, with particular focus on database loading.

Time Estimate

Estimate the amount of work required here. Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the task down into sub-issues.

[ ] Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

[ ] Select engineer(s) or no engineer required
[ ] Select scientist(s) or no scientist required

Labels

[ ] Review default alert labels
[ ] Select component(s)
[ ] Select priority
[ ] Select requestor(s)

Milestone and Projects

[ ] Select Milestone as a METdataio-X.Y.Z version, Consider for Next Release, or Backlog of Development Ideas
[ ] For a METdataio-X.Y.Z version, select the METdataio-X.Y.Z Development project

Define Related Issue(s)

Consider the impact to the other METplus components.

[ ] METplus, MET, METdataio, METviewer, METexpress, METcalcpy, METplotpy

Task Checklist

See the METplus Workflow for details.

[ ] Complete the issue definition above, including the Time Estimate and Funding Source.
[ ] Fork this repository or create a branch of develop. Branch name: feature_<Issue Number>_<Description>
[ ] Complete the development and test your changes.
[ ] Add/update log messages for easier debugging.
[ ] Add/update unit tests.
[ ] Add/update documentation.
[ ] Add any new Python packages to the METplus Components Python Requirements table.
[ ] Push local changes to GitHub.
[ ] Submit a pull request to merge into develop. Pull request: feature <Issue Number> <Description>
[ ] Define the pull request metadata, as permissions allow. Select: Reviewer(s) and Development issue. Select: Milestone as the next official version. Select: METdataio-X.Y.Z Development project for development toward the next official release.
[ ] Iterate until the reviewer(s) accept and merge your changes.
[ ] Delete your fork or branch.
[ ] Close this issue.

bikegeek commented 1 month ago

The additional tests contribute towards this issue: https://github.com/dtcenter/METplus-Internal/issues/50

John-Sharples commented 3 weeks ago

Thanks for creating this ticket for me @bikegeek

I've spent some more time reading through the METdbLoad code and propose the following approach for testing.

No database testing

Initially I'd like to write true "unit tests". These are tests that don't require a database and just check function behaviour. Since run_sql.py provides a convenient abstraction layer for all database reads/writes, we can create a mock of RunSql and use this to test all other modules.

Pros:

Tests can run anywhere, without needing MySQL
Avoids the overhead of setting up and managing a test database
Quickest way to increase test coverage

Cons:

Can't write meaningful tests for run_sql.py
Can't test database interactions
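
To make the mock-based approach concrete, here is a minimal sketch using the standard library's unittest.mock. The RunSql class below is a stand-in defined only for this example, and its get_next_id method, the next_header_id helper, and the table and column names are all hypothetical rather than taken from the METdbLoad source.

```python
from unittest import mock


class RunSql:
    """Stand-in for the class in run_sql.py; the method shown is illustrative only."""

    def get_next_id(self, table, field):
        raise RuntimeError("would need a live MySQL connection")


def next_header_id(sql, table):
    """Hypothetical caller: the kind of logic a database-free unit test targets."""
    return sql.get_next_id(table, "header_id") + 1


def test_next_header_id_uses_runsql():
    # create_autospec builds a mock that only accepts methods and argument
    # lists that the stand-in RunSql class actually defines.
    fake_sql = mock.create_autospec(RunSql, instance=True)
    fake_sql.get_next_id.return_value = 41

    assert next_header_id(fake_sql, "stat_header") == 42
    fake_sql.get_next_id.assert_called_once_with("stat_header", "header_id")


test_next_header_id_uses_runsql()
```

Because the mock is built from the class's own signatures, a test that calls a method RunSql does not have fails immediately, which helps keep the mock and the real class from drifting apart.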

Testing with a database

If we decide we want something more robust, we can have the tests operate on a real test database. This would be akin to "integration tests". To do this, we'd need MySQL running in the test environment, and we'd need to write some test fixtures to manage the database state for each test.

Pros:

Tests are closer to real-world functionality of METdbLoad
Tests database-specific interactions (e.g. local-file configuration, database version)

Cons:

Requires MySQL to be running in test environment
Requires test infrastructure to set up and manage test database
Greater dev effort required
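
As a rough illustration of a fixture that manages database state per test, the sketch below creates a fresh schema before each test and discards it afterwards. It uses the standard library's sqlite3 purely as a stand-in so the example runs anywhere; a real fixture would connect to MySQL instead. The one-table schema and the helper names are assumptions for the example, not the METdbLoad schema.

```python
import sqlite3
from contextlib import contextmanager

# Assumed, simplified schema for illustration; not the real METdbLoad schema.
SCHEMA = "CREATE TABLE stat_header (stat_header_id INTEGER PRIMARY KEY, model TEXT)"


@contextmanager
def fresh_database():
    """Yield a connection to an empty database, then discard it."""
    # sqlite3 in-memory database is a stand-in; a real fixture would use MySQL.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        yield conn
    finally:
        conn.close()


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM stat_header").fetchone()[0]


def test_insert_is_isolated():
    with fresh_database() as conn:
        conn.execute("INSERT INTO stat_header (model) VALUES ('WRF')")
        assert count_rows(conn) == 1
    # The next test starts from a clean state again.
    with fresh_database() as conn:
        assert count_rows(conn) == 0


test_insert_is_isolated()
```

The same setup and teardown pattern could be wrapped in a pytest fixture, with the body before the yield creating the test database and the body after it dropping it.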

The above approaches are not mutually exclusive. We can start with the no-database approach and transition to using a database later on. It's likely we could come up with an implementation that supports either approach, depending on whether MySQL is available. For example, the test infrastructure could use a real database when available, otherwise fall back to the mock RunSql.
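
One way the fallback could look, as a sketch only: try to open a real connection and hand back a mock if that fails. The make_sql_runner function and the connect callable are hypothetical names introduced for this example; nothing here reflects existing METdbLoad code.

```python
from unittest import mock


def make_sql_runner(connect):
    """Return (runner, is_real).

    `connect` is a hypothetical callable that opens a real database-backed
    runner and raises if MySQL is unreachable.
    """
    try:
        return connect(), True
    except Exception:
        # Broad catch keeps the sketch short; a real implementation would
        # catch only the database driver's connection error.
        return mock.MagicMock(name="RunSql"), False


def unreachable():
    raise ConnectionError("MySQL is not running")


runner, is_real = make_sql_runner(unreachable)
assert not is_real  # fell back to the mock
```

Tests that only make sense against a real database could check the is_real flag and skip themselves when it is False.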

Let me know if you're happy with this approach or have any questions.
