Unstructured-IO / unstructured-api-tools

Apache License 2.0

Comprehensive unittests for generated APIs #104

Open cragwolfe opened 1 year ago

cragwolfe commented 1 year ago

Objective

Add unit tests for the generated APIs, covering the many permutations of potential pipeline_api definitions. There are a lot of tests to write and permutations to cover, so multiple incremental PRs are strongly preferred over one mega PR.

Create a pipeline-test project as a fixture for testing

The root of the pipeline-test project is a barebones preprocessing pipeline family project. It should include a pipeline-notebooks directory with test pipeline notebooks, an empty prepline_test directory, and a preprocessing-pipeline-family.yaml file. The pipeline-test project may live under test_unstructured_api_tools/fixtures (or another reasonable place).

Each pipeline-notebook includes the definition for a pipeline_api, so there is a pipeline notebook for each of the following:

pipeline_api permutations to test:

def pipeline_api(text)
def pipeline_api(text, m_input1=[], m_input2=[])
def pipeline_api(text, response_type="text/csv")
def pipeline_api(text, response_type="application/json", response_schema="isd")
def pipeline_api(file)
def pipeline_api(file, response_type="text/csv", response_schema="isd")
def pipeline_api(file, file_content_type, response_type="application/json", response_schema="labelstudio", m_input1=[])
def pipeline_api(file, file_content_type, filename, response_type="application/json", response_schema="isd", m_input2=[], m_input1=[])
def pipeline_api(text, file, file_content_type, filename)
def pipeline_api(text, file, file_content_type, filename, response_type="application/json", m_input2=[])
def pipeline_api(text, file, file_content_type, filename, response_type="application/json", response_schema="isd")
def pipeline_api(text, file, file_content_type, filename, response_type="application/json", response_schema="isd", m_input1=[], m_input2=[])

The APIs should execute some trivial code to validate that they handle an uploaded file or text file appropriately. E.g., make sure the length of the content is as expected and/or that the first few characters make sense.
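As a rough illustration of the "trivial code" idea, here is a sketch of one of the permutations above, pipeline_api(text, file, file_content_type, filename). The returned dictionary and its key names are assumptions made for this example, not something the issue prescribes; the point is only that lengths and leading characters are easy for a test to assert on.

```python
import io


def pipeline_api(text, file, file_content_type, filename):
    """Trivial body: echo back facts about the inputs so a test can check them."""
    file_bytes = file.read()
    return {
        # Length and first few characters of the posted text.
        "text_length": len(text),
        "text_head": text[:10],
        # Length and first few bytes of the uploaded file.
        "file_length": len(file_bytes),
        "file_head": file_bytes[:10].decode("utf-8", errors="replace"),
        # Metadata passed through unchanged.
        "file_content_type": file_content_type,
        "filename": filename,
    }


if __name__ == "__main__":
    # io.BytesIO stands in for the file-like object the route would pass.
    fake_file = io.BytesIO(b"%PDF-1.4 fake")
    print(pipeline_api("hello world", fake_file, "application/pdf", "a.pdf"))
```

The other permutations could follow the same pattern, adding response_type, response_schema, m_input1, and m_input2 to the returned values so tests can confirm they were passed through.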

Test Cases Against the Generated FastAPI Routes

For each generated API, run a FastAPI TestClient and submit an HTTP POST to cover a number of permutations, including:

For all of the above, test with:

For "single text file posted" or "single non-text file posted" cases, test:

For "multiple text files posted" or "multiple non-text files posted" cases, test:

Linting checks

Finally, a test should run flake8 and mypy against the api/ modules to ensure that the library is generating clean code.

Definition of Done (Initial PR)

cragwolfe commented 1 year ago

Assigned to kravetsmic (will update Assignee once invite is accepted)