Objective
Add unit tests for the generated APIs, covering the many permutations of potential pipeline_api definitions. There are a lot of tests to write and permutations to cover, so multiple incremental PRs are strongly preferred over one mega PR.
Create a pipeline-test project as a fixture for testing
The root of the pipeline-test project is a barebones preprocessing pipeline family project. It should include a pipeline-notebooks directory with test pipeline notebooks, an empty prepline_test directory, and a preprocessing-pipeline-family.yaml file. The pipeline-test project may live under test_unstructured_api_tools/fixtures (or another reasonable place).
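A quick layout check can catch a half-built fixture early. The sketch below assumes only the three entries named above; the helper name `missing_fixture_entries` and the constant `EXPECTED_ENTRIES` are hypothetical and not part of the library.

```python
from pathlib import Path

# Entries the fixture root is expected to contain, per the description above.
EXPECTED_ENTRIES = (
    "pipeline-notebooks",
    "prepline_test",
    "preprocessing-pipeline-family.yaml",
)


def missing_fixture_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

A test could assert that `missing_fixture_entries(fixture_root)` is an empty list before running anything else against the fixture.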
Each pipeline notebook includes the definition for one pipeline_api, so there is a pipeline notebook for each signature listed under "pipeline_api permutations to test" below.
The APIs should execute some trivial code to validate that they handle an uploaded file or text file appropriately, e.g., check that the length of the content is as expected and/or that the first few characters make sense.
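As an illustration of "trivial code", a fixture pipeline_api might simply echo back facts about what it received so a test can assert on them. This is a minimal sketch: the signature is one of the permutations from this issue, but the shape of the returned dictionary is an assumption made for the example.

```python
def pipeline_api(text, m_input1=[], m_input2=[]):
    # Signature mirrors one of the permutations to test; the mutable
    # defaults are kept only to match it and are never mutated here.
    return {
        "n_chars": len(text),      # lets a test check the content length
        "head": text[:10],         # lets a test check the first characters
        "input1": list(m_input1),  # echo the multi-value form inputs
        "input2": list(m_input2),
    }
```

A test can then compare `n_chars` and `head` against the known contents of the uploaded fixture file.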
Test Cases Against the Generated FastAPI Routes
For each generated API, run a FastAPI TestClient and submit an HTTP POST to cover a number of permutations, including:
single text file posted (e.g. curl shorthand of -F text_files=@fname1.txt)
multiple text files posted (e.g. curl shorthand of -F text_files=@fname1.txt -F text_files=@fname2.txt)
Repeat the above two cases for non-text files. In this case the form parameter is files rather than text_files.
For all of the above, test with:
The curl equivalent of -F input1=mytestvalue
The curl equivalent of -F input2=val2
The curl equivalent of -F input1=anothervalue -F input2=atestvalue2
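One way to keep these permutations manageable is to generate them rather than write each case by hand. The sketch below, under the assumption that file names and form values are enumerated up front, crosses the file sets with the form-value sets listed above. The names `FILE_SETS`, `FORM_VALUE_SETS`, and `build_cases` are hypothetical; a real test would open each named fixture file and pass the pairs to the test client's `post` call as its file and form arguments.

```python
from itertools import product

# One file, then two files, as in the curl examples above.
FILE_SETS = [["fname1.txt"], ["fname1.txt", "fname2.txt"]]

# The three form-value combinations listed above.
FORM_VALUE_SETS = [
    {"input1": "mytestvalue"},
    {"input2": "val2"},
    {"input1": "anothervalue", "input2": "atestvalue2"},
]


def build_cases(form_field="text_files", file_sets=FILE_SETS):
    """Cross every file set with every form-value set.

    Returns a list of (files, form_values) pairs, where files is a list of
    (form_field, filename) tuples.
    """
    cases = []
    for filenames, form_values in product(file_sets, FORM_VALUE_SETS):
        files = [(form_field, name) for name in filenames]
        cases.append((files, dict(form_values)))
    return cases
```

Calling `build_cases("files", file_sets=...)` with the non-text fixture names would produce the matching cases for the files form parameter.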
For "single text file posted" or "single non-text file posted" cases, test:
The curl equivalent of -H 'Accept: application/json'
And optionally -F output_schema=isd or -F output_schema=labelstudio
The curl equivalent of -H 'Accept: text/csv'
And optionally -F output_schema=isd
The curl equivalent of -H 'Accept: application/notsupported'
For "multiple text files posted" or "multiple non-text files posted" cases, test:
The curl equivalent of -H 'Accept: application/json'
with -F output_format=application/json, with -F output_format=text/csv (an error), or with output_format not included
For the valid cases in the line above, also test with the form parameter output_schema absent, or with a value of -F output_schema=labelstudio, -F output_schema=isd, or -F output_schema=non-sensical
Same as the above except with -H 'Accept: multipart/mixed'
Unlike the above, -F output_format=text/csv is valid here (text/csv per part)
Same as the above except with -H 'Accept: application/notsupported'
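The multi-file rules above can be written down as a small expectation helper that parametrized tests compare against. This is a sketch of the expectations only, not of the generated code. The function name is hypothetical, and treating application/notsupported as invalid is an assumption, since the cases above do not state the expected outcome for it.

```python
def multi_file_request_is_valid(accept, output_format=None):
    """Return True if the Accept / output_format pair should succeed."""
    if accept == "application/json":
        # text/csv per file is an error when the overall response is JSON.
        return output_format in (None, "application/json")
    if accept == "multipart/mixed":
        # Each part may be JSON or CSV.
        return output_format in (None, "application/json", "text/csv")
    # Assumption: any other Accept value is rejected.
    return False
```

The exact status code for the invalid combinations is left to the tests, since it is not specified here.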
Linting checks
Finally, a test should run flake8 and mypy against the api/ modules to ensure that the library is generating clean code.
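A lint test along these lines could shell out to both tools and fail if either reports problems. The sketch below assumes flake8 and mypy are installed in the test environment and runs them as modules of the current interpreter; the helper names `lint_commands` and `run_lint` are hypothetical, and the path to the generated api/ directory is left to the caller.

```python
import subprocess
import sys
from pathlib import Path


def lint_commands(api_dir):
    """Build the flake8 and mypy command lines for the given directory."""
    target = str(Path(api_dir))
    return [
        [sys.executable, "-m", "flake8", target],
        [sys.executable, "-m", "mypy", target],
    ]


def run_lint(api_dir):
    """Run both linters; return {tool_name: output} for each one that failed."""
    failures = {}
    for cmd in lint_commands(api_dir):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures[cmd[2]] = result.stdout + result.stderr
    return failures
```

A test would then assert that `run_lint(...)` returns an empty dictionary, with the captured output available in the failure message.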
Definition of Done (Initial PR)
As mentioned, multiple PRs are preferred. The initial PR should cover at least a few pipeline(text,... cases.
pipeline_api permutations to test:
def pipeline_api(text)
def pipeline_api(text, m_input1=[], m_input2=[])
def pipeline_api(text, response_type="text/csv")
def pipeline_api(text, response_type="application/json", response_schema="isd")
def pipeline_api(file)
def pipeline_api(file, response_type="text/csv", response_schema="isd")
def pipeline_api(file, file_content_type, response_type="application/json", response_schema="labelstudio", m_input1=[])
def pipeline_api(file, file_content_type, filename, response_type="application/json", response_schema="isd", m_input2=[], m_input1=[])
def pipeline_api(text, file, file_content_type, filename)
def pipeline_api(text, file, file_content_type, filename, response_type="application/json", m_input2=[])
def pipeline_api(text, file, file_content_type, filename, response_type="application/json", response_schema="isd")
def pipeline_api(text, file, file_content_type, filename, response_type="application/json", response_schema="isd", m_input1=[], m_input2=[])