CarperAI / Code-Pile

This repository contains all the code for collecting large amounts of code from GitHub.
MIT License

Standardize Testing, Datatests, Unittests, Integration-Tests #39

Status: Closed (closed by flowpoint 3 weeks ago)

flowpoint commented 2 years ago

Discuss, standardize and track how we want to test the submodules of the Code-Pile here.

flowpoint commented 2 years ago

At the moment we tend towards using pytest, run through GitHub Actions. For each Processor, we want some sample input data and the corresponding target output data. One current approach would be to use parquet files with some "dummy data" to test against our intermediate data.
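A minimal sketch of what such a test could look like, assuming a hypothetical `process` function and a made-up row schema (`id`, `text`); neither comes from the Code-Pile codebase. The dummy data is kept in memory here so the example stays self-contained; in practice it could be loaded from small parquet fixtures, for example with `pandas.read_parquet`.

```python
# Hypothetical processor: the name and the row schema are assumptions
# made for illustration, not actual Code-Pile code.
def process(rows):
    """Strip surrounding whitespace from the text field and drop empty rows."""
    out = []
    for row in rows:
        text = row["text"].strip()
        if text:
            out.append({"id": row["id"], "text": text})
    return out


# Dummy input and the target output we expect the processor to produce.
SAMPLE_INPUT = [
    {"id": 1, "text": "  def f(): pass  "},
    {"id": 2, "text": "   "},
    {"id": 3, "text": "print('hi')\n"},
]
EXPECTED_OUTPUT = [
    {"id": 1, "text": "def f(): pass"},
    {"id": 3, "text": "print('hi')"},
]


def test_process_matches_target_output():
    # pytest collects functions named test_* and reports failing asserts.
    assert process(SAMPLE_INPUT) == EXPECTED_OUTPUT
```

Running `pytest` in a GitHub Actions job would then pick up this test automatically, provided the file name follows pytest's `test_*.py` convention.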

Having both real test data and some synthetic edge-case test data would be best: real test data to ensure, for example, that question and answer pairs are still matched, and synthetic test data to check that unusual Unicode characters are properly kept.
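As a rough illustration of the two kinds of checks, here is a sketch assuming a hypothetical `process_qa` function and an invented row layout (`kind`, `id`, `question_id`, `body`). The "real-like" rows are stand-ins written for this example, not actual dataset samples.

```python
# Hypothetical processor: pairs each answer with its question.
# Function name and row layout are assumptions for illustration only.
def process_qa(rows):
    questions = {r["id"]: r["body"] for r in rows if r["kind"] == "question"}
    return [
        {"question": questions[r["question_id"]], "answer": r["body"]}
        for r in rows
        if r["kind"] == "answer" and r["question_id"] in questions
    ]


# Stand-in for a small sample of real data.
REAL_LIKE_ROWS = [
    {"kind": "question", "id": 10, "body": "How do I reverse a list?"},
    {"kind": "question", "id": 11, "body": "What does yield do?"},
    {"kind": "answer", "id": 20, "question_id": 11, "body": "It makes a generator."},
    {"kind": "answer", "id": 21, "question_id": 10, "body": "Use reversed()."},
]

# Synthetic edge case: accents, a zero-width space, CJK, an emoji,
# and a combining accent that must not be normalized away.
WEIRD = "na\u00efve caf\u00e9 \u200b \u65e5\u672c\u8a9e \U0001F40D e\u0301"


def test_pairs_stay_matched():
    pairs = process_qa(REAL_LIKE_ROWS)
    assert {"question": "What does yield do?",
            "answer": "It makes a generator."} in pairs
    assert {"question": "How do I reverse a list?",
            "answer": "Use reversed()."} in pairs
    assert len(pairs) == 2


def test_unicode_is_kept():
    rows = [
        {"kind": "question", "id": 1, "body": WEIRD},
        {"kind": "answer", "id": 2, "question_id": 1, "body": WEIRD},
    ]
    # The text must come out exactly as it went in.
    assert process_qa(rows) == [{"question": WEIRD, "answer": WEIRD}]
```

Keeping the synthetic strings as escape sequences makes it obvious in review which code points are being tested, even if an editor renders them oddly.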

I don't think we need to integration-test everything, but if a dependency isn't straightforward, we might want to add some sanity integration tests against it too.

flowpoint commented 3 weeks ago

Closing, since the project is afaict over.