Data Science for Software Engineering (ds4se) is an academic initiative to perform exploratory and causal inference analysis on software engineering artifacts and metadata. It covers data management, analysis, and benchmarking for deep learning (DL) and traceability.
Phase II aims to fill the gaps that remain before we have a fully functional T-Miner (beta) version. To reach a stable version, we need to adopt new SE methodologies designed specifically for data science and machine learning. These methodologies rely on frameworks such as DVC, nbdev, and TFX. This phase consists of the following activities:
T-Miner
[ ] T-Miner Interoperability and Deployment. We must guarantee that T-Miner communicates with the DS4SE library, Jenkins, and a deployed version of SecureReqNet.
[ ] T-Miner Navigation. We must guarantee that the proposed navigation is functional and stable. Important use cases: information recovery (traceability) and information analysis (entropy). The tool should retrieve, create, update, and delete traceability results.
[ ] Causal Inference View. We need to implement a causal inference view for T-Miner. Causal inference should be consumed from DS4SE. However, no causal inference modules in DS4SE have been fully developed yet. This is an entire back-end solution intended to update our previous COMET solution.
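The navigation use cases above (create, retrieve, update, and delete traceability results, plus an entropy-based analysis) can be sketched as follows. This is only an illustrative sketch: `TraceLink`, `TraceStore`, and `shannon_entropy` are hypothetical names, not part of T-Miner or DS4SE, and the in-memory dictionary stands in for whatever storage the tool actually uses.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class TraceLink:
    """Hypothetical record for one traceability result."""
    source: str   # e.g. a requirement identifier
    target: str   # e.g. a source file path
    score: float  # similarity assigned by some traceability model


class TraceStore:
    """Hypothetical in-memory store covering the four operations."""

    def __init__(self):
        self._links = {}

    def create(self, link):
        key = (link.source, link.target)
        if key in self._links:
            raise KeyError(f"link already exists: {key}")
        self._links[key] = link

    def retrieve(self, source, target):
        # Raises KeyError when the link does not exist.
        return self._links[(source, target)]

    def update(self, source, target, score):
        self._links[(source, target)].score = score

    def delete(self, source, target):
        del self._links[(source, target)]


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A stable navigation layer would need each of these operations to behave predictably on missing or duplicate links; here that is handled by letting `KeyError` propagate.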
DS4SE
[ ] Data repository integration. We have been employing DVC for data versioning. However, our projects are not fully integrated. We need to centralize all SE-related data in a single remote. Our current architecture allows one remote per git project, which generates data redundancies.
[ ] Data Science/ML Continuous Integration. We want to adopt Continuous Machine Learning (CML). The main goal of CML is to keep all our experiments and models under control. Like TFX, DVC has its own pipeline solution.
[ ] Migrating Unsupervised Traceability Models into CML-DVC. All our unsupervised models will be shaped as an ML pipeline for further enhancement and development.
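To make "shaped as an ML pipeline" concrete, the sketch below splits a toy unsupervised traceability model into explicit stages. It is an assumption-laden illustration: the term-frequency and cosine-similarity baseline is swapped in for simplicity and is not one of the DS4SE models, and all function names are hypothetical. In a DVC pipeline, each stage would typically live in its own script with declared dependencies and outputs.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Stage 1: lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(tokens):
    """Stage 2: represent a document as a term-frequency vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_links(sources, targets):
    """Stage 3: score every source/target pair and sort by similarity."""
    src_vecs = {name: vectorize(preprocess(text)) for name, text in sources.items()}
    tgt_vecs = {name: vectorize(preprocess(text)) for name, text in targets.items()}
    links = [
        (s, t, cosine(sv, tv))
        for s, sv in src_vecs.items()
        for t, tv in tgt_vecs.items()
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Keeping the stages as separate functions with plain inputs and outputs means each one can later be swapped for a stronger model without touching the others.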