formermagic / formerbox

MIT License

Modular cli framework for data processing pipelines and training #12

Closed: mozharovsky closed this 4 years ago

mozharovsky commented 4 years ago

Summary

This PR contains a lot of changes. Normally, we'd split release development into smaller PRs, which makes new features, changes, and enhancements easier to review. That didn't happen this time, but we'll do better with the next one. ;]

We focused on making the data processing and training pipelines modular so they can support a broader variety of tasks. We also added a CLI that supports user packages and plugins, which makes it simpler to work with both built-in and custom tasks.
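To illustrate how a CLI can pick up both built-in and user-defined tasks, here is a minimal sketch in Python. It assumes a decorator-based registry: each task registers itself under a name, and importing a user package is enough to make its tasks appear as subcommands. All names here (`Subcommand`, `register`, `REGISTRY`, `load_user_modules`, `main`, and the `preprocess`/`train` tasks) are hypothetical and are not formerbox's actual API.

```python
import argparse
import importlib
from typing import Callable, Dict, List, Optional, Type


class Subcommand:
    """Hypothetical base class for a CLI task (not formerbox's actual API)."""

    @staticmethod
    def add_arguments(parser: argparse.ArgumentParser) -> None:
        pass

    def run(self, args: argparse.Namespace) -> str:
        raise NotImplementedError


# Hypothetical global registry mapping a subcommand name to its class.
REGISTRY: Dict[str, Type[Subcommand]] = {}


def register(name: str) -> Callable[[Type[Subcommand]], Type[Subcommand]]:
    """Class decorator that adds a subcommand to REGISTRY under `name`."""
    def decorator(cls: Type[Subcommand]) -> Type[Subcommand]:
        if name in REGISTRY:
            raise ValueError(f"subcommand {name!r} is already registered")
        REGISTRY[name] = cls
        return cls
    return decorator


def load_user_modules(module_names: List[str]) -> None:
    """Import user packages; importing them runs their @register decorators."""
    for module_name in module_names:
        importlib.import_module(module_name)


# Two example "built-in" tasks (illustrative only).
@register("preprocess")
class Preprocess(Subcommand):
    @staticmethod
    def add_arguments(parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--input", required=True)

    def run(self, args: argparse.Namespace) -> str:
        return f"preprocessing {args.input}"


@register("train")
class Train(Subcommand):
    @staticmethod
    def add_arguments(parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--epochs", type=int, default=1)

    def run(self, args: argparse.Namespace) -> str:
        return f"training for {args.epochs} epoch(s)"


def main(argv: Optional[List[str]] = None,
         user_modules: Optional[List[str]] = None) -> str:
    # Load plugins first so their tasks are registered before the parser is built.
    load_user_modules(user_modules or [])
    parser = argparse.ArgumentParser(prog="cli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # One subparser per registered task, each contributing its own arguments.
    for name, cls in REGISTRY.items():
        sub = subparsers.add_parser(name)
        cls.add_arguments(sub)
        sub.set_defaults(command_cls=cls)
    args = parser.parse_args(argv)
    return args.command_cls().run(args)


if __name__ == "__main__":
    print(main(["train", "--epochs", "3"]))
```

Because registration happens at import time, a plugin only needs to be importable: `load_user_modules` imports it before the parser is built, so its subcommands show up next to the built-in ones.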

Patch Notes

Common

Data Preprocessing Pipeline

Tokenization Pipeline

Training Pipeline

CLI Support

Documentation

Testing

Additional Info

It's also worth noting that we've moved to Notion to track our internal vision and milestones for this project. We'll likely keep using GitHub Issues and Projects to mirror the top-level items for the community. :]