RFW0137: Documentation of training data and its collection process
Summary
We need documentation of our OCR training data and the process used to collect it.
Key Concepts
Training data: the labelled data used to train our OCR machine learning models.
Context
Our OCR training data is currently scattered across multiple storage systems, such as S3 and GitHub, and there is no single document describing where the data lives. We need documentation so that any ML engineer can use our data in the future to train different state-of-the-art models. We also want to record the different approaches we have used to collect data, for future reference.
Outputs
The training data documentation needs to cover the following:
What types of labelled data we have
How each type was prepared
How and where it is stored
How someone can access it
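One way to make sure every dataset is documented consistently is to record each one as a structured catalog entry whose fields mirror the four points above. This is a minimal, hypothetical Python sketch; the `DatasetEntry` class, its field names, and the `missing_fields` check are illustrative assumptions, not an existing tool, and the example values are placeholders rather than real datasets or locations.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DatasetEntry:
    """Hypothetical catalog entry; each field answers one documentation question."""
    name: str          # identifier for the dataset (illustrative)
    label_type: str    # what type of labelled data this is
    preparation: str   # how the data was prepared / collected
    storage: str       # where it is stored, e.g. an S3 or GitHub location
    access: str        # how someone can get access to it
    notes: list[str] = field(default_factory=list)  # optional free-form notes


REQUIRED_FIELDS = ["name", "label_type", "preparation", "storage", "access"]


def missing_fields(entry: DatasetEntry) -> list[str]:
    """Return the required fields that are empty or whitespace-only."""
    return [f for f in REQUIRED_FIELDS if not getattr(entry, f).strip()]


if __name__ == "__main__":
    # Placeholder values only; not a real dataset or storage location.
    entry = DatasetEntry(
        name="example-dataset",
        label_type="<type of labelled data>",
        preparation="",
        storage="s3://<bucket>/<prefix>",
        access="<how to request access>",
    )
    print(missing_fields(entry))  # ['preparation']
    print(json.dumps(asdict(entry), indent=2))
```

Keeping entries in a structured form like this makes gaps easy to spot and lets the entries be exported (for example as JSON) into whichever documentation platform is chosen.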
Inputs
The existing OCR training data, currently stored across S3, GitHub, and other storage systems
Timeline
Specify the expected delivery date for the project.
References
Quarto is a good platform for writing this documentation.