RFC00126: Build a catalog for OpenPecha-Data with latest changes
Named Concepts
Catalog: a list of details about each item of training data.
Toolkit: an existing package, made by the Monlam organization, that is used for working with OpenPecha-Data repositories.
Summary
We need a catalog of OpenPecha-Data that reflects the latest changes. Some of the OPF repositories in OpenPecha-Data have different formats, so we need to detect and categorize them, logging each one.
OpenPecha-Data already has a catalog that covers roughly 97% of the OPF repositories. We need the names of all OPF and OPA repositories, fetched using the GitHub API.
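A minimal sketch of the name-fetching step follows. It uses the GitHub REST endpoint GET /orgs/{org}/repos with page-based pagination (at most 100 repositories per page). Several things here are assumptions, not confirmed by this RFC: that the repositories live under a GitHub organization named "OpenPecha-Data", and that OPF IDs start with "P" or "I" while OPA IDs start with "A". The function names fetch_repo_names and split_repo_names are hypothetical. Unauthenticated requests are limited to 60 per hour, so a token is needed for an organization with thousands of repositories.

```python
import json
import urllib.request

# GitHub REST endpoint for listing an organization's repositories (max 100 per page).
API_URL = "https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"


def fetch_repo_names(org="OpenPecha-Data", token=None):
    """Return every repository name in `org`, requesting pages until one comes back empty."""
    names = []
    page = 1
    while True:
        req = urllib.request.Request(API_URL.format(org=org, page=page))
        req.add_header("Accept", "application/vnd.github+json")
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means we are past the last repository
            break
        names.extend(repo["name"] for repo in batch)
        page += 1
    return names


def split_repo_names(names):
    """Split repository names into OPF, OPA, and other lists.

    Assumption (hypothetical naming convention): OPF IDs start with "P" or "I",
    OPA IDs start with "A"; anything else is kept aside for manual review.
    """
    opf, opa, other = [], [], []
    for name in names:
        if name[:1] in ("P", "I"):
            opf.append(name)
        elif name[:1] == "A":
            opa.append(name)
        else:
            other.append(name)
    return opf, opa, other
```

Keeping an "other" bucket instead of dropping unmatched names means nothing silently disappears from the catalog if the prefix assumption turns out to be incomplete.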
We will try to create a pecha object for each OPF using the OpenPecha toolkit, and categorize which OPFs load successfully and which have a different file structure, format, or definition.
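The load-and-categorize step could look roughly like the sketch below. It does not call the OpenPecha toolkit directly: load_pecha is a hypothetical placeholder for whatever toolkit call builds a pecha object, and is assumed to raise an exception when an OPF cannot be parsed. Grouping failures by exception type is one possible design choice for separating different kinds of non-standard OPFs. The helpers categorize_pechas and write_catalog, and the CSV columns id, status, and reason, are illustrative assumptions.

```python
import csv
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("opf_cataloger")


def categorize_pechas(pecha_ids, load_pecha):
    """Try to load each pecha and group the IDs by outcome.

    `load_pecha` is a hypothetical stand-in for the real toolkit call: any
    callable that takes a pecha ID and raises when the OPF cannot be parsed.
    """
    categories = {"ok": [], "failed": {}}
    for pecha_id in pecha_ids:
        try:
            load_pecha(pecha_id)
        except Exception as exc:  # any failure marks the OPF as non-standard
            reason = type(exc).__name__
            categories["failed"].setdefault(reason, []).append(pecha_id)
            logger.warning("%s failed to load: %s: %s", pecha_id, reason, exc)
        else:
            categories["ok"].append(pecha_id)
            logger.info("%s loaded", pecha_id)
    return categories


def write_catalog(categories, path):
    """Write one CSV row per pecha: ID, status, and failure reason (empty if loaded)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "status", "reason"])
        for pecha_id in categories["ok"]:
            writer.writerow([pecha_id, "ok", ""])
        for reason, ids in categories["failed"].items():
            for pecha_id in ids:
                writer.writerow([pecha_id, "failed", reason])
```

Injecting the loader keeps the categorization logic testable with a fake loader, without cloning any repositories.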
Dependencies
OpenPecha toolkit
Infrastructures
vast.ai to store OpenPecha repositories.
Design Illustrations
Justification
We are using the OpenPecha toolkit to extract features such as file structure and metadata for categorization, because the toolkit already supports the majority of pechas.
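As a complement to loading through the toolkit, a plain file-structure check can report why a repository deviates. The sketch below assumes a common OPF layout (a `<ID>.opf/` folder holding `meta.yml`, `base/<base_name>.txt`, and `layers/<base_name>/*.yml`). That layout is an assumption, not something this RFC specifies, and check_opf_structure is a hypothetical helper.

```python
from pathlib import Path


def check_opf_structure(repo_dir):
    """Return a list of layout problems for one OPF repo (empty list = looks standard).

    Assumed layout (hypothetical, may not hold for every pecha):
        <ID>/<ID>.opf/meta.yml
        <ID>/<ID>.opf/base/<base_name>.txt
        <ID>/<ID>.opf/layers/<base_name>/<Layer>.yml
    """
    repo_dir = Path(repo_dir)
    opf_dir = repo_dir / f"{repo_dir.name}.opf"
    if not opf_dir.is_dir():
        return [f"missing {opf_dir.name}/"]

    problems = []
    if not (opf_dir / "meta.yml").is_file():
        problems.append("missing meta.yml")

    # Each base text is expected to have its own folder of layer files.
    base_dir = opf_dir / "base"
    base_names = sorted(p.stem for p in base_dir.glob("*.txt")) if base_dir.is_dir() else []
    if not base_names:
        problems.append("no base/*.txt files")
    for name in base_names:
        layer_dir = opf_dir / "layers" / name
        if not (layer_dir.is_dir() and any(layer_dir.glob("*.yml"))):
            problems.append(f"no layers for base {name}")
    return problems
```

Returning a list of problem strings, rather than a single pass/fail flag, lets the log record every deviation in a repository at once.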
Testing
Collect a few OPFs with different structures and check whether the script properly categorizes and logs them.
Implementation Steps
List all the steps involved during implementation.
[x] OpenPecha/openpecha_data_cataloger#1
Estimated time: 3 hrs
Actual time:
[x] OpenPecha/openpecha_data_cataloger#2
Estimated time: 1 day
Actual time:
[x] OpenPecha/openpecha_data_cataloger#3
Estimated time: 4 hrs
Actual time:
[x] OpenPecha/openpecha_data_cataloger#4
Estimated time: 1 day
Actual time:
[x] OpenPecha/openpecha_data_cataloger#5
Estimated time: 1 day
Actual time:
Reviewed By
@ta4tsering @kaldan007