OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

[RFC0126] Build a catalog for OpenPecha-Data with latest changes #378

Open tenzin3 opened 5 months ago

tenzin3 commented 5 months ago

RFC0126: Build a catalog for OpenPecha-Data with latest changes

Named Concepts

Catalog: a list of details about each piece of training data.

Toolkit: an existing package, made by the Monlam organization, that is used for working with OpenPecha-Data repositories.

Summary

We need a catalog of OpenPecha-Data that reflects the latest changes. Some of the OPF repositories in OpenPecha-Data have a different format, so we need to categorize them by logging them.

OpenPecha-Data already has a catalog that covers roughly 97% of the OPF repositories. We need to get the names of all OPF and OPA repositories using the GitHub API.
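A minimal sketch of the name listing, assuming the repositories live under a GitHub organization called `OpenPecha-Data` and that the standard "list organization repositories" REST endpoint is used. The function names `fetch_json` and `list_repo_names` are illustrative and not part of the toolkit.

```python
import json
from typing import Callable, Iterator
from urllib.request import Request, urlopen

# GitHub REST endpoint for listing an organization's repositories, 100 per page.
API_URL = "https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"


def fetch_json(url: str) -> list:
    """Fetch one page of results from the GitHub REST API."""
    request = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request) as response:
        return json.load(response)


def list_repo_names(org: str, fetch: Callable[[str], list] = fetch_json) -> Iterator[str]:
    """Yield every repository name in the organization, page by page."""
    page = 1
    while True:
        repos = fetch(API_URL.format(org=org, page=page))
        if not repos:
            # An empty page means there are no more repositories.
            return
        for repo in repos:
            yield repo["name"]
        page += 1


if __name__ == "__main__":
    # Assumption: "OpenPecha-Data" is the organization name on GitHub.
    for name in list_repo_names("OpenPecha-Data"):
        print(name)
```

Unauthenticated requests are rate limited by GitHub, so a real run over a large organization would likely need an `Authorization` header with a token. The `fetch` parameter is injectable so the pagination logic can be checked without network access.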

We will try to create a pecha object for each OPF using the OpenPecha toolkit, and categorize which OPFs work and which have a different file structure, format, or definition.
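The categorization step could look roughly like the sketch below. It does not call the toolkit directly: `load_pecha` is a hypothetical callable standing in for whatever toolkit call builds a pecha object, and the category names `works` and `different_format` are illustrative.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("opf_catalog")


def categorize(
    pecha_ids: Iterable[str],
    load_pecha: Callable[[str], object],
) -> Dict[str, List[str]]:
    """Try to load each pecha and sort its id by whether loading succeeded.

    `load_pecha` is a placeholder for the toolkit call that creates a
    pecha object; it is expected to raise on an unsupported format.
    """
    result: Dict[str, List[str]] = {"works": [], "different_format": []}
    for pecha_id in pecha_ids:
        try:
            load_pecha(pecha_id)
        except Exception as error:
            # Log the failure so the differing formats can be reviewed later.
            logger.warning(
                "%s failed to load: %s: %s", pecha_id, type(error).__name__, error
            )
            result["different_format"].append(pecha_id)
        else:
            result["works"].append(pecha_id)
    return result
```

Catching the broad `Exception` is deliberate here, since the point is to record every OPF that the toolkit cannot handle rather than stop at the first failure. The logged exception type gives a first rough grouping of the differing formats.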

Dependencies

OpenPecha toolkit

Infrastructures

vast.ai, to store the OpenPecha repositories.

Design Illustrations

image

Justification

We use the OpenPecha toolkit to get features such as file structure and metadata for categorization, because the toolkit already covers the majority of pechas.

Testing

Collect a few OPFs with different structures and check whether the script categorizes and logs them properly.
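One way to set up such a check is with small throwaway fixtures, as in the sketch below. The expected entries (`base` and `meta.yml` inside the `.opf` directory) are an assumption about the usual OPF layout and should be adjusted to the real definition. The helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: a well-formed .opf directory contains at least these entries.
EXPECTED_ENTRIES = ("base", "meta.yml")


def missing_entries(opf_dir: Path, expected: Iterable[str] = EXPECTED_ENTRIES) -> List[str]:
    """Return the expected entries that are absent from the OPF directory."""
    return [name for name in expected if not (opf_dir / name).exists()]


def make_fixture(root: Path, name: str, entries: Iterable[str]) -> Path:
    """Create a fake OPF directory; entries with a dot become files, others directories."""
    opf_dir = root / name
    opf_dir.mkdir(parents=True)
    for entry in entries:
        target = opf_dir / entry
        if "." in entry:
            target.touch()
        else:
            target.mkdir()
    return opf_dir


def run_check() -> Tuple[List[str], List[str]]:
    """Build one complete and one incomplete fixture and report what each lacks."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        good = make_fixture(root, "good.opf", ["base", "layers", "meta.yml"])
        odd = make_fixture(root, "odd.opf", ["base"])
        return missing_entries(good), missing_entries(odd)
```

The complete fixture should report nothing missing, while the incomplete one should report `meta.yml`. The same fixtures can then be passed to the categorization script to confirm that it logs the odd one.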

Implementation Steps

List all the steps involved during implementation.

Reviewed By

@ta4tsering @kaldan007

kaldan007 commented 5 months ago

@tenzin3 I doubt u need to write a pecha downloader pipeline, cuz we already have download option in toolkit.

kaldan007 commented 5 months ago

Other than that, I'm OK with it.

tenzin3 commented 5 months ago

@tenzin3 I doubt u need to write a pecha downloader pipeline, cuz we already have download option in toolkit.

@kaldan007 okay i already put that out.