SuryaKrishna02 / maya-dataset-creation

This repository contains the dataset-creation code for training Maya: Multilingual Aya Model
MIT License
1 star, 2 forks

Verification Script #4

Closed. SuryaKrishna02 closed this issue 4 weeks ago

SuryaKrishna02 commented 1 month ago

Develop a verification script to check the translations in the dataset generated by the c4ai-aya-23 model.

Please feel free to add ideas on how to better detect faulty translations from the model.

asusevski commented 1 month ago

Hi! I will be making a PR soon; in the meantime, I wanted to discuss methodology. I want to add a few different methods, one of which would be to use an LLM as a judge to verify conciseness, consistency, etc. of the translations. @Asnegha and I will also include statistical methods to flag problematic translations. Does this sound alright?
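For illustration only, one statistical check of the kind mentioned above might be a length-ratio filter. The sketch below is not the method from the eventual PR: the function name `flag_translations`, the `z_threshold` parameter, and the choice of character-length ratio are all assumptions. It flags translations that are empty, identical to the source, or whose target/source length ratio is an outlier relative to the rest of the batch.

```python
from statistics import mean, stdev


def flag_translations(pairs, z_threshold=2.0):
    """Return sorted indices of (source, translation) pairs that look suspicious.

    Hypothetical helper: flags empty or untranslated outputs, plus pairs whose
    character-length ratio deviates strongly from the batch average.
    """
    flagged = set()
    ratios = []
    for i, (src, tgt) in enumerate(pairs):
        if not tgt.strip() or tgt.strip() == src.strip():
            # Empty output, or the model copied the source verbatim.
            flagged.add(i)
            ratios.append(None)
        else:
            ratios.append(len(tgt) / max(len(src), 1))

    valid = [r for r in ratios if r is not None]
    if len(valid) >= 2:
        mu, sigma = mean(valid), stdev(valid)
        if sigma > 0:
            for i, r in enumerate(ratios):
                # Flag ratios more than z_threshold standard deviations from the mean.
                if r is not None and abs(r - mu) / sigma > z_threshold:
                    flagged.add(i)
    return sorted(flagged)
```

A z-score over a small batch is a coarse signal, so a check like this would probably serve best as a cheap first pass that narrows down which rows are sent to the LLM judge or to a human reviewer.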

SuryaKrishna02 commented 1 month ago

@asusevski @Asnegha Sounds great. Thanks for taking this on. I think it would be better to connect and discuss this over a call so that we can scope what we can do for Aya Expedition. Otherwise, you can list here the brief steps/methodologies you are planning.

asusevski commented 1 month ago

Tasks to be completed over the next week to close out this issue: