Imperial-EE-Microsoft / microsoft_translation_public

Public Notebooks for testing Co-Operator's abilities

Modularization for testing #2

Closed skytin1004 closed 1 month ago

skytin1004 commented 1 month ago

Hi @timothycdc,

I learned about this project through Lee and would like to contribute by enhancing the accuracy of the translation project through extensive testing and validation. To facilitate the creation of diverse test cases, I have modularized the project structure. You can refer to my draft pull request (PR #1).

I would like to discuss with the team if this approach aligns with our direction. If it is not yet determined, I suggest proceeding with this modular structure as a baseline for writing and validating test cases.

Please share your thoughts and feedback. If the team agrees with this approach, I will finalize the modularization and proceed with writing the test cases.

timothycdc commented 1 month ago

Hi @skytin1004, thank you for your interest in contributing.

Some comments:

I want to highlight more information and other areas of priority to see if you are interested in contributing there.

Our team actually built a Django/GitHub app for our university demo (it is private but I am uploading a public version for you). The problem we have is that many users just want to demo the capabilities of LLMs without having to host/install their own GitHub app.

Currently in this repo, we only have one working notebook which translates local images.

So my goal for this repo is to have some Python scripts that can translate Markdown files. The idea is that they look for .md and image files in a folder, run the necessary image/text translations, and write the translated images and .md files to an output directory. Then we can have another notebook for the same feature so devs can play around with it.
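As a rough illustration of the discovery step described above, here is a minimal sketch. The function name `collect_inputs` and the set of image extensions are assumptions made for the example, not something taken from the repo.

```python
from pathlib import Path

# Assumed set of image types; adjust to whatever the image translator supports.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def collect_inputs(input_dir):
    """Return (markdown_files, image_files) found anywhere under input_dir."""
    md_files, image_files = [], []
    for path in sorted(Path(input_dir).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()  # treat .PNG and .png alike
        if suffix == ".md":
            md_files.append(path)
        elif suffix in IMAGE_SUFFIXES:
            image_files.append(path)
    return md_files, image_files
```

Each returned list can then be handed to the image or text translation step, with results written under the output directory.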

My team already wrote most of the important logic in the app, but since it is badly structured, we don't want to focus on the app anymore. We just want a public repo that others can try out with their own local examples.

The translation process I am planning (similar to the app) is:

  1. Find out the target language from the user. We use 2-letter ISO codes and look them up in a YAML file to get the correct font for image translation.
  2. Translate all image files following the same method as the repo notebook. Store them in the output directory.
  3. For each .md file, split the text into chunks so that each request sent to Azure OpenAI stays within the token limit.
  4. For each chunk, add a translation prompt on top of it and send it to the LLM to translate. We used asyncio for the app, but I think we should switch to nest-asyncio for nested event loops, which causes fewer errors.
  5. Then combine the translated chunks back together.
  6. Then run a regex over the Markdown image links and replace them with links to the translated images.
    • In the app, we had a hashing function with markdown images to prevent name collisions in the GitHub repo. This is unnecessary for the demo because we will be storing all images in the same directory.
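
Steps 3 to 6 could be sketched roughly as follows. This is a sketch under stated assumptions, not the app's actual code: `fake_translate` is a hypothetical stand-in for the LLM call, character count is used as a crude proxy for the token limit, and the `name.<lang>.ext` naming for translated images is invented for illustration.

```python
import asyncio
import re
from pathlib import PurePosixPath


def split_into_chunks(text, max_chars=2000):
    """Group paragraphs into chunks of at most max_chars characters.

    Characters are a rough proxy for tokens (assumption). A single paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


async def fake_translate(chunk, lang):
    """Hypothetical placeholder for the LLM request; it only tags the text."""
    await asyncio.sleep(0)
    return f"[{lang}] {chunk}"


async def translate_markdown(text, lang, translate=fake_translate, max_chars=2000):
    """Translate chunks concurrently, then join them in the original order."""
    chunks = split_into_chunks(text, max_chars)
    translated = await asyncio.gather(*(translate(c, lang) for c in chunks))
    return "\n\n".join(translated)


# Matches ![alt](target) image links without a title part.
IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)(\))")


def relink_images(markdown, lang):
    """Point local image links at translated copies in one flat directory."""
    def repl(match):
        target = match.group(2)
        if "://" in target:  # leave remote images untouched
            return match.group(0)
        path = PurePosixPath(target)
        return f"{match.group(1)}{path.stem}.{lang}{path.suffix}{match.group(3)}"
    return IMAGE_LINK.sub(repl, markdown)
```

From a script, `asyncio.run(translate_markdown(text, "ko"))` drives the whole thing. Inside a notebook an event loop is already running, so `asyncio.run` raises a `RuntimeError` there; awaiting the coroutine directly or calling `nest_asyncio.apply()` first are the usual workarounds.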

skytin1004 commented 1 month ago

Hi @timothycdc,

Thank you for your feedback. I appreciate your suggestions.

I agree that due to the probabilistic nature of LLMs, difflib may not be sufficient for accurate evaluation.

I will look into better methods to evaluate diverse translation outputs accurately. The idea of using LLMs as a judge for benchmarking translations also seems good.
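
To make the difflib concern concrete, here is a small sketch with made-up sentences. It shows character-level similarity scoring a mistranslation higher than a valid paraphrase, which is the kind of case a string diff cannot judge.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


# Illustrative sentences only; not taken from any real test set.
reference = "The cat sat on the mat."
paraphrase = "A cat was sitting on the rug."  # acceptable rewording
wrong = "The cat sat on the hat."  # one letter off, different meaning

print(f"paraphrase: {similarity(reference, paraphrase):.2f}")
print(f"wrong:      {similarity(reference, wrong):.2f}")
```

The wrong sentence scores higher simply because it shares more characters with the reference, so a ratio threshold alone would pass it and fail the paraphrase.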

I understand that you will accept the PR after the presentation on August 5th. I wish you all the best in your preparations.

In the meantime, I will check the repositories you shared.

I think it would be beneficial to create a notebook that imports the modularized code and allows users to perform the translation process step by step.

Once my PR works correctly, I will change its status from Draft PR to Open PR and add a comment to let you know.

Thank you again for your guidance.

skytin1004 commented 1 month ago

@timothycdc, could you please check the file twitter.py? It looks like there might be an exposed key.

timothycdc commented 1 month ago

Hi, I’m not at my computer now. Thanks for the spot. I’ve asked my team member to disable it for now.