OpenEnergyPlatform / open-MaStR

A collaborative software to download the energy database Marktstammdatenregister (MaStR)
https://open-mastr.readthedocs.io/en/latest/
GNU Affero General Public License v3.0

Create a `Mastr.translate` method #461

Closed: FlorianK13 closed this issue 10 months ago

FlorianK13 commented 1 year ago

Description of the issue

We could now implement a translation method for the Mastr database that translates all column names into English. Thanks to LLMs such as ChatGPT, we would not need to do the translation ourselves.

Ideas for a solution

  1. Create a list of all distinct column names across all tables.
  2. Pass this list to ChatGPT and ask for a translation of every item.
  3. Create a dictionary of translations. Columns that are added later and are missing from the dictionary shall not be translated.
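Steps 1 and 3 could look roughly like the sketch below. It assumes the downloaded data is a SQLite file and reads column names with SQLite's `PRAGMA table_info`. The helper names `distinct_column_names` and `translate_name` are hypothetical, and the German column names and their English translations in `TRANSLATIONS` are illustrative placeholders, not a reviewed dictionary. The fallback in `translate_name` implements the rule that unknown columns keep their original names.

```python
import sqlite3


def distinct_column_names(conn: sqlite3.Connection) -> set:
    """Collect the distinct column names over all user tables of a SQLite database."""
    names = set()
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # Quote the table name so unusual identifiers are handled safely.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for row in conn.execute(f"PRAGMA table_info({quoted})"):
            names.add(row[1])
    return names


def translate_name(name: str, translations: dict) -> str:
    """Return the English name, or the original name if it is not in the dictionary."""
    return translations.get(name, name)


# Hypothetical translation dictionary, e.g. generated by an LLM and then reviewed.
TRANSLATIONS = {
    "EinheitMastrNummer": "UnitMastrNumber",
    "Bruttoleistung": "GrossCapacity",
}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE wind ("EinheitMastrNummer" TEXT, "Bruttoleistung" REAL, "Ort" TEXT)')
    conn.execute('CREATE TABLE solar ("EinheitMastrNummer" TEXT, "Bruttoleistung" REAL)')
    columns = distinct_column_names(conn)
    print(sorted(columns))  # ['Bruttoleistung', 'EinheitMastrNummer', 'Ort']
    print({c: translate_name(c, TRANSLATIONS) for c in sorted(columns)})
```

Because the result is a set, a column such as `EinheitMastrNummer` that appears in several tables only has to be translated once.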


FlorianK13 commented 1 year ago

I think this could be solved as follows:

  1. Get a list of all column names from all tables, either by connecting to an existing database or by using the orm.py file and the corresponding SQLAlchemy methods.
  2. Convert this list to a set to remove duplicate names.
  3. Go to your favourite LLM and create a translation dictionary from this set of column names.
  4. Implement a Mastr.translate method that takes the downloaded database, iterates over all tables and all columns, and translates them. The database should then be renamed to open_mastr_translated.db so that the open_mastr module does not try to write new data to it later.
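Step 4 could be sketched as a standalone function rather than a method on `Mastr`. This is a minimal sketch under several assumptions: the database is a SQLite file, the SQLite version is at least 3.25 (needed for `ALTER TABLE ... RENAME COLUMN`), and `translate_database` is a hypothetical name. Column names are read with `PRAGMA table_info` instead of going through orm.py and SQLAlchemy.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def _quote(identifier: str) -> str:
    """Quote an SQLite identifier (table or column name)."""
    return '"' + identifier.replace('"', '""') + '"'


def translate_database(db_path: str | Path, translations: dict[str, str]) -> Path:
    """Rename every column that has an entry in `translations`, then rename the file.

    Columns without an entry keep their original names.
    Returns the path of the renamed file, open_mastr_translated.db.
    """
    db_path = Path(db_path)
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        for table in tables:
            # Materialize the column list before altering the table.
            columns = [row[1] for row in conn.execute(f"PRAGMA table_info({_quote(table)})")]
            for old in columns:
                new = translations.get(old, old)
                if new != old:
                    # ALTER TABLE ... RENAME COLUMN requires SQLite >= 3.25.
                    conn.execute(
                        f"ALTER TABLE {_quote(table)} RENAME COLUMN {_quote(old)} TO {_quote(new)}"
                    )
        conn.commit()
    finally:
        conn.close()
    # Rename the file so open_mastr does not write new data into it later.
    # Path.replace overwrites an existing open_mastr_translated.db in the same folder.
    target = db_path.with_name("open_mastr_translated.db")
    return db_path.replace(target)
```

Usage would be along the lines of `translate_database("path/to/your.db", TRANSLATIONS)`, where the path is a placeholder. This sketch translates in place and then renames, as proposed above. A variant that first copies the file and translates the copy would leave the original database untouched, at the cost of disk space.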