SSoelvsten / en-dk

English-Danish Technical Dictionary of Computer Science
https://ssoelvsten.github.io/en-dk
MIT License

Export to CSV/TSV #21

Open mlutze opened 2 months ago

mlutze commented 2 months ago

I use this repo to generate flashcards for Danish study. My flashcard program (Anki) consumes CSV/TSV files. In the past, I hacked together a script to parse the dictionary file, but that required some manual wrangling and is not robust to changes in the dictionary format.

I would like a button on the page that allows me to download the dictionary as a CSV/TSV file.

SSoelvsten commented 2 months ago

Good idea! That shouldn't be too difficult.

How exactly should the CSV or TSV file look? For example, how would you want the following entry to be exported?

  {
    word: "chain",
    type: "sb.",
    translations: [
      "kæde, -n, -r, -rne",
      "(hash table) kollisionsliste, -n, -r, -rne"
    ],
    keywords: ["computer science"],
    phrases: [
      ["The table uses hashing with chaining", "Tabellen bruger kollisionslister"]
    ]
  },

mlutze commented 2 months ago

Since the nesting isn't very deep, maybe it could be one column per top-level field, and then divide those as appropriate. For this example, I might do:

word	type	translations	keywords	phrases
chain	sb.	kæde, -n, -r, -rne; (hash table) kollisionsliste, -n, -r, -rne	computer science, some other keyword	The table uses hashing with chaining; Tabellen bruger kollisionslister

I separated fields by tab, translations by semicolon, keywords by comma, and phrases by semicolon.
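
The layout proposed above could be produced with something like the following. This is only a rough sketch, not the repo's actual code: the function names `entryToTsvRow` and `dictionaryToTsv` are made up for illustration, and it assumes every entry has the same shape as the `chain` example. The proposal does not say how to separate several phrases within one entry, so the `" | "` separator used here is an assumption.

```javascript
// Hypothetical helper: flatten one dictionary entry into a single TSV row.
// Separators follow the proposal in this thread: tab between fields,
// "; " between translations, ", " between keywords, and "; " between the
// English and Danish halves of a phrase.
function entryToTsvRow(entry) {
  // A tab or newline inside a value would break the row, so replace it with a space.
  const clean = (value) => String(value).replace(/[\t\r\n]+/g, " ");

  const fields = [
    entry.word,
    entry.type,
    (entry.translations || []).join("; "),
    (entry.keywords || []).join(", "),
    // Assumption: multiple phrases are joined with " | " (not specified in the thread).
    (entry.phrases || []).map((pair) => pair.join("; ")).join(" | "),
  ];

  return fields.map(clean).join("\t");
}

// Hypothetical helper: build the whole file, with a header row first.
function dictionaryToTsv(entries) {
  const header = ["word", "type", "translations", "keywords", "phrases"].join("\t");
  return [header, ...entries.map(entryToTsvRow)].join("\n");
}
```

Calling `dictionaryToTsv` on the full list of entries would give the text that a download button could hand to the browser. Since the translations and keywords already contain commas, TSV avoids the quoting that CSV would need.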