Output structure isn't really able to be parsed

DQNEO / apple-dictionary-parser

Tool for parsing data from macOS's dictionary files

MIT License

Output structure isn't really able to be parsed #4

Closed argentini closed 2 months ago

argentini commented 2 months ago

Is there any way to better organize the data so that it can be parsed? For example, trying to identify patterns for definitions and their parts of speech doesn't seem practical with this pseudo-XML/HTML format.

DQNEO commented 2 months ago

Actually, it is already parsed into structs internally by this code: https://github.com/DQNEO/apple-dictionary-parser/blob/8dbdbbcce48e85c30ab2bca19eb2931d85e60714/parser/parser.go#L72-L97

Do you expect something like JSON or YAML as the output format?

argentini commented 2 months ago

Yes, JSON would be good. I was hoping to get something like this for each entry:

{
    "word": "fly",
    "homograph": "1",
    ...
    "definitions": [
        {
            "pos": "verb",
            "definition": "(of a bird, bat, or insect) move through the air using wings",
            ...
            "examples": [ "The bird can fly", ...]
        },
    ]
}
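
For illustration, the shape requested above could be modelled in Go roughly as follows. This is only a sketch: the type names `Entry` and `Definition` and the helper `encodeEntry` are hypothetical and are not part of apple-dictionary-parser, and the fields elided with `...` in the sample are left out rather than guessed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Definition is a hypothetical type for one sense of a word,
// using the keys from the requested JSON sample.
type Definition struct {
	POS        string   `json:"pos"`
	Definition string   `json:"definition"`
	Examples   []string `json:"examples"`
}

// Entry is a hypothetical type for one headword (one homograph).
type Entry struct {
	Word        string       `json:"word"`
	Homograph   string       `json:"homograph"`
	Definitions []Definition `json:"definitions"`
}

// encodeEntry serializes an Entry to compact JSON.
func encodeEntry(e Entry) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	e := Entry{
		Word:      "fly",
		Homograph: "1",
		Definitions: []Definition{{
			POS:        "verb",
			Definition: "(of a bird, bat, or insect) move through the air using wings",
			Examples:   []string{"The bird can fly"},
		}},
	}
	out, err := encodeEntry(e)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
}
```

Keeping one `Definition` per sense, each with its own part of speech, is what makes the output straightforward to query, since nothing has to be recovered from free text.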

DQNEO commented 2 months ago

Looks good. I'll try something later.

DQNEO commented 2 months ago

I tried something:

  {
    "title": "passionate",
    "syllable": "pas·sion·ate",
    "num_syllable": 3,
    "ipa": "ˈpæʃ(ə)nət",
    "meaning": "          1       2   adjective   showing or caused by strong feelings or a strong belief :    arising from or involving intense feelings of sexual love :      passionate pleas for help  |   he's passionate about football .    a passionate affair  |   a passionate kiss .  ",
    "phrases": "",
    "phrasal_verbs": "",
    "derivatives": " DERIVATIVES     passionateness   |   ˈpaSH(ə)nətnəs ˈpæʃ(ə)nətnəs    noun  ",
    "etymolgy": [
      "late Middle English ",
      " (also in the senses ",
      "‘easily moved to passion’",
      " and ",
      "‘enraged’",
      "): from ",
      "medieval Latin ",
      "passionatus",
      "‘full of passion’",
      ", from ",
      "passio",
      " (see ",
      "passion",
      ")"
    ],
    "note": "",
    "ff_words": [
      "passionatus",
      "passio"
    ]
  },

DQNEO commented 2 months ago

Done.

You can try it with the json subcommand.

apple-dictionary-parser json --words=happiness,joy,pleasure | jq .
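
A minimal sketch of consuming that output from Go, assuming the json subcommand emits a JSON array of objects shaped like the "passionate" sample earlier in this thread (the array wrapper is an assumption; the thread only shows one element). The type `ToolEntry` and the helpers `decodeEntries`, `normalize`, and `etymologyText` are hypothetical names introduced for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ToolEntry mirrors a subset of the keys shown in the sample output.
// The "etymolgy" key is spelled exactly as it appears in that sample.
type ToolEntry struct {
	Title       string   `json:"title"`
	Syllable    string   `json:"syllable"`
	NumSyllable int      `json:"num_syllable"`
	IPA         string   `json:"ipa"`
	Meaning     string   `json:"meaning"`
	Etymology   []string `json:"etymolgy"`
	FFWords     []string `json:"ff_words"`
}

// decodeEntries parses a JSON array of entries (assumed output shape).
func decodeEntries(data []byte) ([]ToolEntry, error) {
	var entries []ToolEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

// normalize collapses runs of whitespace into single spaces.
func normalize(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// etymologyText joins the etymology fragments into one readable string.
func etymologyText(e ToolEntry) string {
	return normalize(strings.Join(e.Etymology, ""))
}

func main() {
	// Sample input trimmed from the output shown in the thread.
	sample := []byte(`[{
		"title": "passionate",
		"num_syllable": 3,
		"meaning": "   1   2   adjective   showing or caused by strong feelings  ",
		"etymolgy": ["late Middle English ", " (also in the senses ", "‘easily moved to passion’"]
	}]`)

	entries, err := decodeEntries(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(e.Title, "|", normalize(e.Meaning), "|", etymologyText(e))
	}
}
```

To run this against real data, the sample literal could be replaced with bytes read from standard input and the command above piped into the program.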

argentini commented 1 month ago

Is there a way for you to include the geography and register data for each definition? For example, a word may be "British English" (geography) or "Archaic" (register). The data is in the dictionary, and each of those should be an array. Here's the structure I'm looking for.

{
    "word": "arse",
    "homograph": "1",
    "definitions": [
        {
            "pos": "noun",
            "langGeographies": [
                "British English"
            ],
            "langRegisters": [
                "vulgar slang"
            ],
            "definition": "a person\u0027s buttocks or anus.",
            "examples": []
        },
        {
            "pos": "noun",
            "langGeographies": [
                "British English"
            ],
            "langRegisters": [
                "vulgar slang"
            ],
            "definition": "a stupid, irritating, or contemptible person.",
            "examples": []
        }
    ],
    "variations": [
        {
            "word": "ass",
            "pos": "noun"
        }
    ],
    "derivatives": [
        {
            "word": "arsey",
            "pos": "adjective"
        }
    ]
}
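
As a rough sketch of why per-definition arrays are convenient, the structure above could be decoded and filtered in Go like this. Everything here is hypothetical and written only against the JSON in this comment: the types `Entry`, `Definition`, and `Related` and the functions `parseEntry` and `definitionsWithRegister` do not exist in apple-dictionary-parser.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Related is a hypothetical type for a variation or derivative.
type Related struct {
	Word string `json:"word"`
	POS  string `json:"pos"`
}

// Definition carries geography and register labels as arrays,
// matching the keys in the requested structure.
type Definition struct {
	POS             string   `json:"pos"`
	LangGeographies []string `json:"langGeographies"`
	LangRegisters   []string `json:"langRegisters"`
	Definition      string   `json:"definition"`
	Examples        []string `json:"examples"`
}

// Entry is a hypothetical type for one headword.
type Entry struct {
	Word        string       `json:"word"`
	Homograph   string       `json:"homograph"`
	Definitions []Definition `json:"definitions"`
	Variations  []Related    `json:"variations"`
	Derivatives []Related    `json:"derivatives"`
}

// parseEntry decodes a single entry object.
func parseEntry(data []byte) (Entry, error) {
	var e Entry
	err := json.Unmarshal(data, &e)
	return e, err
}

// definitionsWithRegister returns the definitions tagged with the given register.
func definitionsWithRegister(e Entry, register string) []Definition {
	var out []Definition
	for _, d := range e.Definitions {
		for _, r := range d.LangRegisters {
			if r == register {
				out = append(out, d)
				break
			}
		}
	}
	return out
}

func main() {
	// Sample trimmed from the structure requested in this comment.
	sample := []byte(`{
		"word": "arse",
		"homograph": "1",
		"definitions": [{
			"pos": "noun",
			"langGeographies": ["British English"],
			"langRegisters": ["vulgar slang"],
			"definition": "a stupid, irritating, or contemptible person.",
			"examples": []
		}],
		"variations": [{"word": "ass", "pos": "noun"}],
		"derivatives": [{"word": "arsey", "pos": "adjective"}]
	}`)

	e, err := parseEntry(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	matches := definitionsWithRegister(e, "vulgar slang")
	fmt.Println(e.Word, "has", len(matches), "definition(s) with that register")
}
```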

DQNEO commented 3 weeks ago

I don't plan to do it. This project is a tool and library to parse the dictionary. You can build your own software by using my tool.