Closed by argentini 2 months ago
Internally, each entry is already parsed into structs by this code: https://github.com/DQNEO/apple-dictionary-parser/blob/8dbdbbcce48e85c30ab2bca19eb2931d85e60714/parser/parser.go#L72-L97
Do you expect something like JSON or YAML as the output format?
Yes, JSON would be good. I was hoping to get something like this for each word:
{
"word": "fly",
"homograph": "1",
...
"definitions": [
{
"pos": "verb",
"definition": "(of a bird, bat, or insect) move through the air using wings",
...
"examples": [ "The bird can fly", ...]
},
]
}
Looks good. I'll try something later.
I tried something:
{
"title": "passionate",
"syllable": "pas·sion·ate",
"num_syllable": 3,
"ipa": "ˈpæʃ(ə)nət",
"meaning": " 1 2 adjective showing or caused by strong feelings or a strong belief : arising from or involving intense feelings of sexual love : passionate pleas for help | he's passionate about football . a passionate affair | a passionate kiss . ",
"phrases": "",
"phrasal_verbs": "",
"derivatives": " DERIVATIVES passionateness | ˈpaSH(ə)nətnəs ˈpæʃ(ə)nətnəs noun ",
"etymolgy": [
"late Middle English ",
" (also in the senses ",
"‘easily moved to passion’",
" and ",
"‘enraged’",
"): from ",
"medieval Latin ",
"passionatus",
"‘full of passion’",
", from ",
"passio",
" (see ",
"passion",
")"
],
"note": "",
"ff_words": [
"passionatus",
"passio"
]
},
Done.
You can try it with the json subcommand:
apple-dictionary-parser json --words=happiness,joy,pleasure | jq .
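For anyone consuming this output from Go, here is a minimal sketch of decoding one entry. The JSON keys are copied from the sample output above (including the `etymolgy` spelling); the `Entry` type, its field names, and the helper functions are hypothetical and not part of the tool. The etymology array looks like a sequence of text fragments, so the sketch simply concatenates them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Entry mirrors the keys seen in the sample output in this thread.
// The type and field names are hypothetical; only the JSON keys are from the thread.
type Entry struct {
	Title        string   `json:"title"`
	Syllable     string   `json:"syllable"`
	NumSyllable  int      `json:"num_syllable"`
	IPA          string   `json:"ipa"`
	Meaning      string   `json:"meaning"`
	Phrases      string   `json:"phrases"`
	PhrasalVerbs string   `json:"phrasal_verbs"`
	Derivatives  string   `json:"derivatives"`
	Etymology    []string `json:"etymolgy"` // key spelled as in the sample output
	Note         string   `json:"note"`
	FFWords      []string `json:"ff_words"`
}

// parseEntry decodes a single JSON object into an Entry.
func parseEntry(data []byte) (Entry, error) {
	var e Entry
	err := json.Unmarshal(data, &e)
	return e, err
}

// EtymologyText joins the etymology fragments into one string.
func (e Entry) EtymologyText() string {
	return strings.Join(e.Etymology, "")
}

// sample is an abbreviated version of the entry shown in this thread.
const sample = `{
  "title": "passionate",
  "syllable": "pas·sion·ate",
  "num_syllable": 3,
  "etymolgy": [", from ", "passio", " (see ", "passion", ")"],
  "ff_words": ["passionatus", "passio"]
}`

func main() {
	e, err := parseEntry([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Title, e.NumSyllable)
	fmt.Println(e.EtymologyText())
}
```

Keys that are missing from the input are left at their zero values, so the abbreviated sample decodes without errors.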
Is there a way for you to include the geography and register data for each definition? For example, a word may be "British English" (geography) or "Archaic" (register). The data is in the dictionary, and each of those should be an array. Here's the structure I'm looking for.
{
"word": "arse",
"homograph": "1",
"definitions": [
{
"pos": "noun",
"langGeographies": [
"British English"
],
"langRegisters": [
"vulgar slang"
],
"definition": "a person\u0027s buttocks or anus.",
"examples": []
},
{
"pos": "noun",
"langGeographies": [
"British English"
],
"langRegisters": [
"vulgar slang"
],
"definition": "a stupid, irritating, or contemptible person.",
"examples": []
}
],
"variations": [
{
"word": "ass",
"pos": "noun"
}
],
"derivatives": [
{
"word": "arsey",
"pos": "adjective"
}
]
}
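In case it helps anyone building this on top of the tool, the requested shape could be modeled in Go roughly as follows. This is only a sketch: the JSON keys come from the structure above, while the type names (`Word`, `Definition`, `Related`) and helpers are hypothetical and do not exist in the project.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Definition is one sense of a word, with optional geography and register labels.
type Definition struct {
	POS             string   `json:"pos"`
	LangGeographies []string `json:"langGeographies"`
	LangRegisters   []string `json:"langRegisters"`
	Definition      string   `json:"definition"`
	Examples        []string `json:"examples"`
}

// Related is a variation or derivative of the headword.
type Related struct {
	Word string `json:"word"`
	POS  string `json:"pos"`
}

// Word is the top-level record (hypothetical type name).
type Word struct {
	Word        string       `json:"word"`
	Homograph   string       `json:"homograph"`
	Definitions []Definition `json:"definitions"`
	Variations  []Related    `json:"variations"`
	Derivatives []Related    `json:"derivatives"`
}

// sampleWord builds the "arse" record from the structure above.
func sampleWord() Word {
	def := func(text string) Definition {
		return Definition{
			POS:             "noun",
			LangGeographies: []string{"British English"},
			LangRegisters:   []string{"vulgar slang"},
			Definition:      text,
			Examples:        []string{}, // empty, non-nil slice encodes as [] rather than null
		}
	}
	return Word{
		Word:      "arse",
		Homograph: "1",
		Definitions: []Definition{
			def("a person's buttocks or anus."),
			def("a stupid, irritating, or contemptible person."),
		},
		Variations:  []Related{{Word: "ass", POS: "noun"}},
		Derivatives: []Related{{Word: "arsey", POS: "adjective"}},
	}
}

// encode renders a Word as indented JSON.
func encode(w Word) (string, error) {
	b, err := json.MarshalIndent(w, "", "  ")
	return string(b), err
}

// decode parses JSON text back into a Word.
func decode(s string) (Word, error) {
	var w Word
	err := json.Unmarshal([]byte(s), &w)
	return w, err
}

func main() {
	s, err := encode(sampleWord())
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

One detail worth noting: `encoding/json` writes a nil slice as `null`, so the sketch initializes `Examples` to an empty slice to get `"examples": []` as in the structure above.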
I don't plan to do it. This project is a tool and library to parse the dictionary. You can build your own software by using my tool.
Is there any way to better organize the data so that it can be parsed? For example, trying to identify patterns for definitions and their parts of speech doesn't seem practical with this pseudo-XML/HTML format.
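One possible approach, sketched here under stated assumptions, is to walk the markup as a token stream with Go's `encoding/xml` and collect the text of elements by their `class` attribute, rather than pattern-matching on the flattened text. The element names and class values in the sample (`pos`, `df`, `ex`) are hypothetical placeholders, not the dictionary's confirmed markup, and `textByClass` is not a function of the tool.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textByClass returns the text content of every element whose class
// attribute contains the given class name, in document order.
func textByClass(markup, class string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(markup))
	dec.Strict = false // be lenient with HTML-like input
	var out []string
	var buf strings.Builder
	depth := 0 // greater than zero while inside a matching element
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 {
				depth++ // nested child of a matching element
			} else if hasClass(t, class) {
				depth = 1
				buf.Reset()
			}
		case xml.EndElement:
			if depth > 0 {
				depth--
				if depth == 0 {
					out = append(out, strings.TrimSpace(buf.String()))
				}
			}
		case xml.CharData:
			if depth > 0 {
				buf.Write(t)
			}
		}
	}
	return out, nil
}

// hasClass reports whether the element's class attribute lists class.
func hasClass(el xml.StartElement, class string) bool {
	for _, a := range el.Attr {
		if a.Name.Local != "class" {
			continue
		}
		for _, c := range strings.Fields(a.Value) {
			if c == class {
				return true
			}
		}
	}
	return false
}

// sample is made-up markup; the real dictionary's tags and classes may differ.
const sample = `<entry><span class="pos">verb</span>` +
	`<span class="df">move through the air using <b>wings</b></span>` +
	`<span class="ex">the bird can fly</span></entry>`

func main() {
	defs, err := textByClass(sample, "df")
	if err != nil {
		panic(err)
	}
	fmt.Println(defs)
}
```

Grouping definitions under their part of speech would then be a matter of tracking which `pos` element was seen most recently, assuming the real markup nests or orders them consistently.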