anki-decks / anki-deck-for-duolingo-chinese

Anki deck for the words of the Duolingo course "Chinese for English speakers"
MIT License

Write primary definition and pinyin for each line #2

Open nicolas-raoul opened 5 years ago

nicolas-raoul commented 5 years ago

@leonfox1 please perform a "Git pull" to get my latest changes, before you start modifying the TSV. See https://help.github.com/articles/fetching-a-remote/#pull or just press "Sync" if you use a desktop Git program. Thanks!

leonfox1 commented 5 years ago

Today I found a tool that pulls the Google Translate definitions for each vocab word, and it appears to keep them very short and simple. Although I don't see a way around the significant manual work required to clean up this vocab, I think this tool ultimately does a better job of selecting the "primary definition" and keeping it short and simple. I will share a sample with you in approx. 12 hours.

nicolas-raoul commented 5 years ago

Unfortunately, I don't think the terms and conditions of Google Translate allow us to share the results. So we have to use Open Data resources such as the CC-CEDICT dictionary.

leonfox1 commented 5 years ago

Ok, that is a good point. I will limit my usage to using it as a reference rather than copying the definition.

leonfox1 commented 5 years ago

Sorry I have been MIA for a week; it has been a busy week for me with other tasks. I am about 2/3 of the way through updating the manual definitions for Duolingo section 3. I plan to merge my changes tomorrow. I also plan to play around with Forvo a bit tomorrow, though I am not sure whether I will be able to get that part working by then.

nicolas-raoul commented 5 years ago

No problem! After you add section 3 I will start using the Anki deck in real life to remember the stuff :-) (I will probably have to delete the deck and restart my progress from scratch several times in the future, but I don't care, that's the life of developers ^^)

leonfox1 commented 5 years ago

As I have been going through the definitions manually, I have come to the realization that I simply don't have the expertise to select the best definition all of the time. Therefore, in the interests of creating a deck in a timely manner that is useful, even if not 100% perfect, I have generally been trying to select a definition(s) that fit the following criteria:

To do this, I usually start with the CC-CEDICT definition and simplify it down, briefly taking into consideration how Duolingo is using the word/character AND what Google Translate identified as the most common definition.

The Google Translate definition tends to be too simple, and the CC-CEDICT definition is too thorough and complete. I have been striving to reduce the CC-CEDICT definition to 1-2 lines on a flashcard AND to list the most likely primary definition first. I estimate that in about 90-95% of cases, the simple definition that I am selecting will be sufficient. For the times that a user needs a more detailed definition, we can still import the CC-CEDICT definition as an optional field in Anki.

I am typically leaving several possible definitions separated by slashes rather than try to pick out the best one or two, but I am often substantially reducing the list of possible definitions compared to CC-CEDICT.

As a result of this approach:

leonfox1 commented 5 years ago

Also, when I publish the APKG file, I am going to publish the complete vocab list even though I am only 1/3 of the way through the manual reviews. As a placeholder for the unfinished 2/3, I put in the Google Translate definition with a prefix of "GT: ". It is just the two of us using the deck, and I wanted a "clean" placeholder until I can finish the manual reviews. Also, I didn't have enough time to work on the Forvo audio today, so the deck will either have no audio or the incomplete and not-as-good IMTranslator audio.
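For illustration only, here is a minimal Python sketch of how the "GT: " placeholder convention described above could be checked mechanically. The function names and the idea of tracking review progress are assumptions made for this example, not something that exists in the repository.

```python
from typing import Iterable

# Prefix used in the thread to mark unreviewed Google Translate placeholders.
GT_PREFIX = "GT: "


def is_placeholder(definition: str) -> bool:
    """Return True if the definition is still an unreviewed "GT: " placeholder."""
    return definition.startswith(GT_PREFIX)


def review_progress(definitions: Iterable[str]) -> float:
    """Return the fraction of definitions that are no longer placeholders."""
    items = list(definitions)
    if not items:
        return 0.0
    reviewed = sum(1 for d in items if not is_placeholder(d))
    return reviewed / len(items)
```

Something like this could also be used to strip or flag placeholder rows before a public release of the deck.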

nicolas-raoul commented 5 years ago

Great, thanks a lot! :-)

I haven't done anything so I am not in a position to give advice, but here is my point of view about how far to go when simplifying definitions. Personally I would probably be much more aggressive in leaving only a single definition for each word. For instance I would definitely simplify "subway / metro" into just "subway", this is probably not very controversial, but I would go much further and even simplify "to sleep / to lie down" to just "to sleep" for instance.

Having spent a significant percentage of my lifetime learning with flashcards, I find single meanings to be so much faster to remember than double meanings, allowing me to get in minimal time a first general idea of what the term means, thus allowing me to recognize it in texts or conversations from an early stage. Of course after getting used to the term and seeing it in various contexts I will realize that there is more subtlety to the term, but by that time I won't need that card anymore.

Also, I totally agree with you that we should only match what the word means within Duolingo. We are only a study aid for Duolingo, people who want to go further than that should switch to other resources.

leonfox1 commented 5 years ago

Ok, I don't mind deferring to you on this. Actually, one of the things that makes it hard for me to simplify these definitions down to just one or even two words is that I know just enough Chinese to know that many words have a lot more subtlety to them. "走" is a good example of this, where I regularly hear it used for at least 3 different things. Or "下" is another example, where even within Duolingo it can easily mean next, below, down, etc.

Fortunately, although I am leaving multiple definitions for most words, I am making an effort to list my guess as to the most common OR most Duolingo-friendly match to the word first. This will allow me to just strip off the first definition from my list to create a single word definition, but still allow me to maintain the work that I have already done.

So, going forward, I am going to add another column to the "Words" file which includes only a "Single Simple Definition" stripped from my "Selected English Definition" column. I will also put even more attention into selecting my best guess of the definition first, so that the Single Simple Definition is as high quality as I can make it.
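A rough sketch of how that column could be derived automatically, assuming the "Words" file is a TSV with a header row and that definitions are separated by slashes as described above. The "Word" column name used in the usage note and both function names are hypothetical; only "Selected English Definition" and "Single Simple Definition" come from this thread.

```python
import csv
import io


def first_definition(selected: str) -> str:
    """Return the first slash-separated definition, with whitespace trimmed."""
    return selected.split("/")[0].strip()


def add_single_definition(tsv_text: str) -> str:
    """Append a "Single Simple Definition" column to TSV text.

    Assumes a header row containing "Selected English Definition".
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + ["Single Simple Definition"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["Single Simple Definition"] = first_definition(
            row["Selected English Definition"]
        )
        writer.writerow(row)
    return out.getvalue()
```

With this, a row whose selected definition is "to sleep / to lie down" would get "to sleep" in the new column, while the longer list stays untouched in its own column.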

Although continuing to add columns may seem unnecessary, I am increasingly looking at using card templates and filters to allow users to customize the deck in many different ways that they find useful.

There are dozens of examples of this, but a few might include a version with a Japanese hint (for you), a version with traditional characters (which I personally have no interest in, but understand others do), or a filter applied to only study individual characters and not cards with multiple characters (this is one that I personally want, that many others will probably not).
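As a hedged illustration of the last idea, the single-character filter could be approximated in Python before the rows ever reach Anki. This is only a sketch: the "Word" key and function names are assumptions, and the check covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer characters outside that block would be missed.

```python
def is_single_character(word: str) -> bool:
    """Return True if the word is exactly one character in the basic CJK block."""
    word = word.strip()
    return len(word) == 1 and "\u4e00" <= word <= "\u9fff"


def single_character_rows(rows):
    """Keep only rows whose "Word" field (hypothetical name) is a single character."""
    return [row for row in rows if is_single_character(row["Word"])]
```

The matching rows could then be tagged or exported separately, so that users who want only individual characters can study just that subset.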

leonfox1 commented 5 years ago

Hi Nicolas, I owe you an apology for falling off this project... I do intend to take the project to at least some degree of completion. I got distracted in mid-December by preparing for an upcoming trip, then the trip, then my wife's unexpected hospital visit, now Chinese New Year... the list of excuses can go on and on :) Anyway, I wanted to apologize for disappearing without a trace. If you are too busy to collaborate, no worries, but I wanted to let you know that I do still intend to continue work on it in the near future.

nicolas-raoul commented 5 years ago

@leonfox1 No worries, that's open source: People are free to join and leave and come again when they want :-)