BoboTiG / ebook-reader-dict

Finally decent dictionaries based on Wiktionary for your beloved eBook reader.
http://www.tiger-222.fr/?d=2020/04/17/22/14/21-un-dictionnaire-alternatif-et-complet-pour-votre-liseuse
MIT License

Improve the wiki for locale addition #824

Open BoboTiG opened 3 years ago

BoboTiG commented 3 years ago

Clearly the HOWTO Add a New Local page is not well done. It is missing a lot of information and steps that would help new contributors add a new locale.

It should be revised.

BoboTiG commented 3 years ago

I just enhanced the python -m wikidict --help output:

eBook Reader Dictionaries

Usage:
    wikidict LOCALE
    wikidict LOCALE -h, --help
    wikidict LOCALE --download
    wikidict LOCALE --parse
    wikidict LOCALE --render
    wikidict LOCALE --convert
    wikidict LOCALE --find-templates
    wikidict LOCALE --check-random-words N
    wikidict LOCALE --check-word=WORD
    wikidict LOCALE --get-word=WORD [--raw]
    wikidict LOCALE --gen-dict=WORDS --output=FILENAME
    wikidict LOCALE --update-release

Options:
  --download                Retrieve the latest Wiktionary dump into "data/$LOCALE/pages-$DATE.xml".
  --parse                   Parse and store raw Wiktionary data into "data/$LOCALE/data_wikicode-$DATE.json".
  --render                  Render templates from raw data into "data/$LOCALE/data-$DATE.json".
  --convert                 Convert rendered data to working dictionaries into several files:
                                - "data/$LOCALE/dicthtml-$LOCALE.zip": Kobo format.
                                - "data/$LOCALE/dict-$LOCALE.df": DictFile format.
  --find-templates          DEBUG: Find all templates in use.
  --check-random-words N    Get and render N words.
                            Then compare with the rendering done on the Wiktionary to catch errors.
  --check-word=WORD         Get and render WORD.
                            Then compare with the rendering done on the Wiktionary to catch errors.
  --get-word=WORD [--raw]   Get and render WORD. Pass --raw to output the raw HTML code.
  --gen-dict=WORDS          DEBUG: Generate the Kobo dictionary for specific words. Pass multiple words
                            separated with a comma: WORD1,WORD2,WORD3,...
                            The generated filename can be tweaked via the --output=FILENAME argument.
  --update-release          DEV: Update the release description. Do not use it manually but via the CI only.

If no argument given, --download, --parse, --render and --convert will be done automatically.
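
As a rough illustration of the last line of the help text, the default run amounts to chaining the four steps in order. The sketch below only builds the command lines and does not execute anything. It assumes the tool is launched as `python -m wikidict` (the name comes from the usage text and may differ in your checkout), and `pipeline_commands` is a hypothetical helper, not part of the project.

```python
import sys
from typing import List

# Steps run by default, in the order listed in the help text.
DEFAULT_STEPS = ("--download", "--parse", "--render", "--convert")


def pipeline_commands(locale: str, module: str = "wikidict") -> List[List[str]]:
    """Return one argv list per default step for the given locale.

    `module` is an assumption: adjust it to however you launch the tool.
    """
    return [[sys.executable, "-m", module, locale, step] for step in DEFAULT_STEPS]


if __name__ == "__main__":
    for argv in pipeline_commands("it"):
        # Print instead of executing; hand argv to subprocess.run() to run a step.
        print(" ".join(argv))
```

Running a single step by hand (for example only `--render` after changing a template) is usually faster than repeating the whole chain.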

BoboTiG commented 3 years ago

I regenerated html/wikidict/user_functions.html. It will hopefully be useful to someone :)

BoboTiG commented 3 years ago

cc @atti84it and @Duckbilled ⏫

BoboTiG commented 3 years ago

Here is the email I sent to @atti84it about next steps (after the locale has been added):

First, keep your fork up-to-date. I enhanced the --help output.

Then, run --find-templates and see which files are generated (the script will tell you). Open those files, starting with sections.txt, because you first need good coverage of sections. If one or several sections look interesting, add them to sections in it/__init__.py (replace it with no or the locale you are working on). Rerun --find-templates and see if you are still missing sections. Iterate like that until you think you are covering all sections of the language.

When sections are done, do the same with templates.txt. That is a huge part, and there will be a lot of false positives. When a template is in use, you have to handle it (I would say pick the most used templates first). When the template is handled, rerun --find-templates again, and again, and again :)
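
To give an idea of what "pick the most used templates first" means in practice, here is a minimal sketch that counts template names in a few wikicode strings and sorts them by frequency. It is not how --find-templates is implemented: the regular expression is a simplification, `most_used_templates` is a hypothetical helper, and the sample strings are made up.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Captures the name of a template call such as {{Term|chimica}} or {{xxx}}.
# Simplification: nested templates are counted one by one, and nothing is
# filtered out, so false positives are expected.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def most_used_templates(wikicodes: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` most frequent template names with their counts."""
    counts: Counter = Counter()
    for code in wikicodes:
        counts.update(TEMPLATE_NAME.findall(code))
    return counts.most_common(top)


if __name__ == "__main__":
    sample = ["{{Term|a}} some text {{xxx}}", "{{Term|b}}"]
    for name, count in most_used_templates(sample):
        print(f"{count:>5}  {name}")
```

Handling the templates at the top of such a list first gives the biggest improvement per commit.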

That is the core of the thing: rendering templates as they are rendered on the Wiktionary. Have a look at --help, there are a bunch of arguments to help you (with description).

I already created an issue to handle the "Term" template: https://github.com/BoboTiG/ebook-reader-dict/issues/842. This is how we ask for support of a template or for a modification of an existing one (just check other open and closed issues for inspiration). Also try to follow the commit description format, which is very simple:

fix #NNN: TITLE

Example:

fix #842: [IT] Support the 'Term' template

As simple as that :)

For now, I would say:

  • focus on finding the right sections, then open a PR with the changes.
  • then, open tickets for template support and work on them when you want (assign issues to yourself to let others know they should not work on them)

And to be less strict, you can still open PRs without opening issues first. But try to keep one commit per template, always with the same syntax:

[IT] Add support for the 'xxx' template
[IT] Better handling of the 'xxx' template
[IT] Handle 'xxx' and 'ddd' templates

You get it: start with "[LOCALE]" and just say which template is impacted by those changes.
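
The two commit title styles above can be checked mechanically. The following is a small sketch, not a hook shipped with the project: `is_valid_commit_title` is a hypothetical helper, and treating the locale as two or three uppercase letters is an assumption.

```python
import re

# Style 1: "fix #NNN: TITLE", e.g. "fix #842: [IT] Support the 'Term' template"
ISSUE_STYLE = re.compile(r"^fix #\d+: \S.*$")
# Style 2: "[LOCALE] description", e.g. "[IT] Add support for the 'xxx' template"
# Assumption: a locale is written as 2 or 3 uppercase letters.
LOCALE_STYLE = re.compile(r"^\[[A-Z]{2,3}\] \S.*$")


def is_valid_commit_title(title: str) -> bool:
    """Return True when the title follows one of the two styles above."""
    return bool(ISSUE_STYLE.match(title) or LOCALE_STYLE.match(title))


if __name__ == "__main__":
    for title in ("fix #842: [IT] Support the 'Term' template", "Support Term"):
        print(f"{title!r}: {is_valid_commit_title(title)}")
```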


About inferring the transformation.

There are several ways.

1) Simple italic words: update templates_italic. Check the French one for inspiration, or any other locale.

2) Templates with multiple arguments: when they are simple enough to be handled with one line of Python, just update templates_multi. Again, check the French one for inspiration. For example, https://github.com/BoboTiG/ebook-reader-dict/issues/842 should be handled here.

3) Complex templates: they have several arguments with different possible outputs. For now, open an issue and we will think about it later. I would like you first to "master" cases 1 and 2, which are the most common templates across the Wiktionary.
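
To make the difference between cases 1 and 2 concrete, here is a toy sketch of a table-driven renderer. It only borrows the names templates_italic and templates_multi: the entries are invented, the HTML output is made up, and the real tables in the project may store their handlers differently, so always check an existing locale before copying this shape.

```python
from typing import Callable, Dict, List

# Case 1: template name -> text shown in italic.
# Hypothetical entry, invented for this sketch.
templates_italic: Dict[str, str] = {
    "example-fam": "familiar",
}

# Case 2: one-line handlers. Each receives the template parts
# (name first, then the arguments). Hypothetical entry as well.
templates_multi: Dict[str, Callable[[List[str]], str]] = {
    "example-abbr": lambda parts: f"{parts[1]} ({parts[2]})",
}


def render(template: str) -> str:
    """Render '{{name|arg1|...}}' using the two tables above."""
    parts = [part.strip() for part in template[2:-2].split("|")]
    name = parts[0]
    if name in templates_italic:
        return f"<i>({templates_italic[name]})</i>"
    if name in templates_multi:
        return templates_multi[name](parts)
    # Case 3 (or not handled yet): leave the template untouched so it stays visible.
    return template


if __name__ == "__main__":
    print(render("{{example-fam}}"))
    print(render("{{example-abbr|WWW|World Wide Web}}"))
```

Anything that needs branching on its arguments no longer fits in one line and belongs to case 3.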

I hope it is clear enough to move forward :) If you need help, do not be shy and open a PR: either Nicolas or I will be happy to help, and it will be good for other (new) contributors to see those answers.

BoboTiG commented 3 years ago

Here is the email I just sent to @Duckbilled about how to work with git and GitHub:

Here is a typical workflow. Do those actions before working on any PR:

1) Ensure your fork is synchronized:

   $ git checkout master
   $ git pull origin master

2) Create a specific branch for your patch:

   $ git checkout -b fix-NNN
   # Where NNN is the issue number, if it exists.
   # Or use impr-no-xxx-tpl or fix-no-xxx-tpl where xxx is the template name.

3) Code.

4) Check your changes:

   # Replace "no" with "it" or the locale you are working on:
   $ python -m pytest --doctest-modules wikidict/lang/no tests/test_no.py --no-cov
   $ ./check.sh

5) Stage your changes:

   $ git add -p
   # For each modification, type "y" to stage it or "n" to leave it out of the commit.

   # If you added new files, add them specifically:
   $ git add tests/data/no/word.wiki

   # Check that no file was left out or added without your consent :) :
   $ git status

6) Commit your changes

   $ git commit -m "fix #NNN: TITLE"

7) Push your changes

   $ git push
   # It will fail and give you the appropriate command to use.
   # Then retry with that command, something like:
   #     $ git push --set-upstream origin YOUR_BRANCH_NAME

In the output of that command, you will see a link such as:

    remote: Create a pull request for 'feat-no' on GitHub by visiting:
    remote:      https://github.com/BoboTiG/ebook-reader-dict/pull/new/YOUR_BRANCH_NAME

Open that link: it takes you to the PR creation page in your browser. Check that everything looks good, then validate.
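
The naming conventions used in steps 2 and 4 can be summed up in a few lines of code. This is only a sketch: `branch_name` and `locale_test_argv` are hypothetical helpers, not part of the repository, and the paths simply follow the pattern of the pytest command shown in step 4.

```python
from typing import List, Optional


def branch_name(
    issue: Optional[int] = None,
    locale: str = "",
    template: str = "",
    kind: str = "fix",
) -> str:
    """Build a branch name: 'fix-NNN', or '<kind>-<locale>-<template>-tpl' without an issue."""
    if issue is not None:
        return f"fix-{issue}"
    if kind not in ("fix", "impr"):
        raise ValueError("kind must be 'fix' or 'impr'")
    if not locale or not template:
        raise ValueError("locale and template are required when there is no issue")
    return f"{kind}-{locale}-{template}-tpl"


def locale_test_argv(locale: str) -> List[str]:
    """Return the pytest command of step 4, adapted to the given locale."""
    return [
        "python", "-m", "pytest", "--doctest-modules",
        f"wikidict/lang/{locale}", f"tests/test_{locale}.py", "--no-cov",
    ]


if __name__ == "__main__":
    print(branch_name(issue=842))
    print(branch_name(locale="no", template="xxx", kind="impr"))
    print(" ".join(locale_test_argv("it")))
```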