Open BoboTiG opened 3 years ago
I just enhanced the python -m wikiscript --help output:
eBook Reader Dictionaries
Usage:
    wikidict LOCALE
    wikidict LOCALE -h, --help
    wikidict LOCALE --download
    wikidict LOCALE --parse
    wikidict LOCALE --render
    wikidict LOCALE --convert
    wikidict LOCALE --find-templates
    wikidict LOCALE --check-random-words N
    wikidict LOCALE --check-word=WORD
    wikidict LOCALE --get-word=WORD [--raw]
    wikidict LOCALE --gen-dict=WORDS --output=FILENAME
    wikidict LOCALE --update-release
Options:
  --download                Retrieve the latest Wiktionary dump into "data/$LOCALE/pages-$DATE.xml".
  --parse                   Parse and store raw Wiktionary data into "data/$LOCALE/data_wikicode-$DATE.json".
  --render                  Render templates from raw data into "data/$LOCALE/data-$DATE.json"
  --convert                 Convert rendered data to working dictionaries into several files:
                                - "data/$LOCALE/dicthtml-$LOCALE.zip": Kobo format.
                                - "data/$LOCALE/dict-$LOCALE.df": DictFile format.
  --find-templates          DEBUG: Find all templates in use.
  --check-random-words N    Get and render N words.
                            Then compare with the rendering done on the Wiktionary to catch errors.
  --check-word=WORD         Get and render WORD.
                            Then compare with the rendering done on the Wiktionary to catch errors.
  --get-word=WORD [--raw]   Get and render WORD. Pass --raw to ouput the raw HTML code.
  --gen-dict=WORDS          DEBUG: Generate the Kobo dictionary for specific words. Pass multiple words
                            separated with a comma: WORD1,WORD2,WORD3,...
                            The generated filename can be tweaked via the --output=FILENAME argument.
  --update-release          DEV: Update the release description. Do not use it manually but via the CI only.
If no argument given, --download, --parse, --render and --convert will be done automatically.
I regenerated html/wikidict/user_functions.html. It will hopefully be useful to someone :)
cc @atti84it and @Duckbilled ⏫
Here is the email I sent to @atti84it about next steps (after the locale has been added):
First, keep your fork up-to-date. I enhanced the --help output.
Then, you need to call --find-templates and see what files are generated (the script will tell you). Open those files (first, make sure you have good coverage of sections, so have a look at sections.txt before the rest) and see if one or several sections are interesting. If so, update sections in it/__init__.py (replace it with no or the locale you are working on). Rerun --find-templates and see if you are missing sections. Iterate like that until you think you are covering all sections of the language. When sections are done, do the same with templates.txt. That is a huge part. There will also be a lot of false positives. When there is a template, you have to handle it (pick the most used templates first, I would say). When the template is handled, rerun --find-templates again, and again, and again :)
That is the core of the thing: rendering templates as they are rendered on the Wiktionary. Have a look at --help, there are a bunch of arguments to help you (with descriptions).
I already created an issue to handle the "Term" template: https://github.com/BoboTiG/ebook-reader-dict/issues/842. This is how we ask for template support or a modification of an existing one (just check other open and closed issues for inspiration). Also try to follow the commit description format, which is very simple:
fix #NNN: TITLE
Example:
fix #842: [IT] Support the 'Term' template
Simple as that :)
For now, I would say:
- focus on finding the right sections, then open a PR with the changes.
- then, open tickets for template support and work on them when you want (assign issues to yourself to let others know they should not work on them)
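To decide which templates deserve a ticket first, a rough usage count can help. The snippet below is only a minimal sketch and not part of wikidict: count_templates and its regular expression are hypothetical, and it assumes a template name is whatever sits between "{{" and the first "|" or "}}".

```python
import re
from collections import Counter

# Hypothetical helper, not part of wikidict.
# Assumption: a template call looks like {{name}} or {{name|arg1|arg2}},
# and the name is everything between "{{" and the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def count_templates(wikicode: str) -> Counter:
    """Return how many times each template name appears in some wikicode."""
    return Counter(TEMPLATE_NAME.findall(wikicode))


# Made-up sample text, only here to show the call.
sample = "{{Term|botanica|it}} albero {{Term|zoologia|it}} {{pl}}"
for name, uses in count_templates(sample).most_common():
    print(name, uses)
```

Templates at the top of that list are the ones worth handling first; false positives will still need to be filtered out by hand.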
And to be less tough, you can still open PRs without opening issues first. But try to keep one commit per template, always with the same syntax:
[IT] Add support for the 'xxx' template
[IT] Better handling of the 'xxx' template
[IT] Handle 'xxx' and 'ddd' templates
You get it, start with "[LOCALE]" and just say what template is being impacted by those changes.
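If you want to check a commit title before pushing, something like the following could do. It is a sketch under assumptions: follows_convention is a hypothetical helper (nothing in the repository enforces this as far as this thread says), and treating the locale tag as 2 or 3 uppercase letters is a guess.

```python
import re

# Hypothetical check, not part of the project tooling.
# Style 1: "fix #NNN: TITLE", e.g. "fix #842: [IT] Support the 'Term' template"
ISSUE_STYLE = re.compile(r"^fix #\d+: \S.*$")
# Style 2: "[LOCALE] TITLE", e.g. "[IT] Add support for the 'xxx' template"
# Assumption: the locale tag is 2 or 3 uppercase letters.
LOCALE_STYLE = re.compile(r"^\[[A-Z]{2,3}\] \S.*$")


def follows_convention(title: str) -> bool:
    """Return True when the commit title matches one of the two styles."""
    return bool(ISSUE_STYLE.match(title) or LOCALE_STYLE.match(title))


print(follows_convention("fix #842: [IT] Support the 'Term' template"))
print(follows_convention("Support Term template"))
```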
About inferring the transformation: there are several ways.
1) Simple italic words: update templates_italic. Check the French one for inspiration, or any other locale.
2) Templates with multiple arguments: when they are simple enough to be handled with one line of Python, just update templates_multi. Again, check the French one for inspiration. For example, https://github.com/BoboTiG/ebook-reader-dict/issues/842 should be handled here.
3) Complex templates: they have several arguments with different possible outputs. For now, open an issue and we will think about it later. I would like you first to "master" cases 1 and 2, which are the most common templates across the Wiktionary.
I hope it is clear enough to move forward :) If you need help, do not be shy and open a PR: either Nicolas or I will be happy to help, and it will be good for other (new) contributors to see those answers.
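To make cases 1 and 2 a bit more concrete, here is a minimal sketch of the idea. Only the names templates_italic and templates_multi come from the project; the entries, the dictionary shapes, the HTML produced and render_template are assumptions made up for illustration, so check the French locale for the real format.

```python
from typing import Callable, Dict, List

# Case 1: template name -> word rendered in italics.
# Hypothetical entries; the real table lives in the locale module.
templates_italic: Dict[str, str] = {
    "bot": "botanica",
    "zool": "zoologia",
}

# Case 2: template name -> one-line transformation of the template parts,
# where parts[0] is the template name and parts[1:] are its arguments.
# Hypothetical shape and output; the project may store this differently.
templates_multi: Dict[str, Callable[[List[str]], str]] = {
    "Term": lambda parts: f"<small>(<i>{parts[1]}</i>)</small>",
}


def render_template(template: str) -> str:
    """Render '{{name|arg1|...}}' using the two tables above."""
    parts = [part.strip() for part in template.strip("{}").split("|")]
    name = parts[0]
    if name in templates_italic:
        return f"<i>({templates_italic[name]})</i>"
    if name in templates_multi:
        return templates_multi[name](parts)
    # Unknown template: return it untouched so it is easy to spot.
    return template


print(render_template("{{bot}}"))
print(render_template("{{Term|botanica|it}}"))
```

Anything that does not fit in one line like this belongs to case 3 and is better discussed in an issue first.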
Here is the email I just sent to @Duckbilled about how to work with git and GitHub:
Here is a typical workflow to follow when working on any PR:
1) Ensure your fork is synchronized:
$ git checkout master
$ git pull origin master
2) Create a specific branch for your patch:
$ git checkout -b fix-NNN
# Where NNN is the issue number, if it exists.
# Or use impr-no-xxx-tpl or fix-no-xxx-tpl where xxx is the template name.
3) Code.
4) Check your changes:
# Replace "no" with "it" or the locale you are working on:
$ python -m pytest --doctest-modules wikidict/lang/no tests/test_no.py --no-cov
$ ./check.sh
5) Stage your changes:
$ git add -p
# Validate modifications you want by typing "y" or "n" to discard.
# If you added new files, add them specifically:
$ git add tests/data/no/word.wiki
# Check no files left without your consent :) :
$ git status
6) Commit your changes
$ git commit -m "fix #NNN: TITLE"
7) Push your changes
$ git push
# It will fail and give you the appropriate command to use.
# Then retry with that command, something like:
# $ git push --set-upstream origin YOUR_BRANCH_NAME
In the output of that command, you will see a link such as:
remote: Create a pull request for 'feat-no' on GitHub by visiting:
remote: https://github.com/BoboTiG/ebook-reader-dict/pull/new/YOUR_BRANCH_NAME
Open that link; it will open the browser on the PR creation page. Check that everything is good and validate.
Clearly the HOWTO Add a New Locale is not well done. It is missing a lot of information and steps to help new contributors add a new locale.
It should be revised.