Closed by gambhiro 10 months ago
See the instructions in the README:
https://github.com/digitalpalidictionary/dpd-db#build-a-complete-database-locally
Doesn't poetry run bash bash/initial_setup_run_once create config.ini with default parameters?
Or, if we need to change something in the default config.ini, we can add it in bash/initial_setup_run_once.
Config update: deconstructor - all_texts - yes
There is no point in changing make_dpd = no, because it only affects make_dict.sh.
There is also no need for make_deconstructor - yes, since you run build_db.sh anyway, which will generate all sandhi.
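As a rough illustration of the config update above, here is a minimal Python sketch that sets one option in an INI file with the standard library. The section name deconstructor and the option all_texts come from this comment; the helper name set_option is hypothetical and is not part of the dpd-db code base.

```python
from configparser import ConfigParser
from pathlib import Path


def set_option(config_path: Path, section: str, option: str, value: str) -> None:
    """Set a single option in an INI file, creating the section if needed.

    Hypothetical helper for illustration; not part of dpd-db.
    """
    config = ConfigParser()
    config.read(config_path)  # a missing file is silently skipped
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, value)
    with config_path.open("w") as config_file:
        config.write(config_file)


# Example call (assumes config.ini sits in the current directory):
# set_option(Path("config.ini"), "deconstructor", "all_texts", "yes")
```

Note that ConfigParser does not preserve comments when it rewrites a file, so this approach only suits a config.ini that carries no hand-written comments.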
Please see the instructions for the config.ini parameters:
https://github.com/digitalpalidictionary/dpd-db/blob/main/dps/relationship.md
I made a bash script, build_and_make_all, which will create the whole db and extract all the dictionaries for the first time. I also corrected the README.
The export formats are a separate concern from building the database, and when the compilation time is so long, exports should not be lumped together with the database.
Could you please separate these tasks? Building the db could be a direct dependency of exporting, i.e. the exporter task can run building the db.
I've seen the .md doc Ven @devamitta, and I'll say it's a start, but indeed it's a relationship and it's complicated. I make notes like this all the time too. At this point it is like condensed notes to yourself, not an explanation to someone else.
Regardless, it shouldn't be necessary to read about config options to run common build operations.
Are you familiar with task runners? This is what they are for, i.e. to script the steps for build targets, and the steps for building their dependent files or other conditions. I've mentioned using one or another to Ven @bdhrs, such as GNU make and its Makefile, but make has such obscure behaviours that I would avoid it for anything complex.
doit is Python-based, seems popular, and the syntax looks readable: https://github.com/pydoit/doit
It can execute shell commands or Python functions as actions: https://pydoit.org/tasks.html#actions
That would lend itself well to your present workflow, and allow specifying what build tasks there are, creating blocks of steps to compose the more complex tasks, what depends on what, running checks before the actions, etc.
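To make the idea concrete, here is a minimal sketch of what a doit task file (dodo.py) could look like, with exporting declared as dependent on building the db. It is an assumption-laden illustration, not a tested setup for this repo: bash/build_db.sh is named in this thread, but the location of make_dict.sh and the poetry run bash invocation for these two scripts are guesses.

```python
# dodo.py: a sketch only. Paths and invocations marked "assumed" are not
# confirmed anywhere in this thread.


def task_build_db():
    """Build the database by calling the existing bash script."""
    return {
        "actions": ["poetry run bash bash/build_db.sh"],  # assumed invocation
        "verbosity": 2,  # show the script's output while it runs
    }


def task_export():
    """Export the dictionaries; doit runs build_db first."""
    return {
        "actions": ["poetry run bash bash/make_dict.sh"],  # assumed path
        "task_dep": ["build_db"],  # declares the dependency on the task above
        "verbosity": 2,
    }
```

With a file like this, running doit build_db would only build the database, while doit export would build it first and then export. The existing bash scripts stay as they are; the task file only records which step depends on which.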
Thanks for volunteering to set up a task runner, bhante.
Please make one for your specific requirements of building the db with all deconstructed sandhi compounds, and the associated config.ini settings. If it's easily readable, and gives better results than good old bash scripts then we can set up some more runners for other common situations.
I mostly run the same process daily, but occasionally need to activate some different config settings. I'm sure @Devamitta has a similar workflow.
"Thanks for volunteering to set up a task runner, bhante."
I didn't say that; in my experience it's better not to mess with your workflow 😁 And again, I'm concerned that if I write it, you won't be able to adapt it for the contexts that are currently habitual for you and Ven Devamitta.
If you have specific "how do I ..." questions, that's probably where I can help.
The machine is also good at providing a head start without having to read many docs:
Using the doit Python task management tool, how to add... https://www.phind.com/search?cache=k6tbbf5lu6ixx7q2thb7jy6c
We are currently able to perform all these tasks by running various bash scripts. I don't understand why we should create something that serves the same purpose. Ajahn @gambhiro can you explain the benefits of investing time in something without additional functionality?
@gambhiro It's been the standard procedure on the main work of DPD, the Pāḷi data, to treat all complaints and criticism on the feedback forms as offers of help. This largely accounts for some of the rapid progress in specific areas of the Pāḷi database. Perhaps coders are a little more cagey than the linguists and meditators.
Are the latest updates in the bash scripts folder comprehensible? There's really not much more required than that.
We tried the approach that "I will quickly code it for you", and I don't think the results of that protocol were satisfactory for either of us.
I am offering to help, but instead of coding, now I am doing more talking. If the solutions I am suggesting aren't suitable for you, that's actually a useful result, don't you think? It's better than having coded it without discussion.
I raise the confusion and errors I am running into. Ven. @Devamitta responded and tidied up makdedict.sh and the db build scripts. Even though it's just moving around some lines of bash, I shouldn't have attempted it because I don't have the context for it.
Regarding build tasks, the abstract purpose of a task runner is organization and clarification (and a sort of documentation).
You have probably encountered the classics: make build, make clean, make install.
Often a project will have a handful of build targets. These collect and organize perhaps dozens of steps into purposeful end tasks (but I'm not suggesting using GNU make).
Currently there are several scripts in bash/ and scripts/ (why two folders anyway? are there other build scripts?). It is terribly confusing how to use this: which are build scripts, which are prep steps, etc. It also makes debugging hard. (Is an error happening because this isn't the right script? Or does it need a config variable? Or a prep step before it?)
Now, you could do this with bash only, for example:
tasks/ folder
tasks/deps/ folder, all the prep and config etc. scripts
Having accumulated a bunch of build scripts, some level of organization would be helpful not only to make it comprehensible how to use parts of the system, but also how to add new targets or adapt parts of it.
The other thing that stood out for me is that the bash scripts are mostly a list of Python scripts. You are not doing much "bash scripting" in them.
That would lend itself to converting to .py scripts, where in main() you import the main() of the other .py scripts, or more specific task-related functions.
The added benefit is the shared environment: import and run setup functions only once, create config options, pass along something like a limit-items variable, pass inputs and outputs between build steps as function arguments, no ambiguity about poetry run ... or not, VS Code would jump between functions as usual, etc.
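A rough sketch of that shape, to show how options and intermediate results could be passed between steps as function arguments. Every name here (BuildConfig, setup_environment, build_db, deconstruct) is a hypothetical stand-in; in a real conversion each step would be imported from its own existing .py script.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildConfig:
    """Options that would otherwise be read from config.ini (hypothetical fields)."""
    all_texts: bool = False
    limit_items: Optional[int] = None  # process only the first N items


def setup_environment(config: BuildConfig) -> dict:
    """Stand-in for one-time setup shared by every step."""
    return {"ready": True, "all_texts": config.all_texts}


def build_db(env: dict, config: BuildConfig) -> list:
    """Stand-in for the database build step; returns placeholder headwords."""
    headwords = ["word_a", "word_b", "word_c"]
    return headwords[: config.limit_items]  # a limit of None keeps everything


def deconstruct(env: dict, headwords: list) -> dict:
    """Stand-in for a later step that consumes the previous step's output."""
    return {word: word.split("_") for word in headwords}


def main(config: BuildConfig) -> dict:
    env = setup_environment(config)    # setup runs once for all steps
    headwords = build_db(env, config)  # output of one step ...
    return deconstruct(env, headwords)  # ... is the input of the next


if __name__ == "__main__":
    print(main(BuildConfig(all_texts=True, limit_items=2)))
```

The point of the sketch is the wiring in main(): the order of steps, the shared setup, and the limit-items option are all visible in one place instead of being spread over several scripts and config.ini.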
Anyway, food for thought.
@bdhrs @Devamitta the build_and_make_all.sh script is a helpful example, thank you. It looks like I will have to cook up my own version, but it's instructive to see the context for it.
Re: task runners, if you don't see the benefit then that's what we learnt from the discussion, in other words you determined that it is out of scope.
As for me, I think life is too short for debugging two things: vanilla JavaScript (undefined is not a function for anyone?) and bash. set -e is kinda neat until it isn't.
Say you want to remove *.tar.bz2 files if they exist? If they don't exist, there is nothing to do, move on. But rm will exit with 1 and the build script stops. StackOverflow revealed that it's as simple as:
if stat -t "$RELEASE_DIR"/*.tar.bz2 >/dev/null 2>&1; then rm "$RELEASE_DIR"/*.tar.bz2; fi
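For comparison, a minimal Python sketch of the same "remove them if they exist" step. The function name remove_archives is made up for illustration; the pattern matches the bash line above.

```python
from pathlib import Path


def remove_archives(release_dir: Path, pattern: str = "*.tar.bz2") -> int:
    """Delete every file in release_dir matching pattern; return how many were removed.

    An empty match is not an error: the loop simply does not run.
    """
    removed = 0
    for archive in release_dir.glob(pattern):
        archive.unlink()
        removed += 1
    return removed
```

Because glob() yields nothing when no file matches, there is no special case to handle and nothing that would stop the build.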
I'm looking at xonsh to rewrite my build script, inline Python in the shell would be neat.
Or the more radical nushell looks good, it actually knows what a list is.
Starting with a clone of the tip:
What is the next step? There are several scripts with "sandhi" in them. Is there a specific order? Command line options? config.ini settings?
Is bash/build_db.sh necessary at this point?