oliverglanz opened this issue 6 months ago
Hi @oliverglanz, thank you very much for your well-elaborated request. Timothy and I will have a look, and I will let you know when we start working on this feature. I am sure we will also have some questions; we will ask them here. Greetings, Ernst
At the moment, BibleOL randomizes MQL query results in its PHP code with the `shuffle` function. The drawback is that shuffling keeps the "democratic" distribution of the data: frequent words, forms, and syntax elements dominate, so exercises do not test the less frequent ones well. To avoid this, Oliver has used Text-Fabric (TF) searches and sampled the DataFrame (df) of query results with the pandas `sample` function.
The suggestion is that BibleOL add a sampling function to its shuffling function. A short description of Oliver's routine follows, as it might serve as a guideline for implementing the sampling function in BibleOL (here a video version):
First the BHS data is loaded in TF:
Next, a query is run with the specific exercise in mind, for example an exercise that tests 1-Guttural and 1-Aleph verbs. Since this is a word-based exercise, we need all words of the BHS. A TF query is run:
The TF query results are exported as a spreadsheet
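The three Text-Fabric steps above (load, query, export) might look roughly like the sketch below. It assumes the BHS data is loaded as the BHSA corpus via Text-Fabric's `use("ETCBC/bhsa")`; the search template `word sp=verb` is only a placeholder for the exercise-specific query, and the helper names `load_bhsa` and `run_and_export` are hypothetical.

```python
# Minimal sketch of the Text-Fabric part of the routine.
# Assumptions: Text-Fabric is installed (pip install text-fabric) and the
# BHSA data can be downloaded; the helper names below are hypothetical.

# Placeholder search template: all verbs in the BHS. A real exercise
# (e.g. 1-Guttural and 1-Aleph verbs) would use a more specific template.
TEMPLATE = """
word sp=verb
"""


def load_bhsa():
    """Load the BHSA corpus (the BHS data) in Text-Fabric."""
    from tf.app import use  # imported lazily so this module loads without TF

    return use("ETCBC/bhsa")


def run_and_export(A, template=TEMPLATE, to_file="results.tsv"):
    """Run a search template and export the results to a spreadsheet file."""
    results = A.search(template)
    A.export(results, toFile=to_file)
    return results


# Usage (downloads the corpus on first run):
#   A = load_bhsa()
#   results = run_and_export(A)
```

Passing the app object `A` into `run_and_export` keeps the query/export step independent of how the corpus was loaded.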
The spreadsheet is loaded as a pandas DataFrame:
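Loading the exported spreadsheet could look like this, assuming the export is a tab-separated file such as `results.tsv`. The inline sample and its columns (a `monad` column plus BHSA-style feature columns `vs`, `vt`, `ps`) are a hypothetical stand-in for the real export, and whether an `encoding` argument is needed should be checked against the actual file.

```python
import io

import pandas as pd

# Hypothetical excerpt of an exported results file (tab-separated);
# the real column names depend on the query and the export.
SAMPLE_TSV = (
    "monad\tvs\tvt\tps\n"
    "101\tqal\tperf\tp3\n"
    "102\tnif\timpf\tp1\n"
    "103\tqal\twayq\tp3\n"
)


def load_results(source, encoding=None) -> pd.DataFrame:
    """Read tab-separated query results into a DataFrame.

    `source` may be a path such as "results.tsv" or a file-like object.
    Pass e.g. encoding="utf-16" if the export is not UTF-8 (an assumption
    to check against the actual file).
    """
    return pd.read_csv(source, sep="\t", encoding=encoding)


df = load_results(io.StringIO(SAMPLE_TSV))
print(df.shape)  # (3, 4)
```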
Cleaning up and removing difficult forms
Now difficult forms are identified and removed. For a detailed overview, consult this notebook.
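The actual criteria for "difficult" forms are in Oliver's notebook and are not reproduced here. As a sketch of the mechanics only, the filter below drops rows matching a hypothetical rule set (selected lexemes in ETCBC transcription and selected tenses); the rules themselves are placeholders, not a judgement about which forms are difficult.

```python
import pandas as pd

# Hypothetical rules; the real criteria are in the notebook.
DIFFICULT_LEXEMES = {"HJH[", "HLK["}
DIFFICULT_TENSES = {"infa"}


def remove_difficult(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without rows matching any 'difficult form' rule."""
    difficult = df["lex"].isin(DIFFICULT_LEXEMES) | df["vt"].isin(DIFFICULT_TENSES)
    return df.loc[~difficult].reset_index(drop=True)


df = pd.DataFrame({
    "monad": [1, 2, 3, 4],
    "lex": [">MR[", "HJH[", "<FH[", ">KL["],
    "vt": ["wayq", "perf", "infa", "impf"],
})
cleaned = remove_difficult(df)
print(cleaned["monad"].tolist())  # [1, 4]
```

Keeping the rules in named sets makes it easy to adjust the cleanup per exercise without touching the filtering logic.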
Sampling
Now the cleaned-up data is sampled so that I get a good distribution of verbal forms across person, number, gender, tense, stem, and pronominal suffixes:
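One way to get an even spread over person, number, gender, tense, stem, and suffix is to draw a fixed number of rows from every combination of those features with `DataFrame.groupby(...).sample(...)` (pandas 1.1 or later). This is a sketch, not necessarily Oliver's exact call: the column names are assumed to be the BHSA feature names (`vs`, `vt`, `ps`, `nu`, `gn`, `prs`), and `balanced_sample` and `per_group` are hypothetical names.

```python
import pandas as pd

# BHSA feature names used as strata (assumed column names):
# vs = stem, vt = tense, ps = person, nu = number, gn = gender, prs = suffix.
STRATA = ["vs", "vt", "ps", "nu", "gn", "prs"]


def balanced_sample(df: pd.DataFrame, per_group: int, seed: int = 0) -> pd.DataFrame:
    """Draw `per_group` rows from every combination of the strata present in df.

    replace=True lets rare combinations contribute `per_group` rows even when
    they have fewer rows than that, so rows can occur more than once.
    """
    strata = [col for col in STRATA if col in df.columns]
    return (
        df.groupby(strata)
        .sample(n=per_group, replace=True, random_state=seed)
        .reset_index(drop=True)
    )


# Toy data: 8 qal forms, 1 nifal, 1 hifil, all perfect.
df = pd.DataFrame({
    "monad": range(1, 11),
    "vs": ["qal"] * 8 + ["nif", "hif"],
    "vt": ["perf"] * 10,
})
sampled = balanced_sample(df, per_group=3)
# Each stem now contributes 3 rows, although qal was 8 times as frequent.
```

Sampling with replacement is what makes rare combinations as visible as frequent ones; it is also the source of the duplicate rows mentioned in the next step.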
Dropping duplicates
Although it is not necessary, I always drop the duplicates created in the sampling process. This step can be skipped, since dropping duplicates partly restores the "democratic" nature of the data that the sampling procedure sought to reduce.
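Dropping the duplicates could be done with `drop_duplicates` on the monad column (a hypothetical column name here). The toy data shows the trade-off described above: the rare stems shrink back to a single row each, while the frequent one keeps all its rows.

```python
import pandas as pd

# Toy sampled data: monads 9 and 10 were each drawn three times.
sampled = pd.DataFrame({
    "monad": [3, 5, 7, 9, 9, 9, 10, 10, 10],
    "vs": ["qal"] * 3 + ["nif"] * 3 + ["hif"] * 3,
})

# Keep only the first occurrence of each monad.
deduplicated = sampled.drop_duplicates(subset="monad").reset_index(drop=True)
counts = deduplicated["vs"].value_counts().to_dict()
# qal keeps 3 rows; nif and hif drop to 1 row each.
```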
Exporting the data
Now the df is exported.
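Exporting could be as simple as `DataFrame.to_csv`; the file name and the tab separator below are assumptions. Without a path argument, `to_csv` returns the text instead of writing a file, which the sketch uses to show the output.

```python
import pandas as pd

df = pd.DataFrame({"monad": [3, 9, 10], "vs": ["qal", "nif", "hif"]})

# Write the sampled data to disk (hypothetical file name):
#   df.to_csv("sampled_verbs.tsv", sep="\t", index=False)

# With no path, to_csv returns the same content as a string.
text = df.to_csv(sep="\t", index=False)
print(text)
```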
Preparing the monad numbers for the BibleOL exercise
First I copy the monad_numbers of the exported data into Visual Studio Code to add a comma after each number:
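The comma-separated list could also be produced directly in pandas instead of by hand in an editor. A minimal sketch, assuming the monad numbers sit in a column named `monad` and that BibleOL accepts numbers separated by a comma and a space (the exact separator BibleOL expects is an assumption); `monads_as_list` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({"monad": [101, 205, 3310]})


def monads_as_list(df: pd.DataFrame, column: str = "monad") -> str:
    """Join the monad numbers in `column` into one comma-separated string."""
    return ", ".join(str(m) for m in df[column])


print(monads_as_list(df))  # 101, 205, 3310
```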
Importing monad numbers into BibleOL
Now we import the data into a BibleOL exercise. We do this by first selecting the verbal classes we want:
Now a perfectly sampled exercise is ready to be used.