Open kolatubosun opened 8 years ago
Here is what I can do. From the link that has the Yoruba names, you can copy them as they are, in plain UTF-8 characters, into a .txt text file.
Step 1: strip everything that is not a name off each line; this leaves 3 names separated by " ".
Step 2: isolate each word in each line as an array, [name01, name02, name03].
Step 3: go through each line of the text file and build up a one-dimensional array of all the names.
Step 4: each value of the array must be unique. If a value appears more than once, the duplicates are removed.
Step 5: get the array of names already in the database table and compare the 2 arrays for differences.
Step 6: output this differential array.
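The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names (`extract_names`, `missing_names`) are made up for this example, the existing dictionary names are passed in as a plain list rather than read from the real database table, and "non-name" text is taken to mean any token that is not made up of letters, tone marks, hyphens or apostrophes.

```python
import unicodedata

def normalize(name):
    """Comparison key: NFC-normalized and case-folded, so 'ADÉ' matches 'Adé'."""
    return unicodedata.normalize("NFC", name).casefold()

def is_name_token(token):
    """True if the token has at least one letter and otherwise only
    letters, combining marks (tone marks), hyphens or apostrophes."""
    has_letter = any(unicodedata.category(ch).startswith("L") for ch in token)
    return has_letter and all(
        unicodedata.category(ch)[0] in ("L", "M") or ch in "-'" for ch in token
    )

def extract_names(lines):
    """Steps 1-4: drop non-name tokens, flatten all lines into one list,
    and keep only the first occurrence of each name."""
    seen, names = set(), []
    for line in lines:
        for token in line.split():
            token = token.strip(".,;:()")  # assumed list of stray punctuation
            if not is_name_token(token):
                continue
            key = normalize(token)
            if key not in seen:
                seen.add(key)
                names.append(unicodedata.normalize("NFC", token))
    return names

def missing_names(candidates, existing):
    """Steps 5-6: return the candidates that are not among the existing names."""
    known = {normalize(n) for n in existing}
    return [n for n in candidates if normalize(n) not in known]

if __name__ == "__main__":
    pasted = ["1024 ADEBAYO Tolu Kikelomo", "1025 Chukwuma Tolu Adebayo"]
    in_dictionary = ["Adebayo", "Tolu"]  # stand-in for the database table
    print(missing_names(extract_names(pasted), in_dictionary))
```

Comparing on a normalized, case-folded key rather than the raw text is a design choice made here so that the same name typed with different capitalization, or with tone marks encoded differently, is not reported as new.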
As you said, the differential array will also contain non-Yoruba names, and their share will only grow over time as fewer new Yoruba names remain to be found. When this happens, each line's {Lastname, Middlename, Firstname} set may be used to judge whether the {nameset} is of Yoruba origin. It is, of course, still not an absolute solution, but it will reduce the manual work.
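One possible reading of the {nameset} idea, sketched in Python: a name that is not yet in the dictionary is ranked higher when the other names on its line are already known Yoruba names. The function name `rank_candidates` and the scoring rule are assumptions made for this example, not something already agreed on, and the ranking is only a hint for the manual review.

```python
def rank_candidates(name_sets, known):
    """Return names missing from `known`, ordered by the largest number of
    known names they share a line with (highest first, then alphabetically)."""
    known_keys = {n.casefold() for n in known}
    scores = {}
    for name_set in name_sets:
        # How many names in this set are already in the dictionary?
        hits = sum(1 for n in name_set if n.casefold() in known_keys)
        for n in name_set:
            if n.casefold() not in known_keys:
                scores[n] = max(scores.get(n, 0), hits)
    return sorted(scores, key=lambda n: (-scores[n], n))

if __name__ == "__main__":
    name_sets = [
        ["Adebayo", "Tolu", "Kikelomo"],
        ["Chukwuma", "Obi", "Emeka"],
        ["Adebayo", "Musa", "Bello"],
    ]
    print(rank_candidates(name_sets, known=["Adebayo", "Tolu"]))
```

In the sample data, "Musa" and "Bello" still rank above names with no known companions because they share a line with "Adebayo", which shows why this can only narrow the manual check and not replace it.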
Thanks @michael-001 I don't write code, so I'll assume that your description was for others with more competence than me?
Just came across this page that has hundreds of unique Yoruba names. It turns out that WAEC and other admission lists are where you get lots of unique names, because they usually list people's middle names.
In any case, wouldn't it be nice to have something that can tell me which of the names on the page aren't currently in the dictionary? It could be in the form of an Excel upload thingy. How it might work is that I copy the names, put them in Excel, upload the file, and the machine gives me an output that has ONLY the names not in the dictionary.
The result, inevitably, will have Hausa, Igbo and Yoruba names. But the Yoruba names will be the distinct ones not in the dictionary. So I can sort out the next step, which is simply to remove every name that is not Yoruba. I can do this manually because it would be hard to train a machine to do this kind of task. After this, I can then re-upload the distinct Yoruba names via our usual Excel spreadsheet.
Doable?