Status: Closed (unhammer; closed 5 years ago)
So if I understand correctly, the example sentences are in <x> elements, each paired with its translation in an <xt> element. I don’t understand these languages, but I can see a few problems:
# missing initial capital
grep -r '<xt\?>[^[:upper:]]' dicts/smenob/src
grep -r '<xt\?>[^[:upper:]]' dicts/smanob/src
grep -r '<xt\?>[^[:upper:]]' dicts/fkvnob/src/
# annotations
grep -r '<xt\?>' dicts/smenob/src | grep '('
grep -r '<xt\?>' dicts/smanob/src | grep '('
So the sentences would probably need some manual review first.
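A minimal sketch, assuming GNU grep, that folds the two checks above into a single review pass. It reads XML (or grep output) on standard input and prints every <x>/<xt> line whose text either does not start with an uppercase letter or contains a parenthesis. The function name needs_review is hypothetical. Note that [:upper:] depends on the locale, so run it under a UTF-8 locale; otherwise capitals such as Á or Č may be flagged by mistake.

```sh
# Hypothetical helper combining both checks: print lines with an <x> or <xt>
# element whose text does not start with an uppercase letter, or whose text
# contains a "(" before the next tag.
needs_review() {
  grep -E '<xt?>([^[:upper:]]|[^<]*\()'
}

# Usage (paths taken from the commands above):
#   cat dicts/smenob/src/*.xml | needs_review
```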
gillux <notifications@github.com> writes:
# missing initial capital
grep -r '<xt\?>[^[:upper:]]' dicts/smenob/src
grep -r '<xt\?>[^[:upper:]]' dicts/smanob/src
grep -r '<xt\?>[^[:upper:]]' dicts/fkvnob/src/
As an example,
dicts/smenob/src/V_smenob.xml: <x>dodjalit earáide ovddasvástádusa</x>
dicts/smenob/src/V_smenob.xml: <xt>skyve ansvaret over på andre</xt>
means "push the responsibility on to someone else". That's a phrase rather than a complete sentence, so if you only want complete sentences you'd have to grep -v those, yeah.
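Following the grep -v suggestion, a minimal sketch of a filter that keeps only example lines that look like complete sentences. Dropping lines without an initial capital comes from the thread; the extra requirement of a final ".", "!" or "?" is an assumed heuristic, and the function name complete_sentences is hypothetical. It works line by line, so it can separate an <x> from its <xt>; treat it as a review aid rather than the final import step.

```sh
# Hypothetical filter: drop <x>/<xt> lines that do not start with a capital
# (the grep -v idea), then keep only lines ending in . ! or ? right before
# the closing </x> or </xt> tag (an assumed extra heuristic).
complete_sentences() {
  grep -v '<xt\?>[^[:upper:]]' | grep -E '[.!?]</xt?>'
}

# Usage: cat dicts/smenob/src/V_smenob.xml | complete_sentences
```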
# annotations
grep -r '<xt\?>' dicts/smenob/src | grep '('
grep -r '<xt\?>' dicts/smanob/src | grep '('
dicts/smenob/src/Adv_smenob.xml: <xt>Hvor (på kroppen) er du blitt operert?</xt>
means "Where (on the body) have you been operated on?"
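One way to speed up the manual review of such annotations is a minimal sketch that deletes parenthesised parts, assuming sed with -E support (GNU and BSD sed both have it). The function name strip_annotations is hypothetical. Removing the parenthesised text does not always leave a grammatical sentence, and it does not check whether the Sámi side has a matching part, so the output still needs a human look.

```sh
# Hypothetical helper: delete parenthesised annotations such as
# " (på kroppen)", together with one preceding space.
# The result still needs manual review.
strip_annotations() {
  sed -E 's/ ?\([^)]*\)//g'
}

# echo '<xt>Hvor (på kroppen) er du blitt operert?</xt>' | strip_annotations
# prints: <xt>Hvor er du blitt operert?</xt>
```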
So the sentences would probably need some manual review first.
As always, I hope? :)
We can implement the features people need to mass-import sentences, but we cannot take on extracting and curating sentences from a dictionary or any other linguistic source.
Closing this now.
http://giellatekno.uit.no/words/dicts/dict-stardict.eng.html has some dictionaries that might be mass-added to Tatoeba. The SVN URL is https://victorio.uit.no/langtech/trunk/words/dicts (subdirectories smenob, smanob and fkvnob), and the license is CC BY 3.0 Norway (http://creativecommons.org/licenses/by/3.0/no/deed.en).
It's probably easier to extract the examples from the XML in SVN than from the StarDict files.
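A minimal sketch of pulling example pairs out of the XML as tab-separated "example, translation" rows. Based on the grep output quoted in this thread, it assumes that each <x> sits on its own line, that its <xt> translation follows on a later line, and that neither element carries attributes (elements with attributes are silently skipped). The function name extract_pairs is hypothetical. It does not decode XML entities such as &amp;, so a real XML parser would be more robust.

```sh
# Hypothetical helper: turn <x>...</x> / <xt>...</xt> line pairs into
# "source<TAB>translation" rows. Assumes one element per line, no
# attributes, and each <x> followed by its <xt>.
extract_pairs() {
  awk '
    # Remember the text of the most recent <x> element.
    /<x>/ {
      src = $0
      sub(/.*<x>/, "", src)
      sub(/<\/x>.*/, "", src)
      have_src = 1
      next
    }
    # When an <xt> follows a remembered <x>, print the pair.
    /<xt>/ && have_src {
      tgt = $0
      sub(/.*<xt>/, "", tgt)
      sub(/<\/xt>.*/, "", tgt)
      print src "\t" tgt
      have_src = 0
    }
  '
}

# Usage: cat dicts/smenob/src/*.xml | extract_pairs > smenob.tsv
```

An <x> with no following <xt> is overwritten by the next <x>, and a stray <xt> is ignored, so unpaired examples never reach the output.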