acdh-oeaw / shawi-data

Data of the project "The Shawi-type Arabic dialects (FWF P 33574)".

dict: compounds without space #81

Open rausch-supola opened 2 weeks ago

rausch-supola commented 2 weeks ago

In the following items there is a form with type="lemma" and subtype="compound" whose orth contains no space:

xml:id="DShaAr.sid_354"
xml:id="DShaAr.sid_74"
xml:id="DShaAr.b_dash_gadd_001"
xml:id="DShaAr.b_noob_000"
xml:id="DShaAr.b_wakit_00000"
xml:id="DShaAr.b_sidd_000"
xml:id="DShaAr.umm_al_baxat_000"
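For reference, a minimal Python sketch of the kind of check that yields such a list. It is not the rule used in the repository. It assumes TEI-namespaced entry/form/orth elements; the function name and the sample ids (ex_001, ex_002) are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample data, not taken from the dictionary.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="ex_001">
    <form type="lemma" subtype="compound"><orth>abcdef</orth></form>
  </entry>
  <entry xml:id="ex_002">
    <form type="lemma" subtype="compound"><orth>abc def</orth></form>
  </entry>
</TEI>"""


def compounds_without_space(xml_text):
    """Return the xml:id of every entry whose compound lemma orth has no whitespace."""
    root = ET.fromstring(xml_text)
    hits = []
    for entry in root.iter(TEI + "entry"):
        for form in entry.iter(TEI + "form"):
            # Only lemma forms marked as compounds are checked (assumed markup).
            if form.get("type") != "lemma" or form.get("subtype") != "compound":
                continue
            for orth in form.iter(TEI + "orth"):
                text = (orth.text or "").strip()
                if not any(ch.isspace() for ch in text):
                    hits.append(entry.get(XML_ID))
    return hits


print(compounds_without_space(SAMPLE))
```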

rausch-supola commented 1 week ago

After adjusting the rule and extending it to hyphens (https://github.com/acdh-oeaw/shawi-data/commit/f9af9bbe6f067345c22c34cd7fdf8468d73b0a7e), only one error is left:

xml:id="DShaAr.sid_354"

VeronikaEngler commented 1 week ago

I fixed the errors for subtype="compound", except for the following, whose content still has to be checked:

DShaAr.al_000: article al-
DShaAr.ha_000: -ha suffix
DShaAr.ha_001: hal- demonstrative
DShaAr.hin_001: -hin suffix
DShaAr.hum_000: -hum suffix
ShaAr.i_000: -i suffix
DShaAr.l_suffix_00000: l- + suffix
DShaAr.megass_ehduum_00001: bracket
DShaAr.miizaan_00001: hyphen?
DShaAr.mterat_ad_dinye_00000: bracket
DShaAr.na_001: -na suffix
DShaAr.ta_001: ta-
DShaAr.taww_00000: taww- hyphen?
DShaAr.u_000: -u suffix
DShaAr.yaa_yaa_00000: compound?
DShaAr.yaa_yooma_000: compound?
DShaAr.yimcjin_suffix_00000
DShaAr.bidd_000: bidd- ?
DShaAr.b_mahaari_000: bracket
DShaAr.caban_000: ??
DShaAr.mhayyir_haal_000: hyphen?
DShaAr.twilkin_twintin_000: compound or variants?
DShaAr.shi_001
DShaAr.cjin_000: pron suffix
DShaAr.kum_000: pron suffix
DShaAr.icj_000: pron suffix
DShaAr.ak_00: pron suffix
DShaAr.whad_000: compound?
DShaAr.haal_000: haal- compound

rausch-supola commented 3 days ago

I adjusted the rule so that a hyphen at the beginning or end of the string no longer raises an error, since it marks a prefix or suffix (https://github.com/acdh-oeaw/shawi-data/commit/89632157aef42e9f78cfc71256ece3c6fd9b4b79).

Errors still occur in these items:

xml:id="DShaAr.l_suffix_00000"
xml:id="DShaAr.megass_ehduum_00001"
xml:id="DShaAr.mterat_ad_dinye_00000"
xml:id="DShaAr.yaa_yaa_00000"
xml:id="DShaAr.yimcjin_suffix_00000"
xml:id="DShaAr.twilkin_twintin_000"
xml:id="DShaAr.yaa_yooma_000"
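One possible reading of the adjusted rule, sketched in Python for illustration only. The actual rule is in the linked commits and may differ. The assumption here is that a compound orth passes if a space or hyphen sits between two other characters, or if the string starts or ends with a hyphen (prefix or suffix); the function name is hypothetical.

```python
import re

# Assumed rule: a space or hyphen must sit between two other characters.
INTERNAL_SEPARATOR = re.compile(r"\S[\s-]+\S")


def compound_orth_ok(orth):
    """Return True if the orth string passes the assumed compound check."""
    text = orth.strip()
    # A leading or trailing hyphen marks a prefix or suffix and is accepted.
    if len(text) > 1 and (text.startswith("-") or text.endswith("-")):
        return True
    return bool(INTERNAL_SEPARATOR.search(text))


# "-ha" and "hal-" are affix forms mentioned in this thread;
# the remaining strings are made-up placeholders.
for candidate in ["-ha", "hal-", "abc def", "abc-def", "abcdef"]:
    print(candidate, compound_orth_ok(candidate))
```

Under this reading, an item is still reported when its orth has neither an internal separator nor a leading or trailing hyphen.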