Closed elmurod1202 closed 3 years ago
The new names seem to have extra spaces. Did you check that it compiles before committing?
Oh, my bad, just fixed it. I was quick to push changes as soon as it compiled without an error, not noticing the spaces.
I'm curious about these forms; what are they?
{+ad:ad NP-AL ; !+}
{+al:al NP-AL ; !+}
{+am:am NP-AL ; !+}
{+bei:bei NP-AL ; !+}
{+ibn:ibn NP-AL ; !+}
{+la:la NP-AL ; !+}
{+le:le NP-AL ; !+}
{+les:les NP-AL ; !+}
{+lès:lès NP-AL ; !+}
Oh, never mind, these are forms you deleted, right?
Why did you move to U02BC? Uzbek Wikipedia uses U02BB, which I thought was the standard? (I agree that the former looks better and is more appropriate, but it appears not to be what's used.)
The roman numerals should probably have their own lexicon (pointed to directly from Root) and not be mixed into the main lexicon.
Forms like this should be able to be generated automatically:
Chernishov:Chernishov NP-COG-MF ; ! "" ! El++
Chernishova:Chernishova NP-COG-MF ; ! "" ! El++
Forms like this appear to be patronymics, which can also be generated for <m> and <f> forms automatically:
Cholakovich:Cholakovich NP-COG-MF ; ! "" ! El++
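To make "generated automatically" concrete, here is a minimal sketch in Python rather than lexc. It assumes Russian-style suffix rules (feminine surnames add -a after -ov/-ev/-in, and patronymics in -vich have a feminine in -vna). The function names are hypothetical and are not part of apertium-uzb. In the lexicon itself this would more likely be handled by a continuation lexicon than by a script.

```python
def surname_forms(masculine):
    """Return (masculine, feminine) surname forms.

    Assumption: surnames ending in -ov, -ev or -in form the feminine by adding -a.
    Anything else is returned unchanged for both genders.
    """
    if masculine.endswith(("ov", "ev", "in")):
        return masculine, masculine + "a"
    return masculine, masculine


def patronymic_forms(masculine):
    """Return (masculine, feminine) patronymic forms.

    Assumption: a patronymic ending in -vich has a feminine ending in -vna.
    """
    if masculine.endswith("vich"):
        # Drop the final "ich" and append "na": Cholakovich -> Cholakovna
        return masculine, masculine[: -len("ich")] + "na"
    return masculine, masculine
```

With these rules, only the masculine stem would need to be listed, and the feminine entry (such as Chernishova) follows from it. Names that merely happen to end in one of these suffixes would be over-generated, so a real rule would need an exception list.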
I'm curious about these forms; what are they?
{+ad:ad NP-AL ; !+}
{+al:al NP-AL ; !+}
{+am:am NP-AL ; !+}
{+bei:bei NP-AL ; !+}
{+ibn:ibn NP-AL ; !+}
{+la:la NP-AL ; !+}
{+le:le NP-AL ; !+}
{+les:les NP-AL ; !+}
{+lès:lès NP-AL ; !+}
These would be parts of NP-ORGS, no? I added them because, first of all, they exist in turkish.lexc (I always treat tur.lexc as an ideal package and try to replicate what's there), and second, they help improve the coverage. Was that wrong? (If so, I'll fix them in both Turkish and Uzbek.)
Why did you move to U02BC? Uzbek Wikipedia uses U02BB, which I thought was the standard? (I agree that the former looks better and is more appropriate, but it appears not to be what's used.)
The shortest answer is: I didn't. Explanation: the apostrophe in <oʻ> and <gʻ> is U02BB, and I converted all of those to that. There is another apostrophe in the alphabet, the "tutuq belgisi" (a phonetic glottal stop, <ъ> in Cyrillic script), and it's U02BC (e.g. aʼlo, maʼno, eʼzoz). I fixed those as well.
The roman numerals should probably have their own lexicon (pointed to directly from Root) and not be mixed into the main lexicon.
This way of declaring Roman numerals was also copied from apertium-tur. Now I see what you (that was you, wasn't it?) did in apertium-kaz/tat to deal with it (<(M | D | C | L | X | V | I)+> NUM-ROMAN ;). That looks more appropriate; I'll fix both tur and uzb if you confirm.
I'm not sure it's a good idea to always copy other Apertium pairs. tur.lexc is generally in better shape than uzb.lexc, but it's far from perfect.
I'm not sure if we need these forms, but it might not hurt. I don't know. Let's not worry about it now, maybe...
Oh! I didn't know these two were meant to be encoded differently! So, how do we know what the right encoding of them is?
That was probably @IlnarSelimcan, and I assume that's right and should be okay to copy.
The wiki page on "Uzbek alphabet" says that U02BB is the apostrophe used for <oʻ> and <gʻ>, while U02BC is used for the "tutuq belgisi" (a.k.a. glottal stop?). I fixed all apostrophes in uzb.lexc in my branch to what they are supposed to be. One problem remains: because there is no proper keyboard layout for the Uzbek alphabet, those apostrophes appear in a variety of forms in real text. That still has to be solved (Issue #2).
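A rough sketch of how the apostrophe variants could be normalized before analysis, in Python. It is only a heuristic under stated assumptions: the list of variant characters is a guess and surely incomplete, and the rule "after o or g use U+02BB, otherwise U+02BC" is a simplification of the convention described above. The function name is hypothetical and this is not how Issue #2 is actually resolved.

```python
import re

MODIFIER_TURNED_COMMA = "\u02BB"  # the mark in oʻ and gʻ
MODIFIER_APOSTROPHE = "\u02BC"    # tutuq belgisi

# Assumed (incomplete) set of characters typed in place of either mark:
# ASCII apostrophe, backtick, left/right single quotes, and the two marks themselves.
APOSTROPHE_VARIANTS = "'`\u2018\u2019\u02BB\u02BC"

# A word character followed by any one of the variants.
_VARIANT_AFTER_LETTER = re.compile(r"(\w)[" + APOSTROPHE_VARIANTS + r"]")


def normalize_apostrophes(text):
    """Rewrite apostrophe variants to U+02BB after o/g and to U+02BC elsewhere."""
    def fix(match):
        prev = match.group(1)
        mark = MODIFIER_TURNED_COMMA if prev in "oOgG" else MODIFIER_APOSTROPHE
        return prev + mark

    return _VARIANT_AFTER_LETTER.sub(fix, text)
```

For example, `normalize_apostrophes("o'zbek")` yields the form with U+02BB, and `normalize_apostrophes("a'lo")` yields the form with U+02BC. A word with a tutuq belgisi directly after o or g would be mishandled by this rule, so a word list would be needed for such cases.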
This might be problematic for the "entry beginning with whitespace" bug. You might want to use [ ... ] instead of ( ... ).
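To see what a pattern like this accepts, here is a small Python analogue (a sketch, not the lexc entry itself). In Python, parentheses only group, so this is closer to the bracketed [ ... ] form; the function name is hypothetical.

```python
import re

# One or more Roman-numeral letters, in any order.
ROMAN_LETTERS = re.compile(r"(?:M|D|C|L|X|V|I)+")


def looks_like_roman_numeral(token):
    """True if the whole token consists only of the letters M, D, C, L, X, V, I."""
    return ROMAN_LETTERS.fullmatch(token) is not None
```

The pattern checks only the alphabet, not well-formedness: it accepts "XIV" but also strings like "IIII" or "VX". That is probably acceptable for coverage, but it means any all-capitals token made of these letters gets the numeral analysis.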