Alternate Conjugation/Declension Autogen

ElsaTheHobo commented 6 years ago

Enhancement: Allow an alternate method of autogeneration of conjugations and declensions. Instead of one rule for every combination of word form, allow rules based on each separate conjugation dimension.

Example: A verb with the conjugations Person (1st, 2nd, 3rd), Number (Singular, Plural), and Tense (Past, Present, Future). Currently, the autogeneration dialogue has you set them up by inputting separate rules for every combination of word form.

My suggestion allows for an alternate method of forming words for agglutinative languages: a separate rule is made for each dimension, such as:

If person = 1st, $ > a
If person = 2nd, $ > e
If person = 3rd, $ > i
If number = sing, [do nothing]
If number = plural, $ > s
If tense = past, $ > an
If tense = present, $ > un
If tense = future, ^ > will

This allows for an easy and less time-consuming way of generating agglutinative word forms, since you don't have to continuously copy and paste rules into each box. You could also change the order of rules, in case you wanted the plural marker to go in front of the person marker, and add in more rules as well, possibly with more complex if statements (such as if x = y and z = a as one rule), or rules that always apply, or rules that are based on a filter rather than on the word's conjugation. This would save the most time with very complex words, such as verbs that have both a subject and object marker, honorific forms, plural markers, more than 3 tenses, etc., so that you don't have to write dozens of regex rules.

DraqueT commented 6 years ago

Heyo, thank you for the suggestion! Something like this has been on my mind since v 1.0 of PolyGlot, and it's something I eventually want to do. Unfortunately, the declension system of PolyGlot is by far the most complex element of it, and when I do this, it will be a massive overhaul.

I'll need to not only create the new system, but make a bridge that is able to interpret everything from the old and translate it up. So long story short, this one is coming, but it might be a little ways off.

Hope that PolyGlot is working well for you otherwise, though! :3

DraqueT commented 6 years ago

Rolling into pre-existing autoconjugation enhancement ticket. Closing this one.

scottgit commented 6 years ago

What "pre-existing autoconjugation enhancement" is this rolled into? I ask, because I need this functionality (or see below). I currently have a language that makes small, but very regular (no exceptions), changes across multiple conjugation values for a verb, but the problem is multiplication. That is, I have:

2 forms for voice
3 forms for honor
3 forms for number
5 forms for tense
5 forms for a combined person/gender
6 forms for aspect
11 forms for mood (though most of those simply signal tonal changes, they are still indicated in written form)

So currently each verb has 29,700 possible forms when all the possibilities are considered, yet only 28 actual, regular changes that need to be made (there are 35 forms, but the base form of the verb includes 1 type of each of the conjugation values, so only 28 changes are needed). I think you can see that even using a copy of conjugation rules for the autogen feature would be overwhelming, while if I could define the words by the 28 actual changes, then it is quite manageable.
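The arithmetic above can be checked with a few lines of Python. The dimension sizes are the ones listed in this comment; the variable names are only for illustration.

```python
import math

# Number of values per conjugation dimension, as listed above.
sizes = {
    "voice": 2,
    "honor": 3,
    "number": 3,
    "tense": 5,
    "person/gender": 5,
    "aspect": 6,
    "mood": 11,
}

combined_forms = math.prod(sizes.values())       # one rule set per combination
individual_values = sum(sizes.values())          # one rule per dimension value
changes_needed = individual_values - len(sizes)  # base form covers one value per dimension

print(combined_forms, individual_values, changes_needed)  # 29700 35 28
```

The combined count grows as the product of the dimension sizes, while the per-value count grows only as their sum, which is why the gap becomes so large with seven dimensions.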

Alternatively, if I could at least filter the list on the left by conjugation values, and were able to define one change for one value and then select all (i.e. all shown in the filtered list) and copy that defined change to all the forms that contain that value, I could still effectively make only 28 changes fairly quickly and effectively. So if I could filter for the passive voice, define the change on one, and then have the ability to select all in that filtered list and copy the rule to all the forms that are passive voice, then it would be almost as efficient as being able to define a change based purely on conjugation value. But having to use the current method is nearly unfeasible.

DraqueT commented 6 years ago

You should make use of non-dimensional declension forms. When you create them as non-dimensional, they will exist independently from any other forms and not be combined with others to create their functional wordforms.

There is a checkbox on the conjugation/declension setup screen to make any given declension entry non-dimensional.

scottgit commented 6 years ago

Your solution does not really make sense to me. I have a "mood" (as one example) with 11 possible dimensions (values) to it, but you are saying it should be made dimensionless? Are you saying I should define each dimension of the mood as its own unique conjugation and make each of those dimensionless? I am unclear as to how that would help (if it would).

And the word forms do combine; that is, there are markings that are added for each distinct value of the dimensions to the 7 conjugation points noted above, so it is not the same as a gerund or such in that respect, so that is why I do not see how a dimensionless form relates/helps. Those additions are regular, depending on conjugation and dimension value, but not exclusive to one another.

scottgit commented 6 years ago

@DraqueT Just giving a "bump" to try to get an explanation on your proposed solution of using non-dimensional settings, when my verb declension is 7 dimensions. I'm sure you are busy, but since it has been about a month, I thought I would give a nudge.

DraqueT commented 6 years ago

Heyo, sorry to leave you hanging on here so long! This is one that I feel like we aren't on the same page about, and I've dragged my feet a bit on tackling it. Tell me if I'm understanding this correctly.

Ultimately, there are 29,700 possible variations for any given word's final, conjugated form. Clearly that's vastly more forms than it's possible or reasonable to define individual rules to cover.

However, there are only 35 individual traits that can be applied, and you would like to be able to associate rules with these to allow for the generation of every possible form with a reasonable amount of rule creation.

If this is the case, I need to put a lot of thought into how this might be addressed. The entire conjugation/declension system is currently based on the presupposition that different forms may have completely different rules of conjugation, rather than the more structured system that you're describing there. If I implement something like this, I'll have to figure out a way to dovetail these so that it doesn't simply end up as two completely separate systems.

I might be able to implement something that will generate the larger rulesets from smaller ones...

Let me know if that covers what you're thinking of!

scottgit commented 6 years ago

@DraqueT

Yes, you have the correct understanding.

This initial issue here was asking to be able to make rules based on single dimensions (rather than the "set" of dimensions), and that seems to be what I would need here. That way, I define 35 rules (my 7 conjugating factors, with however many possible variations for each, which in my case adds up to 35) and it generates the regularly calculated 29,700 variant possibilities.

As to dovetailing, consider this. Here is your autogen dialog presently:

Instead of listing the full conjugation/declension forms on the left, you just list the possible values (categorized). So in this case:

tense
- past
- present
- future
certainty
- certain
- uncertain
positivity
- positive
- negative

Then the user can select 0 or 1 value from each category to define a ruleset for. So if I wanted to define only a rule for Past Tense, I select that and define the rule. I repeat for each/any that may need to be defined. Any value left without a rule is assumed to use the base form of the word (i.e. no change to the base form).

But if multiple rules were always related in some regular way, for example Past Tense takes one form when Certain, and another when Uncertain, then I select both Past Tense + Certain (at the same time) to define the rule for that combination, and then Past Tense + Uncertain to define the other.

Any conjugation where at least two possible values are left without a rule definition to distinguish them from one another might be highlighted to indicate something ought to be defined (so that something distinguishes between them). So in my example here, if I had defined the two past tense forms noted with certainty and uncertainty, then Present/Future might be highlighted in a red font (at least one of the two needs a definition to distinguish it from the other), likewise for Positive/Negative (since nothing has been defined at all for those). Since Certain/Uncertain are only defined in relation to Past Tense, those might be highlighted in a dark yellow, blue, or some other color to indicate they have been partially distinguished (i.e. in the case of Past Tense only) and need some further rules, whether a regular rule (i.e. a rule that applies for Certain alone, whether it is Present or Future) or a combined one (rules where the two tenses remaining to be defined combine with the certainty values to make the distinguishing values). [EDIT: These highlights would just be "warnings," as it is possible a language might have ambiguity in some cases, so that maybe the word form for certain/uncertain is the same for present and future, and only context tells a hearer/reader what the certainty is in these tenses; or nothing tells about certainty in those cases, and it is left ambiguous. The point is, flexibility in leaving rules "blank" to not make any changes is good, but having a "warning" system like highlighting in cases where a user did not realize they had not set up a rule is helpful.]

By default, the more specific rules will override the less specific. So if I define rules simply based on tense alone for transformation, but then also define one combined rule (like past tense + certainty), the more specific tense + certainty rule would be applied in place of the mere tense rule in cases where the two (tense + certainty) matched as opposed to just the one (tense). So there would be a "rule specificity" order first, then the "rule set" order for that declension value(s) combination, and then the "transformation" order that currently occurs within rules.
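The "rule specificity" step could look roughly like the following Python sketch. The rule shape (a `when` dict of dimension values plus a `suffix`) and the function name `most_specific_rules` are assumptions made up for this illustration, not anything in PolyGlot.

```python
def most_specific_rules(rules, features):
    """Return the rules matching `features`, dropping any rule whose
    conditions are a strict subset of another matching rule's conditions."""
    matching = [r for r in rules if r["when"].items() <= features.items()]
    return [
        r for r in matching
        if not any(r["when"].items() < other["when"].items() for other in matching)
    ]


# Hypothetical rules: one for tense alone, one combined, one unrelated.
rules = [
    {"when": {"tense": "past"}, "suffix": "an"},
    {"when": {"tense": "past", "certainty": "certain"}, "suffix": "ank"},
    {"when": {"positivity": "negative"}, "suffix": "no"},
]

form = {"tense": "past", "certainty": "certain", "positivity": "negative"}
print([r["suffix"] for r in most_specific_rules(rules, form)])  # ['ank', 'no']

form = {"tense": "past", "certainty": "uncertain", "positivity": "positive"}
print([r["suffix"] for r in most_specific_rules(rules, form)])  # ['an']
```

The combined past + certain rule replaces the plain past rule only where both values match; the unrelated positivity rule still applies alongside it, and the surviving rules keep their original order for the later ordering steps.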

I don't know if the "rules" ordering (the middle column in the example image above) would be "valid" as an ordering mechanism for the specificity noted above: why would a more specific rule ever be overruled by a less specific? But I think you still need that column, because if I understand it correctly, that relates to the left column. So in my example, my singular Past Tense may have a regular change that occurs when a word ends in "a," but a different change if it ends in "b," and so I need the ability to define multiple word form option rules for any combination of left column defined items. Then the actual transformation rules of the right column apply to the middle column specific rules.

So I might define in my columns (asterisks represent a "selection"; note the middle column and right column have subheadings showing the set of rules selected to their left: so if I were defining Past Certain, that would be listed under "Rules" instead of just "Past" in this example):

Values      Rules             Transformations
Tense       Past              Rule for B end
  *Past*    Rule for A end    AB$  BAD
  Present   *Rule for B end*  EB$  BED
  Future                      B$   BOD

So based on tense alone, if it is past, then the above shows the definition for when the word ends in B, with some changes if the vowel before the B is a certain letter (in this case, it repeats the A or E vowel instead of just adding OD). The "Rule for A end" set would take precedence over the "Rule for B end" (as the rule orders do now), possibly compounding word transformation, and the Transformation order would still have its precedence within each rule set. In short, this all functions similarly to how you currently have it. The change is in the left column.

So now, back to the dovetailing. Doing it this way, when you update the software to accommodate previously defined autogeneration rulesets, those old rules simply get translated over to very specific rule combination sets at the left. So in your image example above, you would just translate the rules over for "past certain positive" by having the three values selected at left in the new form, carry the names for that rule over, and carry the transformation rules over. In short, it would be as if in the new system, a person had gone through and defined every possible combination of rules into an explicit set by selecting one value from each group. (That may not be the most "efficient" way that the old font could have defined its rules, but it should make it seamless in translating the old system to the new.)
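As a rough Python sketch of that translation step: assume, purely for illustration, that the old system stores one named rule per full combination of values and the new system stores a `when` condition per rule. Neither shape is PolyGlot's real file format, and `migrate`, `DIMENSIONS`, and the sample data are hypothetical.

```python
# Hypothetical dimension order used by the old, fully combined forms.
DIMENSIONS = ("tense", "certainty", "positivity")


def migrate(old_rules):
    """Turn each old per-combination rule into a fully specific new rule."""
    new_rules = []
    for combo, (name, transforms) in old_rules.items():
        new_rules.append({
            "when": dict(zip(DIMENSIONS, combo)),  # one value from every group
            "name": name,                          # rule name carried over
            "transforms": list(transforms),        # transformations carried over
        })
    return new_rules


# Hypothetical old data: combination -> (rule name, [(regex, replacement), ...]).
old = {
    ("past", "certain", "positive"): ("Rule for B end", [("B$", "BOD")]),
}

for rule in migrate(old):
    print(rule["when"], rule["name"], rule["transforms"])
```

Because every migrated rule names a value from every group, it is always the most specific match for its combination, so old languages would keep generating the same forms.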

P.S. I guess you might need some way to define left column rules ordering (that is, do you apply the rules for Tense first, or Certainty, or Positivity, in cases where each has been defined separately as rules). So you might need to then add a "group" ordering ability to the left column (the up/down arrows to reorder the groups) so that rules get applied in that order.

scottgit commented 6 years ago

Let me additionally add to the above another functionality need related to this, that in the window that actually generates the various conjugations (the product of those rules), it would be great to have a filter system (selecting the same values as the left column in the proposal above) so that one could filter through the list of all the conjugations and find the one form being sought. So in my case, with 29,700 generated items, I would want to be able to select the seven dimension values and have it give me the one form that I was seeking at that moment. This makes the final listing more useful. (Or I could have it show me all the past tense forms, if that is the only thing I select to filter on.)
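A small Python sketch of such a filter, using the three example categories from my earlier comment. The names `all_combinations` and `filter_forms` are made up for illustration; a real implementation would filter the generated word forms rather than bare value combinations.

```python
from itertools import product

# Example categories from the earlier comment.
DIMENSIONS = {
    "tense": ["past", "present", "future"],
    "certainty": ["certain", "uncertain"],
    "positivity": ["positive", "negative"],
}


def all_combinations(dimensions):
    """Every combination of one value per dimension, as a list of dicts."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


def filter_forms(forms, **selected):
    """Keep only the forms whose values match every selected dimension."""
    return [f for f in forms if all(f[d] == v for d, v in selected.items())]


forms = all_combinations(DIMENSIONS)
print(len(forms))                                # 12
print(len(filter_forms(forms, tense="past")))    # 4
print(filter_forms(forms, tense="past", certainty="certain", positivity="negative"))
```

Selecting a value in every category narrows the list to exactly one form, and selecting only some categories shows every form that shares those values.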

EmperorOzy commented 1 year ago

Sorry to necropost, but I'm also wanting functionality similar to what's proposed above.

DraqueT / PolyGlot

Alternate Conjugation/Declension Autogen #514