Add suffix to string type to perform case-sensitive comparisons

mgalyean commented 3 years ago

This is in reference to the =, >, <, <= and >= operators only, not to suffixes like CONTAINS or STARTSWITH, filenames, lexicon keys, etc. The docs note that the ability to adjust the case sensitivity of string comparisons is planned, but I could not find an existing issue to bump (I searched on 'case' and 'sensitivity'), and case-sensitive comparison is clearly the intuitive and desired behavior. If an issue already exists, mea culpa.

nuggreat commented 3 years ago

As far as I know, the only place where you can adjust the behavior of a string comparison is in lexicons, using the suffix :CASESENSITIVE. The only other note I know of about testing for case sensitivity is in the string documentation, and it says you would have to do it manually using UNCHAR().

Also of note: because paths are case-sensitive, you can use them as a backdoor into a case-sensitive equality check.

```kerboscript
LOCAL p1 IS PATH("abc").
LOCAL p2 IS PATH("ABC").
LOCAL p3 IS PATH("abc").
PRINT p1 = p2. // prints false
PRINT p1 = p3. // prints true
```
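For comparison with the path trick, here is a minimal sketch of the manual UNCHAR() approach the string documentation describes, wrapped in a script function. The name strEqualsCase is hypothetical (it is not a kOS built-in), and the sketch assumes kOS string indexing (s[i]) together with UNCHAR(). It compares character codes one position at a time, so its instruction cost grows with the string length.

```kerboscript
// Hypothetical helper (not a kOS built-in): case-sensitive string equality.
// UNCHAR() returns a character's numeric code, so "a" and "A" differ here
// even though "a" = "A" is true with the built-in = operator.
function strEqualsCase {
  parameter a, b.
  if a:length <> b:length { return false. }
  from { local i is 0. } until i >= a:length step { set i to i + 1. } do {
    if unchar(a[i]) <> unchar(b[i]) { return false. }
  }
  return true.
}

print strEqualsCase("abc", "abc"). // True
print strEqualsCase("abc", "ABC"). // False
```

Unlike the path trick, this does not depend on how PATH() interprets characters such as ":" or "/" in the string.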

mgalyean commented 3 years ago

Right, but as I pointed out, the docs also mention that a fix is in the works for doing case-sensitive string comparisons (along with mentioning the unchar() workaround, which I'm currently using). I didn't see an issue for that, so I put one in. I can understand why lexicons and other structures that use keys could logically be seen as having case-insensitivity as a default (with a scripter override option). But I can't see a case where string comparisons would logically be case-insensitive by default, let alone without a way to override it.

nuggreat commented 3 years ago

The reason for being case-insensitive by default has to do with string comparisons involving various KSP strings, such as SHIP:STATUS, resource names, or SAS mode. For people new to programming, the intuition is that a change in case doesn't actually make a string different.

I'm also somewhat wondering where you saw mention of a pending case-sensitive option, as I don't recall that in the documentation and my quick search didn't turn it up.

mgalyean commented 3 years ago

I'm actually not confused by the case-insensitivity of things like that where the string value is pivotal for many internals. As pointed out, I'm only referring to user string vars compared to string literals. As for the docs, I searched for 'case' in the docs search widget, then clicked on the "String" link, then browser searched in the doc for 'case'. https://ksp-kos.github.io/KOS_DOC/structures/misc/string.html?highlight=case

mgalyean commented 3 years ago

Also, while case sensitivity may have been an issue for newbies in the 80s or 90s, I really don't think it is a big deal anymore, since everyone knows all about it from the passwords on their social media accounts.

Dunbaratu commented 3 years ago

For the record, I was against making string = case-insensitive, but I was the lone voice objecting at the time, when we had 3 active devs, so I backed down. The argument people made for it was that case insensitivity would be more newbie-friendly and fit people's expectations, given that kerboscript identifiers are case-insensitive (anticipating the confusion: why are myvar and MyVar the same thing, but "myvalue" and "MyValue" aren't?).

The argument against it was that every language has different rules about which pairs of characters are the uppercase and lowercase versions of each other, and about whether a word's capitalization is significant (e.g. German nouns). For any case-insensitive comparison in C#, you have to pick the "culture" that tells C# which capitalization rules are in effect, so we'd have to hardcode a choice for that. (We picked CultureInvariant, which ignores your computer's local language settings, so scripts are sharable without surprise changes when run on someone else's computer.) At the time, this argument (that we were making it English-centric) wasn't as important, because I hadn't yet made the changes that let the terminal use any monospace Unicode font you like. The terminal could only display the 7-bit ASCII characters we had hand-drawn in a texture anyway, so operating in non-English languages wasn't very feasible.

But once that behavior was in, it couldn't be changed later without breaking everyone's existing scripts that are already written to expect "A" = "a" to be true. If case-sensitive comparison is added, it has to be an option that's off by default. And you're literally the first person ever to have complained about it. That lack of complaints is why I never bothered following up on it. I thought, "Well, I must be alone in thinking this was a problem, I guess."

The clean change would be a global option you can set to choose case sensitivity, which automatically kicks in on every string compare, but I think it would be a nightmare to prove nothing broke (we'd have to prove there's no place where we ended up depending on string compares being case-insensitive). The easier, though possibly less clean, change would be to add suffixes to StringValue like s1:equalcase(s2), s1:greatercase(s2), etc., and leave the operators =, <, <=, >, >=, <> alone.

mgalyean commented 3 years ago

I was hoping that only the string literal comparison ops could be defaulted to sensitive with a suffix to make them insensitive. A global setting is probably a bridge too far.

As for confusing newbie programmers: I think most high-school students have had at least a few brushes with programming these days, if not in school then at home, and if they haven't, I'm not sure it's a good idea to spoon-feed them paradigms that are off the beaten path. Isn't it more confusing not to differentiate between code and data early on (keywords versus data in variables)? A false sense that the case-insensitivity of keywords means data has to be case-insensitive could definitely confuse them later in the real, non-kOS world. I think there is a point where the training wheels train the wrong thing, and eventually the trainees become activists/politicians and want all bicycles to require training wheels, ha. IIRC, the first thing taught in my freshman programming courses was the concept of code as opposed to data, or something very near that. I guess I just don't understand why, when none of the educational programming languages out there make string data case-insensitive, kOS goes against the grain on the basis that it is "less confusing". Going against the grain is quite confusing.

Dunbaratu commented 3 years ago

> only the string literal comparison ops

In the example below, bool1 and bool2 will always have exactly the same result:

```kerboscript
set x to "val".
set y to "VAL".
set bool1 to "val" > "VAL".
set bool2 to x > y.
```

There is no such thing as a different operator for literal strings than for variables containing strings. As long as the operands on both the left and the right are StringValues, it is the same operator.

mgalyean commented 3 years ago

Yes, I miswrote that. I didn't mean just literals; I meant comparisons of user string variables, including literals. And if only one person brings something up, it doesn't necessarily mean others don't care. It could mean they have had negative experiences giving feedback in the past and so are hesitant to do so. Try reporting bugs to YouTube or Facebook or Apple (pick any corporation with one-way communication) and getting any resolution, for example.

Dunbaratu commented 3 years ago

Okay, now that you've cleared up what you meant by this, let me respond:

> I was hoping that only the string literal comparison ops could be defaulted to sensitive with a suffix to make them insensitive.

Provide me a time machine first and then I will go back to when the decision was made and argue harder. (By the way, stop beating the dead horse arguing a point that I already TOLD you I agree with but was overruled on when the decision was made.)

I no longer think the idea of doing it with suffixes is going to be workable. It's going to have to be a global setting, even if that makes it take more work to test out.

Now, without a time machine, that leaves these options for what to do now:

(A) User has no choice: all StringValue compares stay as they are, case-insensitive.

(B) User has no choice: all StringValue compares change to become case-sensitive.

(C) Opt in to case sensitivity: the default is still the old way, but a script can change a global setting to get the new way.

(D) Opt out of case sensitivity: the default becomes the new way, but a script can change a global setting to get the old way.

I don't think either one of us wants (A) or (B).

You seem to want (D), whereas I would prefer (C). Here's an example of why. Lots of scripts have this:

```kerboscript
if ship:status = "landed"
```

Lots of other scripts have this:

```kerboscript
if ship:status = "LANDED"
```

ship:status is not a user variable, but the = here is still a StringValue comparison. That value comes from KSP's API: specifically, it's the enum value VesselStatuses.LANDED, which kOS converts with C#'s default ToString() for enums (which in this case produces the all-caps string "LANDED", because the identifier is in all caps), and that string becomes a kOS StringValue when it's returned to the script.

So option (C) would continue to let that script work as-is. Option (D) means that script needs to be edited or it now fails.
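Whichever option is chosen, a script can sidestep the question for KSP-supplied values by normalizing case before comparing. A small sketch, assuming the :TOUPPER string suffix; since ship:status already comes back in all caps, this check should behave the same under today's case-insensitive = and under a case-sensitive one:

```kerboscript
// Upper-case the KSP-supplied value and compare against an all-caps literal,
// so the result does not depend on whether = is case-sensitive.
if ship:status:toupper = "LANDED" {
  print "Vessel is landed.".
}
```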

If case sensitivity is added, then regardless of whether it's done through option (B), (C), or (D), another question is whether it makes sense for case-insensitive lexicon keys to remain a setting independent of case insensitivity everywhere else. That leads to combinations where one is sensitive and the other isn't, so that `"key" = "KEY"` is false while `mylex["key"]` and `mylex["KEY"]` are the same. (This isn't necessarily wrong. I'm just pointing it out as something the documentation would have to explain.)

Dunbaratu commented 3 years ago

One disagreement I had with lexicon keys being case-insensitive is that it assumes lexicon keys are always strings, and they're not. A lexicon is an associative array where the key can be any type; since every type has a hash code generator by default, they all work. But we've got code in there that special-cases keys of string type and uses a different comparator from the default one for them.

But that's water under the bridge. It doesn't matter if I wanted it that way, it is that way and has been long enough that backward compatibility has to be accounted for in any attempts to change it.

mgalyean commented 3 years ago

That is a great point. I've had to mung in the unchar() workaround in my code that converts UIDs to base62 because of the insensitivity as I use the result as lex keys :(
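For the lexicon-key part of this specifically, the :CASESENSITIVE lexicon suffix nuggreat mentioned may already cover it without unchar(). A minimal sketch, assuming the suffix is set while the lexicon is still empty; the key strings are made-up examples, and the = operator itself stays case-insensitive:

```kerboscript
// A lexicon whose string keys are compared case-sensitively.
// The suffix is set before any keys are added.
local ids is lexicon().
set ids:casesensitive to true.
ids:add("aB3", "core A").
ids:add("Ab3", "core B").  // without the suffix, this key would collide with "aB3"
print ids:length.          // 2
print "aB3" = "Ab3".       // True: the = operator is still case-insensitive
```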

mgalyean commented 3 years ago

Also, I apologize if it came off like I was berating you; my intent was quite the opposite. I may have been berating the issue, though, mea culpa.

JonnyOThan commented 3 years ago

I'm curious about the use case here. I think your 2nd-last comment has a hint: you're doing base62 encoding? That's an incredibly specialized use case that maybe shouldn't be weighted very heavily for a core design aspect of the language. Especially since there are workarounds that you can put behind functions that are probably going to be better suited for your domain anyway.

It would be exceptionally strange if KOS's syntax is case-insensitive while string comparisons aren't. It's nice to live in a world where you know that case doesn't matter 99.9% of the time (file paths notwithstanding).

mgalyean commented 3 years ago

It isn't just about the base62. It is about data in a string being data, and the fewer restrictions on data the better. Many programming techniques for languages that don't have good blob support, like kerboscript, leverage strings for raw data storage, so the case I mentioned is just the tip of the iceberg.

However, I've been carefully pondering the counterarguments Dunbaratu brought up, and I agree with all his points about backward compatibility and about the problems of making the operators work differently in different cases. I realized I don't want to redefine the operators, but simply to do a case-sensitive comparison without having to iterate with unchar(). Maybe a string function like RAWCOMPARE(OP), where OP is the string form of the desired operator: ">", "<=", etc. Yes, one could implement this in a script function, but it really seems like something that should have under-the-hood support, given the potentially very high IPU cost of unchar() on a large string "blob", when a "real" computer on a spacecraft would likely have no issues dealing with binary blobs. Normal comparisons would keep working just as everyone is familiar with in kOS, but there would be a way to treat strings as the full-range data they truly are.

Now, if strings were actually stored in ALL CAPS, then yes, that would be problematic, but since I can print "Hello World" without it coming out in ALL CAPS, I don't think storage is the issue here. Let me know if I'm off base there.
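As a point of reference for the RAWCOMPARE(OP) idea, a script-level version might look like the sketch below. The name rawCompare and its (a, b, op) parameters are hypothetical, not an existing or planned kOS API. It orders strings by character code, position by position, treating a proper prefix as the smaller string. It still loops over characters with unchar(), so it only illustrates the intended semantics; it does not remove the IPU cost raised above.

```kerboscript
// Hypothetical stand-in for the proposed RAWCOMPARE(OP), not a kOS built-in.
// Compares a and b case-sensitively by character code, then applies op.
function rawCompare {
  parameter a, b, op.
  local n is min(a:length, b:length).
  local c is 0. // -1 if a < b, 0 if equal, 1 if a > b
  from { local i is 0. } until i >= n or c <> 0 step { set i to i + 1. } do {
    local ca is unchar(a[i]).
    local cb is unchar(b[i]).
    if ca < cb { set c to -1. } else if ca > cb { set c to 1. }
  }
  // All shared positions match: the shorter string sorts first.
  if c = 0 {
    if a:length < b:length { set c to -1. } else if a:length > b:length { set c to 1. }
  }
  if op = "=" { return c = 0. }
  if op = "<>" { return c <> 0. }
  if op = "<" { return c < 0. }
  if op = "<=" { return c <= 0. }
  if op = ">" { return c > 0. }
  if op = ">=" { return c >= 0. }
  print "rawCompare: unknown op " + op.
  return false.
}

print rawCompare("abc", "ABC", "="). // False
print rawCompare("B", "a", "<").     // True: code 66 is less than 97
```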

JonnyOThan commented 3 years ago

Isn't the solution to this problem "KOS should have a binary blob type," not "string comparisons should be case sensitive?"

mgalyean commented 3 years ago

I've asked for that, along with bit operators, and got a lot of pushback. But yeah, it should.

mgalyean commented 3 years ago

I just wrote that I didn't want to redefine the current operators, just to have an under-the-hood ability to do case-sensitive comparisons. The pushback seems a bit over the top.

JonnyOThan commented 3 years ago

I'll admit I only skimmed the discussion above, but I strongly disagree with the issue title - it seems the proposal has changed slightly and maybe the title should be updated as well.

I think case-sensitive suffixes for comparisons on the string type would be appropriate, and a binary blob type would be appropriate as well.

I'm curious what you're doing with UID/base62 encoding. Perhaps a different encoding that doesn't use both upper and lowercase characters might work better?

mgalyean commented 3 years ago

The base62 is to get the shortest feasible form of a UID, so I can append it to tags, craft names, and such without it overflowing and being hidden from sight. I use it mostly to auto-ID cores so their terminal titles are unique, and to name stages that end up in orbit, which would otherwise end up with something vaguer and less memorable. I will change the title of the issue.
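A minimal base62 sketch shows why this encoding collides with case-insensitive keys: the alphabet uses "A" and "a" as distinct digits, so under kOS's case-insensitive = (and default lexicon keys) two different IDs can compare as equal. The function name toBase62 and this digit order are assumptions for illustration, not mgalyean's actual code, and the last line assumes the UID string is first converted to a number with :TONUMBER.

```kerboscript
// Hypothetical helper: encode a non-negative integer n in base 62.
// Digit values 0-9 map to "0"-"9", 10-35 to "A"-"Z", 36-61 to "a"-"z".
function toBase62 {
  parameter n. // must be a non-negative integer
  local alphabet is "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
  if n = 0 { return "0". }
  local result is "".
  until n = 0 {
    set result to alphabet[mod(n, 62)] + result. // prepend least significant digit
    set n to floor(n / 62).
  }
  return result.
}

print toBase62(62). // 10
print toBase62(core:part:uid:tonumber()).
```

Using only the first 36 characters of the alphabet and dividing by 36 instead of 62 gives the single-case encoding JonnyOThan suggested, which avoids the collision at the cost of longer IDs.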

KSP-KOS / KOS

Add suffix to string type to perform case-sensitive comparisons #2930