Refactoring the Generate() function

ApprenticeofEnder commented 9 months ago

So there's a giant switch statement in the Generate() function.

I think we can refactor this. I'll see if I can open a PR and clean it up, but the idea is to have the different "blocks" in a data structure that better represents what we're after. Also, since the glyphs are all strings, we could leverage that in building the data structure.

ApprenticeofEnder commented 9 months ago

Current plan to hack this together is to create a series of lists of the different sets of characters/strings where each list can be toggled on/off, and then leverage the InsertGlyphFromArray function.

A BETTER solution might be to use something like a HashMap and then have each set of characters/strings be mapped to a meaningful key. However, most password requirements are based on ASCII characters anyways, so we could also just have an option to opt in to the full Unicode and Emoji list, or just use ASCII characters.

Actually, I think it might be better to have two sets of generators: ASCII and Extended, which combines the Unicode and Emoji generators. So, for instance, the password requirements could be fulfilled by randomly specific types of characters for each position, which we can map to numbers.

Let's say we have a password which is 10 characters long.

For each character, we have a mapping which represents a number to a character set. So for instance:

0 = Lowercase alphabet
1 = Uppercase alphabet
2 = Number
3 = Special Character
4 = Unicode Chart Character
5 = Unicode Variant
6 = Emoji
7 = Emoji Modifier

Then for each character position in the password, we randomly generate a number from 0 to 7. So this could end up being something like 2604013763.

Looking at our mapping, that's:

Number
Emoji
Lowercase Letter
Unicode Chart
Lowercase Letter
Uppercase Letter
Special Character
Emoji Modifier
Emoji
Special Character

Then for each character position we generated, we take a number out of the respective character set, string them all together, and then we have our password. Of course, some considerations are necessary for missed requirements, etc. etc.

Nathan2076 commented 9 months ago

That's an interesting idea. My initial concept was to save the Unicode blocks the user unchecked and iterate through them to see if the generated chosenBlock number corresponded to one of them. If so, it would continue and generate another one.

But your approach is also very good, because that way the user can not only opt out of selected Unicode blocks, but also Unicode entirely. I imagine the code would be something like this?

Password.cs:

public static string Generate(int passwordLength)
{
    string password = "";

    for (var i = 0; i < passwordLength; i++)
    {
        string glyph = "";
        int chosenPool = RandomNumberGenerator.GetInt32(8);

        switch (chosenPool)
        {
            case 0:
                glyph += InsertLowercaseChar();
                break;
            case 1:
                glyph += InsertUppercaseChar();
                break;
            case 2:
                glyph += InsertNumber();
                break;
            case 3:
                glyph += InsertSpecialChar();
                break;
            case 4:
                glyph += InsertUnicodeGlyph();
                break;
            case 5:
                glyph += InsertUnicodeVariant();
                break;
            case 6:
                glyph += InsertEmoji();
                break;
            case 7:
                glyph += InsertEmojiModifier();
                break;
        }
    }

    return password;
}

private static char InsertLowercaseChar()
{
    readonly char[] chars = "abcdefghijklmnopqrstuvwxyz".ToCharArray();

    return char[RandomNumberGenerator.GetInt32(chars.Length)];
}

private static char InsertUppercaseChar()
{
    readonly char[] chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();

    return char[RandomNumberGenerator.GetInt32(chars.Length)];
}

And so on…

Maybe in the same file, maybe in separate files? Perhaps define the upper/lower characters and numbers in a file named Chars.cs, and the function would only access Chars.uppercase/Chars.lowercase/Chars.numbers, and choose a random character, without needing to initialize them in the function body?

There are still some things to think about, but overall this is an interesting idea.

ApprenticeofEnder commented 9 months ago

Well, when in doubt look at how the other password managers do it.

Bitwarden seems to do some stuff involving adding positions and then shuffling them, substituting out character sets as they come. With a little more separation of concerns considering the scope of the character sets here, this combined with putting the characters into their own class could work in our favour.

Nathan2076 commented 9 months ago

Ooh, that's a very good observation! I haven't thought about shuffling, that's neat.

I'll take a look at some other open-source projects and see if they use other methods too, like that one. Thanks for the tip!

Nathan2076 commented 8 months ago

Hello, sorry for the late update, I was thinking about new features to implement and working on another project.

Anyway, does c576f88 satisfy the issue? I've also realized that this approach gives significant performance improvements, since not everything is in a giant switch statement. Thanks for the help!

Nathan2076 commented 7 months ago

Due to inactivity, I will be merging feature/selectors to develop and closing this issue, so that features and performance improvements get ready to be publicly released sooner.

If there are any concerns or questions that may arise in the future during development, feel free to open another issue. Thank you so much for your help! It's very good to see that someone believes in this project and is able to contribute to it and to a more private and secure Internet. ❤️

ApprenticeofEnder commented 7 months ago

Hey, sorry about the radio silence, I got completely sidetracked with school: semester's wrapping up and I've still got a fair bit on my plate.

It's definitely an improvement, however the massive switch statement with InsertUnicodeGlyph could probably still be trimmed down.

Nathan2076 commented 7 months ago

Oh, I'm sorry! I should've considered you were busy, mainly with academics, that was callous of my part.

I think I will separate the Unicode glyph insertion in various groups, like how the Character Code Charts page does, what do you think? So, I'll work on that and, when I finish it, I'll give you a heads-up. No need to worry about answering quickly or checking in, focus on your studies! I'll be working on other things in the meantime.

All the luck man, you can do it!

ApprenticeofEnder commented 7 months ago

This could work, the main concern is how you're going to store the information. Remember, the problem is you have a massive switch statement that iterates through what feasibly could be a collection in and of itself. I'll leave the digging to you!

And thanks! I'm in the last leg of writing my Honour's Thesis, so fingers crossed it goes well.

ApprenticeofEnder commented 6 months ago

Hey, back to check in on this.

My main concern at this point is the current way you have the unicode character classes, etc. set up is still kind of shoehorning you into these massive switch statements.

I think what you could do in theory is a sort of 2D collection. Alternatively, is there a nuget package you could use that could be easier? Heck, randomly generating Unicode could even work. It would have the same effect. Just pick your byte count, randomly generate each byte (following the Unicode rules obvi), and go.

Definitely some stuff to play around with. You're less concerned with regards to what the actual characters are because you want that entropy. If we need to worry about unicode limits on certain websites, I think you can just have the user switch to ASCII passwords.

Say we have a website right? We do two generations: One pure balls to the Unicode/Emoji password, and one that's closer to a standard password. The user sets the parameters for the latter, and the former is just whatever happens, happens.

Let me know what you think.

ApprenticeofEnder commented 6 months ago

Hey, back to check in on this.

My main concern at this point is the current way you have the unicode character classes, etc. set up is still kind of shoehorning you into these massive switch statements.

I think what you could do in theory is a sort of 2D collection. Alternatively, is there a nuget package you could use that could be easier? Heck, randomly generating Unicode could even work. It would have the same effect. Just pick your byte count, randomly generate each byte and go.

Definitely some stuff to play around with. You're less concerned with regards to what the actual characters are because you want that entropy. If we need to worry about unicode limits on certain websites, I think you can just have the user switch to ASCII passwords.

Say we have a website right? We do two generations: One pure balls to the Unicode/Emoji password, and one that's closer to a standard password. The user sets the parameters for the latter, and the former is just whatever happens, happens.

Let me know what you think.

ApprenticeofEnder commented 6 months ago

Actually, I think if you just blacklist the non-printable characters from the basic Latin set, the rest should work.

Of course, this does rely on a handful of things, one of which being the terminal supporting Unicode output, and the other being the ability to copy those bytes in an easy way.

You'd probably need to leverage a clipboard utility for the latter. That I'll leave to you.

Nathan2076 commented 6 months ago

While the ideas are interesting, I don't know if they're possible.

Being the Ultimate™ password generator, the idea is that every Unicode character/chart/group/emoji/whatever could be configurable, so the user could choose if it would be included in the password and its percentage rate of being chosen.

So, how would the 2D collection work? Would I need to iterate over every array inside of it? Because, if so, that wouldn't be much efficient.

Randomly generating the Unicode glyph itself is not a bad idea, but there are actually a lot of code points that don't have any glyph, even though its neighbors do (just check the Unicode/Charts.cs file, every large blank space between glyphs is a reserved code point, and so it is not included in the array). So, there is a high chance that a randomly generated code point could be an unassigned one, and checking for it may be worse than iterating through pre-made arrays.

The clipboard utility is already being used to copy the generated password, I'm currently using TextCopy for that.

Additionally, it's preferable that I don't use online capabilities, such as retrieving a pre-made Unicode CSV file or something like that. Because 1) it would be dependent on that file and its host, which can become unavailable at any moment, and 2) the idea is that, while being as secure as possible, the program is also as private as possible. So, it needs to be an offline, standalone app.

I really don't see much choice, the best I can do is to refactor it. I'm currently working on another version of that giant switch statement. It is an even bigger one, actually XD but it may be more efficient? I'll have to finish it first to see. When I do, I'll commit it and notify you.

But yeah, do you have any idea? Because, given the desired configuration options, I don't know other ways I could implement this.

ApprenticeofEnder commented 6 months ago

Well, there are a few ways of addressing those issues.

Why are we bothering with every unicode character being configurable? It's a password generator, it's not like somebody's gonna go through and select every single possible custom setting out there. Maybe if someone requests it you could add sections, but I genuinely don't see anyone doing that if they're going full balls to the wall with Unicode.
The 2D collection wouldn't need to iterate over anything. Say you have those different arrays of Unicode characters in an array. You first generate a number to select which array you're going to check, and then another number to figure out what the string is from the array you've selected. Easy. That's O(1) right there. Generation is fine if it's more computationally expensive (here I'd say O(n)) because it should be a one time thing.
Hmmmmm, good point about those reserved code points. In theory you could make a blacklist of bit prefixes that mark reserved code points, but that depends on how comfy you are with getting into the weeds with this stuff.
If you have a Unicode CSV, why not store a local copy? Check it into the repository, since it's a plaintext file, right?

Anyways, my point is I want to get away from these giant switch statements because it really feels like there should be a better way of working with this concept than a switch statement that's about 10k lines long lol.

On Friday, April 26th, 2024 at 9:50 PM, Nathan2076 @.***> wrote:

While the ideas are interesting, I don't know if they're possible.

Being the Ultimate™ password generator, the idea is that every Unicode character/chart/group/emoji/whatever could be configurable, so the user could choose if it would be included in the password and its percentage rate of being chosen.

So, how would the 2D collection work? Would I need to iterate over every array inside of it? Because, if so, that wouldn't be much efficient.

Randomly generating the Unicode glyph itself is not a bad idea, but there are actually a lot of code points that don't have any glyph, even though its neighbors do (just check the Unicode/Charts.cs file, every large blank space between glyphs is a reserved code point, and so it is not included in the array). So, there is a high chance that a randomly generated code point could be an unassigned one, and checking for it may be worse than iterating through pre-made arrays.

The clipboard utility is already being used to copy the generated password, I'm currently using TextCopy for that.

Additionally, it's preferable that I don't use online capabilities, such as retrieving a pre-made Unicode CSV file or something like that. Because 1) it would be dependent on that file and its host, which can become unavailable at any moment, and 2) the idea is that, while being as secure as possible, the program is also as private as possible. So, it needs to be an offline, standalone app.

I really don't see much choice, the best I can do is to refactor it. I'm currently working on another version of that giant switch statement. It is an even bigger one, actually XD but it may be more efficient? I'll have to finish it first to see. When I do, I'll commit it and notify you.

But yeah, do you have any idea? Because, given the desired configuration options, I don't know other ways I could implement this.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

Nathan2076 commented 6 months ago

Hah. Yeah, I think I'm being a little paranoid about that, sorry. 😅

You have a good point on storing the CSV file locally, but for that I'd need to search a bit more, since the file I found some time ago didn't seem to have all the characters.

Also, the 2D collection is a good idea, I'll work on it and commit when it's ready, it really shouldn't be that hard, just edit the Unicode.cs file and adapt the Password.cs to work with it. That would wipe out the switch statement completely, probably giving far more performance improvements, but that we'll see.

Nathan2076 commented 6 months ago

I worked on it, is 8b37b8e better? I put all charts in a 2D array and got rid of the switch statement. All it needs now is to generate a random number between 0 and 321 to choose the chart, and another one between 0 and the chart's length to choose the glyph. I don't know if it is faster, but it certainly is more readable (at least for the Password.cs file).

What do you think? If it's good, I may apply this approach to the Unicode variants and emoji files as well, to remove their big switch statements too.

(I don't know why Git thought I made changes on files that I didn't even touch, ignore that.)

ApprenticeofEnder commented 6 months ago

That looks way better.

Honestly, with C# and the frequency these operations are happening? That performance difference is going to be peanuts -- both relative to the maintainability gains and the frequency of operations performed.

We're not trying to be Google here with a crazy performant search engine, remember that.

Nathan2076 commented 6 months ago

Haha, true, but performance is never too much 😛

Alright, so I'll apply that to the Unicode variants and emoji files, post it here and probably close this issue? Let me know if I should mark it as done after doing it.

Nathan2076 commented 6 months ago

It's done! a770bab

I got a bit unsure about the names, since what was a class previously is now a variable, so I can't declare static class Unicode in two different files, for example (one for charts[][] and the other one for variants[][]). But I'll think a bit more on it later.

Any additional feedback? :)

Nathan2076 / UltimatePassGen

Refactoring the Generate() function #1