Neos-Metaverse / NeosPublic

A public issue/wiki only repository for the NeosVR project

SHA256 and SHA512 as string->string logix nodes #2773

Open Earthmark opened 3 years ago

Earthmark commented 3 years ago

Is your feature request related to a problem? Please describe.

Preserving user anonymity in web domains is tricky, but much easier to do if I can create a user identity token client side, and only pass that pseudonymous token to webservers as a resettable ID (and store the token in a protected cloud variable).

Towards this end I would like the ability to create an ID from local user data and hash the crap out of it, turning random details of a user and other local data fragments into a strongly salted (and forgettable/resettable) pseudo-ID.
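For illustration only, here is a minimal Python sketch of the kind of resettable pseudo-ID described above. It assumes a plain salted SHA-256 over a user ID string. The function names, the `U-ExampleUser` value, and the salt format are all hypothetical, and this is not the design of any existing LogiX node.

```python
import hashlib
import secrets


def new_salt() -> str:
    """Return a fresh random salt; replacing the stored salt resets the pseudo-ID."""
    return secrets.token_hex(16)


def pseudo_id(user_id: str, salt: str) -> str:
    """Derive a hex pseudo-ID from a user ID and a locally stored salt (hypothetical scheme)."""
    data = f"{salt}:{user_id}".encode("utf-8")  # the string must become bytes before hashing
    return hashlib.sha256(data).hexdigest()


salt = new_salt()
token = pseudo_id("U-ExampleUser", salt)         # hypothetical user ID
same = pseudo_id("U-ExampleUser", salt)          # identical while the salt is kept
reset = pseudo_id("U-ExampleUser", new_salt())   # a new salt yields a new pseudo-ID
```

Only `token` would be sent to a webserver. The salt stays local (for example in a protected cloud variable), so discarding it makes the old pseudo-ID unrecoverable.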

Also, security primitives are great, please add 'em! (This would also allow Engi's noise gun to be even more chaotic).

Relevant issues

No response

Describe the solution you'd like

Add a Cryptography section to Strings in logix, and have SHA256 and SHA512 nodes that take a string and produce a string, possibly with an impulse input and output if this needs to be a slower operation.

Describe alternatives you've considered

Rubix I think has SHA256 implemented in logix - I do not want to verify the integrity or performance of his implementation, please do not make me!

With this specific need I could do these approaches instead:

Querying the webserver anonymously, and having it mint the token: For my particular use case I would rather clients be able to verify the logic, and this is to be included in an image url, so it needs to preserve privacy in a context where the user isn't directly warned about the privacy risk.

Additional context

People might request MD5, but to my knowledge it's considered insecure enough that I'm not requesting it. CRC is in the same boat, and since that one specifically works bit-by-bit on byte vectors, I feel it's too special-purpose for this request.

Earthmark commented 3 years ago

My bad, #785 is related.

mralext20 commented 3 years ago

I would like to make the case that having MD5 would be a good thing, even if it's not a good hashing method.

In the same space, a base64 encode and decode would be useful as well.

ProbablePrime commented 3 years ago

This is a possible duplicate of https://github.com/Neos-Metaverse/NeosPublic/issues/1305 and unfortunately requires collections.

Earthmark commented 3 years ago

Then can we have a GetHashCode node? That works on strings, ints, and such.

ProbablePrime commented 3 years ago

What would you use that for? A hash code isn't good for much. Historically it's caused problems in the C# community too.

Earthmark commented 3 years ago

It is justifiable as a way to get a pseudo-random representation of a string value without needing to use a logix-implemented hash function.

Earthmark commented 3 years ago

I guess I'm using Rubix's implementation. Now to verify a logix hash function!

ProbablePrime commented 3 years ago

That's not ideal.

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across .NET implementations, across .NET versions, and across .NET platforms (such as 32-bit and 64-bit) for a single version of .NET. In some cases, they can even differ by application domain. This implies that two subsequent runs of the same program may return different hash codes.

As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and they should never be persisted.

Finally, don't use the hash code instead of a value returned by a cryptographic hashing function if you need a cryptographically strong hash. For cryptographic hashes, use a class derived from the System.Security.Cryptography.HashAlgorithm or System.Security.Cryptography.KeyedHashAlgorithm class.

For more information about hash codes, see Object.GetHashCode.

https://docs.microsoft.com/en-us/dotnet/api/system.string.gethashcode?view=net-5.0

For logix, there really isn't a use for the hash codes generated by these functions. They're mostly used for collection-related lookups, etc.

Earthmark commented 3 years ago

I'm aware. From my perspective I'm not getting much to work with, so I'm trying to figure out what I can use. Y'all just shot down the best case, so now I'm working through fallbacks.

ProbablePrime commented 3 years ago

It's not shut down, it's just a case of suitability. These methods (hash codes) aren't suitable for use in the way SHA, MD5, or Base64 would be.

If we added nodes for these methods (hash codes), they'd be used in invalid ways.

If, however, we wait for collections, it'll be easier to add the nodes from the initial request.

Earthmark commented 3 years ago

I guess as an intermediate proposal, can I request this be a string -> UTF8 -> SHA* node? Then, once we get collections, it gets auto-expanded by the update mechanism into two nodes instead of one? That way we have the hash function for now, without introducing long-term technical debt?

ProbablePrime commented 3 years ago

That still unfortunately won't solve the problem as quoted from Froox in the other issue:

This requires collection support for LogiX first. Hashing works on raw bytes, not on strings and similarly it produces raw bytes, rather than a string. To hash a string you first generate raw bytes using a particular encoding and feed it into the hash function.
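To make the bytes-in, bytes-out point concrete, here is a small Python sketch (not LogiX or Neos code). It uses the standard `hashlib` module to show that the same string produces different SHA-256 digests depending on the encoding chosen, and that the result is raw bytes rather than a string. The sample text is arbitrary.

```python
import hashlib

text = "héllo"  # arbitrary sample; any string works

# Step 1: pick an encoding to turn the string into raw bytes.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Step 2: hash the bytes. The digest is itself raw bytes (32 for SHA-256).
digest_utf8 = hashlib.sha256(utf8_bytes).digest()
digest_utf16 = hashlib.sha256(utf16_bytes).digest()

print(type(digest_utf8).__name__, len(digest_utf8))
print("same digest for both encodings:", digest_utf8 == digest_utf16)
```

A single string -> string node would have to hard-code one choice at each of these steps.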

BlueCyro commented 3 years ago

It's not shut down, it's just a case of suitability. These methods (hash codes) aren't suitable for use in the way SHA, MD5, or Base64 would be.

If we added nodes for these methods (hash codes), they'd be used in invalid ways.

If, however, we wait for collections, it'll be easier to add the nodes from the initial request.

Why do we need collections? What's stopping a simple string -> sha converter?

ProbablePrime commented 3 years ago

See: https://github.com/Neos-Metaverse/NeosPublic/issues/1305 for commentary from Frooxius.

Enverex commented 3 years ago

That doesn't answer the question though. Hashing, as far as the user is concerned, is String in -> String out. No lists, arrays, etc. involved on our side. We don't need collections/lists for that; it's a simple string function on our side, regardless of what's going on behind the scenes. Froox can already use whatever C# has at its disposal on his side, regardless of what is exposed on our side, surely?

But yes, I've also needed hashing for some time to make URL calls to various things much cleaner and URI safe.

Frooxius commented 3 years ago

It's about separation of concerns, preventing an explosion of nodes and their variants, keeping the design of the system as clean as it can be, and choosing where we invest our limited development time to get the most out of it.

When you're hashing strings, there's a number of ways you could actually do that, which all result in different hashes - things like encoding, padding, salting (there are different methods of doing this too) will change the resulting hash and all might be needed for different use-cases.

Similarly the resulting hash might need to be formatted or used in various ways. Sometimes you need the raw binary data for raw transmission, sometimes you encode it to hex or to base64, sometimes even to base32 and so on.
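As a rough illustration of the output side, this Python sketch formats one SHA-256 digest in the ways mentioned above, using only the standard `hashlib` and `base64` modules. The input string is an arbitrary placeholder.

```python
import base64
import hashlib

# Raw 32-byte digest of an arbitrary placeholder input.
digest = hashlib.sha256("example".encode("utf-8")).digest()

as_hex = digest.hex()                                            # 64 characters
as_base64 = base64.b64encode(digest).decode("ascii")             # 44 characters, may contain + and /
as_base64url = base64.urlsafe_b64encode(digest).decode("ascii")  # same length, uses - and _ instead
as_base32 = base64.b32encode(digest).decode("ascii")             # 56 characters

for label, value in [("hex", as_hex), ("base64", as_base64),
                     ("base64url", as_base64url), ("base32", as_base32)]:
    print(f"{label:10s} {value}")
```

All four strings represent the same 32 bytes, which is why the string form of a hash is a separate choice from the hash itself.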

Now we have a few choices here:

1) Add a giant mega-node that incorporates all of these functions, duplicating code and functionality that could also be used in other places, being hard and confusing to use and hard to maintain.
2) Add dozens, potentially hundreds of variants of nodes for the different combinations. Hard to use and maintain again.
3) Add only this specific use case. This almost inevitably results in people discovering it doesn't quite meet their needs in all cases, which tends to lead to more requests, which push us back to 1) or 2) (and also costs a whole lot more time on our end).
4) Wait until we implement a proper system for dealing with this, and then those tasks become trivial: we implement each of those as a standalone general function that's usable not only for this use case, but for hundreds, if not thousands, of others that would otherwise be unmanageable to implement individually.

TL;DR: Developer time is limited, and we're looking for the approach that gives the biggest bang for the buck (and also doesn't leave us with a bunch of technical debt that slows down other future features).

Enverex commented 3 years ago

In all cases where I've dealt with hashing, the function itself would typically just take the value you wanted hashed and the hashing algorithm to use. Padding, salting, encoding would all be handled by separate functions (or nodes in Neos' case) before or after the hashing process. e.g. in the case of Neos, you'd expect that salting (and padding) would be handled with a formatting node before the hashing node, if it's not integrated, etc.

Not intending to step on toes here, just thoughts when comparing it to existing systems.

ProbablePrime commented 3 years ago

Padding, salting, encoding would all be handled by separate functions (or nodes in Neos' case) before or after the hashing process

That's not exactly how hashing works or how Neos works.

In terms of C#, it gets complicated really fast, and it has to be handled in a way that's modular, scalable, secure, etc. Add to that, we need a way to do this in LogiX that is idiomatic and easy to use. You can find some examples of the complexity here (https://medium.com/@mehanix/lets-talk-security-salted-password-hashing-in-c-5460be5c3aae), but this is just from the top of the Google results.

Even with that, I would never recommend using it for anything secure within the confines of Neos. It would be much better to provide specific scenarios in other nodes or structures to help you out. A good (but unrelated to this conversation) example is OAuth, rather than users sharing their password with third-party sites.

In some cases I feel like there are alternatives that can be looked at when we go back to the scenarios you would like, so going back to Earthmark's:

Preserving user anonymity in web domains is tricky, but much easier to do if I can create a user identity token client side, and only pass that pseudonymous token to webservers as a resettable ID (and store the token in a protected cloud variable).

I don't see the problem with transmitting User IDs; we consider these public information. When transmitted over HTTPS, there is a reasonable level of security protecting the payloads. Could you explain more why using them is a problem?

Are you perhaps talking about proving a user IS who they say they are? There are some other solutions here that might not require the exact ask here (raw hashing nodes). We should discuss them elsewhere, perhaps on a more focused issue.

Also, security primitives are great, please add 'em! (This would also allow Engi's noise gun to be even more chaotic).

I'm not sure I understand. There should be a reasonable degree of randomness inside a user's username to allow for quite a lot of effects. If you need more randomness for this kind of effect, consider using the Machine ID instead. There are also quite a lot of random nodes that can give you enough pseudo-random data to use. All of the random nodes use https://docs.microsoft.com/en-us/dotnet/api/system.random?view=net-5.0 internally.

For Enverex's URI problem: URIs can be encoded using Escape Uri Data String to avoid URI problems. Could you elaborate on any additional URI problems you might have?
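As a rough analogy only, the Python sketch below shows percent-escaping a value so it can be embedded safely in a URL, using the standard `urllib.parse.quote`. It assumes the Escape Uri Data String node behaves similarly to .NET's `Uri.EscapeDataString`, which this thread does not confirm. The sample value is made up.

```python
from urllib.parse import quote, unquote

raw = "name=Some User&tag=a/b?c"  # made-up value containing URL-reserved characters

# safe="" escapes every reserved character, including "/".
escaped = quote(raw, safe="")
print(escaped)  # name%3DSome%20User%26tag%3Da%2Fb%3Fc

# Escaping is reversible, unlike hashing.
assert unquote(escaped) == raw
```

Unlike a hash, the escaped form keeps the full original value readable to whoever receives the URL.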

On Alex: Yeah, Base64 I can see. Unfortunately, that still has some issues.