SteamRE / SteamKit

SteamKit2 is a .NET library designed to interoperate with Valve's Steam network. It aims to provide a simple, yet extensible, interface to perform various actions on the network.
GNU Lesser General Public License v2.1
2.59k stars 493 forks source link

WebAPI does not properly encode special characters #641

Closed JustArchi closed 5 years ago

JustArchi commented 5 years ago

While digging in my own toys, I've found out that current WebAPI code does not properly encode special characters. This is well documented in SK2 code, except it's not universally correct approach. It's totally valid to use characters that can be problematic for the URL, namely & as an example. If I use a key-value pair such as name -> bob& then raw data should be encoded as name=bob%26, while right now it's encoded as name=bob&. This isn't just incorrect as it is, it can create issues with other arguments.

Moreover, the rabbit hole is much deeper since changing it to proper encoding would actually regress in valid code. I was testing ISteamUserAuth/AuthenticateUser API endpoint today and wondering why FormUrlEncodedContent() doesn't work nicely with my args, turns out it's because Steam actually expects binary there, and you can't encode those bytes for special characters, as the data will be no longer correct (and this is why my FormUrlEncodedContent() didn't work).

Considering the facts above this can be a problematic issue to fix, since you can't know in advance whether something should be encoded or not. The easiest solution that would not regress and fix the underlying issue would be a check for type of each WebAPI argument, and apply new encoding logic for strings, while keeping current no-encoding logic for byte[]. The only problem with this is that it's so abstract to do things this way that it might be better to rewrite entire http content into something like multipart/form-data and handle strings and binary in different way - I just didn't test how nicely Steam servers play with it, and it'd still require extra logic like the one above what exactly type the user is supplying, so byte[] becomes allowed and encoded as binary.

Right now current behaviour of SK2 breaks RFC 1866:

        1. The form field names and values are escaped: space
        characters are replaced by `+', and then reserved characters
        are escaped as per [URL]; that is, non-alphanumeric
        characters are replaced by `%HH', a percent sign and two
        hexadecimal digits representing the ASCII code of the
        character. Line breaks, as in multi-line text field values,
        are represented as CR LF pairs, i.e. `%0D%0A'.

And the most hilarious thing is how making it right would regress in my own working code, and make it impossible to use for binary data.

Let me know what you think, I'm super confused myself.

yaakov-h commented 5 years ago

I'll have to check, but I think this encoding mechanism came straight from steamclient.dll/dylib/so.

Interestingly, at least some of the Service APIs / Unified Messages accept binary input in Base64 and Hexadecimal, IIRC. I wonder if AuthenticateUser would...

JustArchi commented 5 years ago

Right now I'm using this code that works:

await iSteamUserAuth.AuthenticateUser(
    encrypted_loginkey: Encoding.ASCII.GetString(WebUtility.UrlEncodeToBytes(encryptedLoginKey, 0, encryptedLoginKey.Length)),
    method: WebRequestMethods.Http.Post,
    sessionkey: Encoding.ASCII.GetString(WebUtility.UrlEncodeToBytes(encryptedSessionKey, 0, encryptedSessionKey.Length)),
    steamid: steamID
)

I've tried using standard FormUrlEncodedContent() with BitConverter.ToString().Replace("-", "") and Convert.ToBase64String() but no luck. I've also tried to do the same with SK2 WebAPI, and it also didn't work. At least BitConverter should because it doesn't generate non-alphanumeric characters, so the encoded POST data is the same with both.

I've also tried bunch of other scenarios, such as Encoding.ASCII.GetString() with FormUrlEncodedContent(), and all of that failed, the only solution that works for me right now is encoding raw bytes, then generating ASCII string of it, and then generating HTTP content without encoding of value, so exactly what my code does above, using SK2's WebAPI.

If you by any chance want to try yourself, you can check my working code in ASF, I couldn't get this function to work through any other combination, but I did succeed doing the same RFC violation that WebAPI does, generating HttpContent myself without encoding the values, and it did work as well, but once I enforce the RFC with FormUrlEncodedContent() then I can't make it work through any means, at least with application/x-www-form-urlencoded that is. I'm always getting 403 from Steam.

xPaw commented 5 years ago

Try using HttpUtility.UrlEncode. I just tested it on AuthenticateUser and it seems to work.

JustArchi commented 5 years ago

@xPaw It doesn't work for me using FormUrlEncodedContent():

Dictionary<string, string> data = new Dictionary<string, string> {
    { "encrypted_loginkey", HttpUtility.UrlEncode(encryptedLoginKey) },
    { "sessionkey", HttpUtility.UrlEncode(encryptedSessionKey) },
    { "steamid", steamID.ToString() }
};

Did you try to FormUrlEncodedContent() of the above? The problem I'm refering to is Steam not accepting any input that gets encoded into the HTTP content, it throws 403 unless you manually generate HTTP content without encoding of values.

xPaw commented 5 years ago

Why are you trying to FormUrlEncodedContent it?

JustArchi commented 5 years ago

Because you need to encode keys and values as per RFC when using application/x-www-form-urlencoded, this is what FormUrlEncodedContent() does, and this is what current implementation of SK2 WebAPI doesn't. I can always generate a request that works on my end, the issue is about the fact that WebAPI doesn't encode any value, which means that if you use something like & in a value of some other request that expects a string, you'll blow up whole payload, as per my name=bob& example above.

JustArchi commented 5 years ago

This is why I'm trying to make a solution that works with FormUrlEncodedContent(), as then we can correct WebAPI to use the same solution and not regress at the same time.

xPaw commented 5 years ago

Show what modifications you have done in SK, otherwise it's hard to tell what you're trying to do.

JustArchi commented 5 years ago

I didn't do any modifications to SK2, maybe let me summarize so I can explain better what I mean:

WebAPI gets arguments from the consumer, in form of key-value pairs of varying types, but they're all converted into ToString() implementation in the end.

When sending arguments, either in query string (like in GET requests) or through application/x-www-form-urlencoded (like in POST requests), you need to encode the keys and values for special characters, such as & or %. This is defined in RFCs and non-negotiable, at least if you want to follow HTTP spec.

SK2's WebAPI currently violates this RFC, as it does not encode the values, but it does encode the keys. This means that you can't use special characters in values of SK2 WebAPI requests, as they have a potential to blow up the rest of the arguments, for example name=bo&b + a=b which should be encoded as name=bo%26b&a=b (valid http content), is currently encoded by SK2 WebAPI as name=bo&b&a=b, which breaks the payload.

The proper solution to this problem is just encoding the value as it should be done (and as it's done in FormUrlEncodedContent() which I use as a reference implementation), which shouldn't regress with current usage of strings, as previously special characters shouldn't work at all, and now they will. The problem however is regression in entirely different aspect, endpoints where Steam servers expect rawbinary, as for some unknown to me reason Steam doesn't decode it on their side, which means that trying to encode any form of rawbinary on our end will result in invalid data on Steam side, and since not encoding it is a violation of RFC, you don't even have any .NET built-in method to encode only keys and not values, since it doesn't make any sense. This is why (I assume) SK2 devs decided to do it this way, and left a TODO to somehow add handling for special characters in the future. This is what I'm refering to in this issue, and trying to fix it so WebAPI follows the RFC while not regressing over the fact that it'll no longer be possible to send rawbinary to Steam through WebAPI.

Finding a way to send rawbinary through url-encoded keys and values is the solution to fixing WebAPI for special characters while still not breaking rawbinary values. This is what I'm trying to do, I don't have any issues with my code, it's only used as an example of what works and what does not.

Let me know if I should elaborate on something else or if something is still unclear.

JustArchi commented 5 years ago

As an example of flaw of current implementation, find yourself Steam API endpoint that accepts some string value, put some special combination there such as bob%26 and see how Steam interprets it. If you used it for endpoint that changes a nickname for example, Steam would change your nickname into bob& and not bob%26 that you used as a value in your code. This is because WebAPI did not encode % so it sent bob%26 in a content, and %26 was interpreted as encoded &, as opposed to being sent as content bob%2526 and interpreted as bob%26, which is what you wanted.

Just remember to send the request through SK2's WebAPI to trigger the bug.

xPaw commented 5 years ago

"I didn't do any modifications to SK2"

Were you testing the changes using a HttpClient constructed yourself or passed it through SK's WebAPI?

JustArchi commented 5 years ago

Both. Yesterday I was testing with static default SK2 HttpClient, today with my own. I can do a second round with static if you want to, but I don't see how HttpClient can be of problem here unless you manually fiddle with HttpClientHandler routines, and I didn't do that. Enabling decompression support or specifying a different user-agent does not affect the arguments being sent, and you can reproduce the flaw I mentioned above with either, since it's not HttpClient or handler in charge of setting up http content. I'll repeat all the steps above with default HttpClient just to be sure.

xPaw commented 5 years ago

I'm asking because you can't pass raw body data to SK's api calls, can you?

So if you wanted to test FormUrlEncodedContent, you would either need to modify SK implementation or test it on a separate HttpClient without SK at all.

JustArchi commented 5 years ago

So if you wanted to test FormUrlEncodedContent, you would either need to modify SK implementation or test it on a separate HttpClient without SK at all.

Yes, this is what I did, I tested my own HttpClient implementation that uses FormUrlEncodedContent() as a reference implementation, so I can see what works and what does not without fiddling with SK2 code. However, I verified the root cause by manually rewriting my implementation to use the same custom encoding that SK2 WebAPI uses, and I confirmed that this fixed the issue. This was crucial for me before I opened the issue, as I do not jump onto conclusions without double-checking and at least being fairly sure that I'm correct.

SK2 WebAPI doesn't do anything else than parsing your arguments, constructing a dictionary of args and manually constructing http content, there is nothing else that could change the outcome of the request, of course you could come up with abstract thing such as different user-agent, but it's not that, otherwise my implementation wouldn't work either.

xPaw commented 5 years ago

If you look at FormUrlEncodedContent code, it uses Uri.EscapeDataString internally. Said function only expects a string so you can't really pass a binary byte array into it.

FormUrlEncodedContent on its own hardly does anything interesting, it's basically the same thing as SteamKit's implementation, it just encodes the keys/values differently.

If you use HttpUtility.UrlEncode AND FormUrlEncodedContent, you're effectively encoding it twice, so it will obviously fail.

That function may return percent encodings if international support is enabled, but that's just a clusterfuck. https://docs.microsoft.com/en-us/dotnet/api/system.uri?view=netframework-4.7.2#international-resource-identifier-support

https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/src/System/Net/Http/FormUrlEncodedContent.cs

JustArchi commented 5 years ago

it's basically the same thing as SteamKit's implementation, it just encodes the keys/values differently.

No, it encodes value while SK2 does not.

Compare:

https://github.com/SteamRE/SteamKit/blob/e2d6f88b2d2840cabb92bc5fe3d72b40432caa41/SteamKit2/SteamKit2/Steam/WebAPI/WebAPI.cs#L275

And:

https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/src/System/Net/Http/FormUrlEncodedContent.cs#L41

This is exact reason for difference between those two implementations. Yes, there are other misc things like %20 encoded as + but they're not the exact reason for failure in this case.

If you use HttpUtility.UrlEncode AND FormUrlEncodedContent, you're effectively encoding it twice, so it will obviously fail.

And I know that, this is why I tested whole lot of combinations, started from reference that works (encoded bytes into string + lack of encoding in SK2 WebAPI), and went through everything that came to my mind (bytes to hex + encoding, bytes to base64 + encoding, non-encoded ASCII string + encoding). I documented it all in my posts above, and pointed out that as long as there is RFC encoding applied, I can't make it work with any of my inputs.

xPaw commented 5 years ago

So with all of your testing, do you have a solution?

If different Valve's APIs expect different encodings, you can't do one-solution-fits-all implementation.

JustArchi commented 5 years ago

I have only one solution that would work and satisfy every use case, and it's not pretty.

It'd be switch on type of kvp.Value and:

This way WebAPI can effectively detect rawbinary (which should be always in form of byte[]) and apply the proper logic, while still fixing lack-of-encoding flaws mentioned above for strings and all other types that use special characters.

This is a breaking change though for existing rawbinary code. Not significant considering not a lot of endpoints accept it in the first place, but regardless, current usage of rawbinary like my valid example above will stop working and I'll need to drop Encoding.ASCII.GetString() from it.

JustArchi commented 5 years ago

This is also why I'm opening the issue and not submitting PR, because perhaps there is a better approach to this issue, for example finding out that we could send that rawbinary as some kind of encoded string and then encode everything (as per RFC) while showing examples to user how rawbinary data should be done, even add helpers if appropriate.

JustArchi commented 5 years ago

The good thing is how this can be done without breaking the API, since SK2 can already detect types of arguments from supplied code, and therefore decide to make appropriate operation without user needing to specify what should be encoded and what should not.

The only drawback is that breaking change of existing rawbinary setups that will need to drop final conversion of them to string and supply byte[] instead.

Still, maybe there is some better approach that I'm not aware of.

yaakov-h commented 5 years ago

If what @xPaw said above is correct, it's likely that WebAPI actually is RFC-compliant.

Looking at it again after having a good night's sleep, I think the problem is that you're double-encoding the data.

HttpUtility.UrlEncode encodes the data into a percent-escaped string, and then FormUrlEncodedContent will encoded those percent signs into even more percentage signs.

e.g. \ => %5C => %255C

Try building your own HttpContent subclass that encodes incoming strings once and incoming binary once, and it should work.

JustArchi commented 5 years ago

Looking at it again after having a good night's sleep, I think the problem is that you're double-encoding the data.

That's not the issue. The order of things which works right now for me is url-encoding the bytes of raw binary (using WebUtility.UrlEncodeToBytes()), then constructing ASCII string out of them, so pretty much the code I submitted as an example above. Since SK2's WebAPI doesn't do anything else, this works.

However, if you want to make encoding your last step (like what you suggest doing with custom HttpContent and what FormUrlEncodedContent() is doing) then ASCII string out of bytes that is then encoded for transmission will not work, I've already tested that in my round of tests specified above. This is because the output is not the same:

image

I'm not sure why I need to repeat the same stuff over and over again, you can just check for yourself if you think that it's my code doing something wrong, make AuthenticateUser with FormUrlEncodedContent() work (or any other solution that complies with RFC), show me the code and I'll be glad to use the same approach 🤔. I didn't find a way to make AuthenticateUser API endpoint work while encoding the value as a last step. Saying that it works right now is the same as me saying that you can't use & character in a value and that it needs correction, except when you correct it then I'll say that rawbinary no longer works 🤔.

yaakov-h commented 5 years ago

I never said to make encoding the last step, I only said to do it once.

UrlEncodeToBytes puts the percent signs inside the byte array. If you decode to ASCII first, the runtime replaces unprintable characters with a question mark, which won’t work.

If you get your perfect percent-encoded string, which it sounds like you’re already doing, then passing it to FormUrlEncodedContent will doubly encode it and destroy your value.

SteamKit probably could encode binary values as per RFC and it would all still work, but it won’t solve your problem of trying to use FormUrlEncodedContent.

FormUrlEncodedContent won’t work for you because it’s designed to take unencoded strings as it’s input, not encoded ones. Unless there’s an option to disable encoding, you’ll need to build your own content/formatter.

JustArchi commented 5 years ago

SteamKit probably could encode binary values as per RFC and it would all still work, but it won’t solve your problem of trying to use FormUrlEncodedContent.

I don't need to use FormUrlEncodedContent(), I use it as a reference implementation that is RFC-compliant. If you claim that it's not possible to use that one for rawbinary case then I say that it's fine, we should then use the one that encodes the values and still works, because right now there is nothing.

FormUrlEncodedContent won’t work for you because it’s designed to take unencoded strings as it’s input, not encoded ones.

How many times do I need to repeat that I don't double-encode my data? How many times do I need to repeat that binary->encode->string is not equal to binary->string->encode, and how many times do I need to repeat that I'm trying to solve WebAPI limitations in regards to special characters and rawbinary? If I wanted to write my own HttpContent in my own implementation then I wouldn't bother you and try to fix the missing functionality in SK2 code, I'd just rewrite it myself.

Maybe I should just stop trying to solve the issue and state what is needed for WebAPI to be functional:

Make both work through WebAPI and we can close the issue 🤔. What you use under the hood doesn't matter as long as it's RFC-compliant, and it will be when you make first case work properly. If you don't want to listen to my findings regarding trying to fix the WebAPI so it works for the above, you can always make your own, those that are definitely not flawed by "double-encoding the data". My original issue of WebAPI not encoding special characters in values stands.

yaakov-h commented 5 years ago

How many times do I need to repeat that I don't double-encode my data?

The example you posted was this:

new Dictionary<string, string> {
    { "encrypted_loginkey", HttpUtility.UrlEncode(encryptedLoginKey) },
    { "sessionkey", HttpUtility.UrlEncode(encryptedSessionKey) },
    { "steamid", steamID.ToString() }
};

This created a dictionary with correctly URL-encoded values. However, if you then pass that dictionary to FormUrlEncodedContent, as you stated, then each of those values will be URL-encoded again over here. That is why it is failing for you, and not:

because Steam actually expects binary there, and you can't encode those bytes for special characters, as the data will be no longer correct

as per your original post.

Also,

then I wouldn't bother you and try to fix the missing functionality in SK2 code, I'd just rewrite it myself.

I thought that was what you were trying to do in the first place? 😕


In any event, I believe all we need to do to fix this is:

JustArchi commented 5 years ago

This created a dictionary with correctly URL-encoded values.

Guess why I tried that? 🤔

once I enforce the RFC with FormUrlEncodedContent() then I can't make it work through any means, at least with application/x-www-form-urlencoded that is. I'm always getting 403 from Steam. (...) Try using HttpUtility.UrlEncode. I just tested it on AuthenticateUser and it seems to work.

Well, it didn't work, I didn't expect that to work either, but I tried in hope that I'm wrong regardless. And then you say that it's me who is double-encoding the things 🙁.

I thought that was what you were trying to do in the first place? 😕

I was trying to discuss potential solutions to this issue since it's not a trivial one-line fix that wouldn't regress in other place, rawbinary is an example of such place, nothing else. I then elaborated on why simply encoding the value won't fix the case, and what are the alternatives to make it work.

In any event, I believe all we need to do to fix this (...)

Your solution looks exactly the same as mine with detecting byte[] and just stringifying it, while encoding everything else. That's alright if you don't think that there is anything better, I posted this issue with intention that perhaps somebody comes up with something nicer. I'm alright with this solution myself, thank you 🙂.

voided commented 5 years ago

Throwing my own comedy opinion into the ring: we deprecate the WebAPI helpers in vNext, remove them the next major a couple months down the line. This stuff is just HTTP that anybody can solve with their own HttpClient or HttpWebRequest.

Valve's original Web APIs aren't anywhere near RFC compliant - and I wouldn't be really surprised if our approach to ensuring we're encoding arguments correctly will break some other API endpoint we're not thinking about.

JustArchi commented 5 years ago

TBH this is what I initially planned to do in my own ASF#1086, but upon closer investigation of edge-cases like these, I'm not so sure if dropping it is beneficial for consumers. I think you need to discuss whether you want to support WebAPI and its quirks, if yes, it's definitely a good idea to keep it around as it's still better than me fiddling with my own HttpContent for rawbinary just because Steam acts like that. However, if you decide that you don't have enough willings, maintenance capabilities or patience to make the WebAPI work for all of those edge use-cases, it's fine if you decide to deprecate and close it.

Personally I'd like to see it being supported, as it does simplify a lot of things and with recent shared HttpClient implementation which was done by @yaakov-h I don't think that there is that much to fix or maintain regarding that, it's working really nice and besides this issue I didn't find anything else that needs drastic improvements, but then again it's not me supporting it so I shouldn't state what I think is right without being involved in the project.

Regardless, thank you a lot for your continuous support, I really appreciate it, even if sometimes I lack patience to show it 😀.

yaakov-h commented 5 years ago

We do still need WebAPI internally, so we have to keep it alive in some form or another.

azuisleet commented 5 years ago

To clarify this, I don't believe there's actually anything wrong here. This isn't "does not properly encode" but rather, "does not automatically encode/escape user input".

I don't believe there are any RFC conflicts at this point. It may be that the reserved set for WebHelpers.UrlEncode was only applicable for CDN, in which case we could prefer HttpUtility.UrlEncode.

Whether the value is a byte[] or a string, HttpUtility.UrlEncode has the override for both, and takes the correct action.

I would vote for lifting out the encoding logic into TryInvokeMember, where we would call UrlEncode on all the values as they're placed into the dictionary (but what about GET encoding requirements vs POST encoding requirements?)

JustArchi commented 5 years ago

Expecting from the consumer using WebAPI to manually escape his data when he doesn't even have any say over HttpContent is wrong to begin with, and the fact that you do encode the keys but not values shouldn't be considered anything but a flaw.

I didn't see any documentation that mentioned how user is expected to do that amount of low-level stuff, if I did then I'd complain much earlier about that as a design flaw.

If you truly want user to encode his strings, then you should accept HttpContent from him or explicitly mention that he's supposed to (even though in this case I'd just drop using WebAPI altogether and send the request myself, since it no longer does anything useful for me if I have to manually cover low-level stuff anyway).

The proper place to do the encoding is the one creating HttpContent, the only difference is that the same place should be aware of different types and encode them appropriately. This is how it's handled in all the .net functions responsible for that.

azuisleet commented 5 years ago

WebAPI is definitely a wrapper that provides value. I don't agree with allowing the user to supply their own entity body, as WebAPI is meant to wrap a specific flow (form data request body or query string with VDF response). GET requests don't involve an entity body.

I agree that having to manually escape parameters is not convenient, which is why I agree with making a breaking change to encode string and byte[] parameters in WebAPI.

Your code for encoding the parameters is correct, the only difference would be whether it's done in user code or in WebAPI. If nothing else, the documentation may need to state the user needs to encode their input using HttpUtility.UrlEncode.

JustArchi commented 5 years ago

I'd rather opt for WebAPI applying the correct logic and not expecting from consumer to do any encoding at all. The fact that I do the encoding myself right now is the side-effect of the fact that WebAPI doesn't. It's not like encoding is really consumer-specific, RFC defines it explicitly, the only modification that is needed apart from encoding everything, is special case for handling byte[] so rawbinary can be sent using WebAPI without resorting to hacks like not encoding anything (current behaviour) or sending the request manually because WebAPI encodes everything and it's no longer possible to send what Steam wants.

This is also why I consider existing code to be fairly decent, it just needs small logic enhancement of encoding everything but byte[], which as a special case is getting Encoding.ASCII.GetString(), as opposed to everything else that gets HttpUtility.UrlEncode(value.ToString()). I really do not see any other edge case being available anytime soon to justify customer doing the encoding on his own.

azuisleet commented 5 years ago

byte[] types aren't really a special case, they just have be encoded directly from binary to URL encoding (and not through any intermediate encoding such as ASCII which is only 7 bits). There is no case where you do not encode binary data, all parameters need to be encoded.

We are in agreement that WebAPI should allow byte[] parameters, and that it should use the correct overload of HttpUtility.UrlEncode that takes binary data and returns the URL encoded string.

JustArchi commented 5 years ago

I know they aren't per-se, but they are in our case where we need to handle them in a special way to make the expected use case to work. It's basically a workaround for the consumer to be able to send rawbinary in expected by Steam format, without manually crafting a request or depending on non-documented internal implementation of SK2 like in current case, where consumer is expected to encode value but not the key, and it goes against any sensible expectations.

In any case I'm glad we have an agreement upon the solution of this problem 😀.

azuisleet commented 5 years ago

This should make it into the 2.2.0 release as a breaking change. The correct encoding will be handled by the library, with the user passing in un-encoded parameters.