dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.63k stars 4.57k forks source link

Inheritance security rules violated by type: 'System.Net.Http.WebRequestHandler'. Derived types must either match the security accessibility of the base type or be less accessible. #18280

Closed clairernovotny closed 4 years ago

clairernovotny commented 7 years ago

Using the latest System.Net.Http 4.1.1 as per https://github.com/dotnet/runtime/issues/17770#issuecomment-242131706, results in an exception when starting a web app that's .NET 4.6.1:

Inheritance security rules violated by type: 'System.Net.Http.WebRequestHandler'. Derived types must either match the security accessibility of the base type or be less accessible.

I've emailed a repro to @davidsh


Execution plan & status

[UPDATED by karelz]

High-level plan: A. Revert HttpClientHandler implementation in net46 build of CoreFX back to using original .NET Framework HTTP stack instead of WinHTTP (WinHttpHandler) based stack. B. Revise implementation of the 8 new APIs on HttpClientHandler we introduced in 4.1.0.0 OOB package so that it works accordingly for the net46 build.

Execution plan:

  1. Validate feasibility of [A]

    • [x] a. Port HttpClientHandler from NetFX (remove WinHttpHandler build dependency for net46 build).
    • [x] b. Add APTCA (on assembly only, no security attributes should be necessary for types or methods - same as in Desktop source code).
      • [x] Run the SecAnnotate tool to verify the claim above - Result: Passed
    • [x] c. Manually test the 2 scenarios (simplified and @onovotny’s) - Result: Verified
  2. Validate feasibility of [B]

    • [x] a. Investigate the 2 remaining APIs (DefaultProxyCredentials, MaxConnectionsPerServer) which we do not know if we can implement. - Result: They are in the bucket 4.a below.
  3. Full testing of [A] implementation (cost: 1d)

    • [x] a. Make changes in master
    • [x] b. Test all ~7 end-to-end scenarios reported by community (ask for help from community, provide steps to consume master packages on myget)
      • [x] Self hosting ASP.NET Core from Windows Service - validated by @annemartijn0 (here)
      • [x] Azure Storage API - validated by @karelkrivanek (here)
      • [x] Raven.Database package + usage of new HttpClientHandler - validated by @jahmai (here)
      • [x] Direct dependency on System.Net.Http - validated by @pollax (here)
      • [x] 4.6 console app depending on System.Net.Http - validated by @MikeGoldsmith (here)
      • [x] 4.6 Azure webjob (console app) with ServiceBus - validated by @chadwackerman (here)
      • [x] 4.6 Azure Batch app - validated by @chadwackerman (here)
      • [ ] Not-yet-described scenario by @dluxfordhpf
  4. Full implementation and testing of [B]

    • [x] a. Decide on design of 4 APIs (CheckCertificateRevocationList, SslProtocols, DefaultProxyCredentials, MaxConnectionsPerServer) which we can’t implement “correctly” - we have 3 choices - either throw PlatformNotSupportedException, or do nothing, or set the properties domain-wide instead of per-HttpClient-instance
      • [x] i. Implement the decision (cost: 2d)
      • [x] ii. List all libraries on NuGet (e.g. WCF?) which use the APIs we will be technically breaking, contact them
      • 4 NuGet libraries potentially affected - owners contacted - see details and tracking progress
    • [x] b. Implement 5 APIs which we know how to implement (cost: 3d)
    • [x] c. Final testing on master branch package - covered by [3.b]
  5. Ship final packages

    • [x] a. Port changes into release/1.1.0 branch
    • [x] b. Produce new package - 4.3.1
    • [ ] c. Test most of ~7 end-to-end scenarios reported by community (ask for help from community, provide steps to consume 4.3.1 stable package from myget feed - see here)
      • [ ] Self hosting ASP.NET Core from Windows Service - @annemartijn0
      • [x] Azure Storage API - @karelkrivanek
      • [ ] Raven.Database package + usage of new HttpClientHandler - @jahmai
      • [x] Direct dependency on System.Net.Http - @pollax (here)
      • [ ] 4.6 console app depending on System.Net.Http - @MikeGoldsmith
      • [ ] 4.6 Azure webjob (console app) with ServiceBus
      • [ ] 4.6 Azure Batch app
      • [ ] Not-yet-described scenario by @dluxfordhpf
      • [x] KeyVault - @Vhab (here)
      • [x] SignalR - @tofutim (here)
      • [x] OwlMQ - @JoeGordonMVP (here)
    • [x] d. Publish package on nuget.org - https://www.nuget.org/packages/System.Net.Http/4.3.1

Impact of the change - Breaking changes

Here's list of technical breaking changes caused by the proposed solution. It includes workarounds for each. Note that these new behaviors are specific when running on net46 / Desktop. When you run on .NET Core, the behavior is intact.

  1. HttpClientHandler.CheckCertificateRevocationList (introduced in System.Net.Http 4.1)
    • New behavior: Throws PlatformNotSupportedException
    • Workaround: Use ServicePointManager.CheckCertificateRevocationList instead (impacts the whole AppDomain, not just single HttpClientHandler as it did in System.Net.Http 4.1-4.3)
  2. HttpClientHandler.SslProtocols (introduced in System.Net.Http 4.1)
    • New behavior: Throws PlatformNotSupportedException
    • Workaround: Use ServicePointManager.SecurityProtocol instead (impacts the whole AppDomain, not just single HttpClientHandler as it did in System.Net.Http 4.1-4.3)
  3. HttpClientHandler.ServerCertificateCustomValidationCallback (introduced in System.Net.Http 4.1)
    • New behavior: Works fine, except that the first parameter of type HttpRequestMessage is always null
    • Workaround: Use ServicePointManager.ServerCertificateValidationCallback
  4. HTTP/2.0 support (introduced in System.Net.Http 4.1)

    • New behavior: System.Net.Http (for net46 = Desktop) no longer supports HTTP/2.0 protocol on Windows 10.
    • Workaround: Target System.Net.Http.WinHttpHandler NuGet package instead.
    • Details:

      • HTTP/2.0 support is part of the new CoreFx HTTP stack which on Windows is based on WinHTTP. The original HTTP stack in .NET Framework 4.6 did not support HTTP/2.0 protocol. If HTTP/2.0 protocol is needed, there is a separate NuGet package, System.Net.Http.WinHttpHandler which provides a new HttpClient handler. This handler is similar in features to HttpClientHandler (the normal default handler for HttpClient) but will support HTTP/2.0 protocol. When using HttpClient on .NET Core runtime, the WinHttpHandler is actually built-in to HttpClientHandler. But on .NET Framework, you need to explicitly use WinHttpHandler.
      • Regardless of whether you are running using .NET Framework runtime (with WinHttpHandler) or .NET Core runtime using HttpClientHandler (or WinHttpHandler), there are additional requirements in order to get HTTP/2.0 protocol working on Windows:
      • The client must be running on Windows 10 Anniversary Build (build 14393 or later).
      • The HttpRequestMessage.Version must be explicitly set to 2.0 (the default is normally 1.1). Sample code:

        var handler = new WinHttpHandler();
        var client = new HttpClient(handler);
        var request = new HttpRequestMessage(HttpMethod.Get, "http://www.example.com");
        request.Version = new Version(2, 0);
        
        HttpResponseMessage response = await client.SendAsync(request);
ciel commented 7 years ago

I'm still very confused about what is causing this problem in the first place. You guys are really talking over my head.

I care pretty greatly about the version numbers matching, but if that has to go in order to make this problem stop, I'll deal with it.

Didn't I read an article a few months ago about how most of the System.Net.Http library was being sunset in favor of a rebuilt one?

On Dec 15, 2016 3:18 PM, "Karel Zikmund" notifications@github.com wrote:

@robertmclaws https://github.com/robertmclaws We do not think that the version change would make a difference. Most people (app and library owners) typically just "hit upgrade to latest" on their packages, they don't care how much the version changed (minor vs. major). So the breaking impact will be exactly the same. It is even worse that the "breaking" effect will come to light only when you combine multiple things. That's why this issue slipped through in the first place -- we do not have testing for that combo (and I would argue that's ok - one can't test all combinations, but we can discuss it in post-mortem later).

I don't think anyone cares too much about whole framework groups versions either. Honestly, if I believed it would help majority of customers, I would vote for just changing the numbers -- I just don't believe it would help almost at all. It would just change our message to "yeah, you're broken - that's what the version says, you just didn't realize what kind of things you agree to when taking the version", which is IMO lame excuse.

Given the difference in opinions, I am interested what others think: please use upvotes /downvotes on this post:

  • 👍 if you agree with me, i.e. you think shipping the breaking change as 5.0 won't make a difference and most people will still be broken, confused and impacted.
  • 👎 if you agree with @robertmclaws https://github.com/robertmclaws's proposal, i.e. you think shipping the breaking change as 5.0 will help people understand upfront they should rather stay away from the package, because it will break combo with another library and will prevent unnecessary pain to developers.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/dotnet/corefx/issues/11100#issuecomment-267446604, or mute the thread https://github.com/notifications/unsubscribe-auth/AAavtBRE24SlHsZHCYm5OhPbOGs6MfRzks5rIa68gaJpZM4JsdDX .

dluxfordhpf commented 7 years ago

I concur. I derived some details from the email earlier today, but I still am unsure. It would be nice to see a good description of the internal problem so we understand the proposed solutions.

chadwackerman commented 7 years ago

@karelz I appreciate what you're trying to do here but taking a poll as to what "version 5.0" means is a total waste of everyone's time. It's GitHub socializing not engineering. I'd ship a 5.0 version TODAY that fixes this and ship a 6.0 when you figure out how to set all your little annotations right and/or refactor the code.

Statements like "Most people (app and library owners) typically just hit upgrade to latest" aren't useful. That's how Perl 6, Python 3, Rails 3 & 4, and Node.js 1,2,3,4,5 & 6 managed to splinter things. Let's not follow that lead.

clairernovotny commented 7 years ago

@dluxfordhpf @ciel Unfortunately, this is a hard area to explain. It's all coming from "legacy" Code Access Security that's no longer actively recommended for use.

Here's a summary of what it's purpose is/was: https://msdn.microsoft.com/en-us/library/ee191569(v=vs.110).aspx https://msdn.microsoft.com/en-us/library/dd233102(v=vs.110).aspx

The actual issue has to do as @karelz said with an type of one level of security transparency (by way of being in the desktop framework) trying to derive from a type that has a less restrictive security stance (because the attributes are missing in the OOB version). This is because CAS as a concept doesn't exist within anything other than desktop .net.

In the context of the docs above, look at this section on Inheritance Rules

It describes the rules required for inheritance of types across different security levels.

dluxfordhpf commented 7 years ago

Thanks! I’m familiar with CAS; technicals are wonderful.

robertmclaws commented 7 years ago

Given all of the version number changes between .NET Core, .NET Standard, et. al. in the last year (and shipping new code versioned 4.3 on NuGet that runs on 4.6.2, I understand that Microsoft might not think that version numbers matter. But as someone who singlehandedly manages a very complex application architecture, and ships over 20 OSS NuGet packages, I wholeheartedly disagree.

Hitting "Update All" without checking compatibility is the easiest way in .NET to screw up your app and not know until runtime. Adherence to SemVer is the only way to maintain any sense of sanity. Incrementing the major version communicates that something is going to be broken. When you signal that change, you then get the freedom to do whatever you want to fix it.

gulbanana commented 7 years ago

The proposed fix sounds good to me but I'd just like to comment on the test matrix - use of HttpClient on desktop .NET is going to be a major thing for the foreseeable future. This combo really should be brought under test, even though it's true that not every possibility can/should be tested.

chadwackerman commented 7 years ago

Yeah the test thing sounds like a copout to me too. Add the top 100 packages to your unit test project and make sure your crap still runs. Seems like the kind of basic engineering Microsoft used to do before realizing they could fire the testers and let "the community" sort this stuff out instead. It's just such a terrible and frustrating waste of time.

ciel commented 7 years ago

Because the 'old' QA that brought us Windows ME and Vista were superior.

Also, I'm sure insulting them will get the job done faster.

On Dec 16, 2016 7:46 AM, "chadwackerman" notifications@github.com wrote:

Yeah the test thing sounds like a copout to me too. Add the top 100 packages to your unit test project and make sure your crap still runs. Seems like the kind of basic engineering Microsoft used to do before realizing they could fire the testers and let "the community" sort this stuff out instead. It's just such a terrible and frustrating waste of time.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/corefx/issues/11100#issuecomment-267597137, or mute the thread https://github.com/notifications/unsubscribe-auth/AAavtAUL3pmMiSRnFBe7DEa0y5AaZoVXks5rIpZIgaJpZM4JsdDX .

ciel commented 7 years ago

Found the team leader no one invites to the holiday parties because he treats people like shit.

On Dec 16, 2016 8:26 AM, "chadwackerman" notifications@github.com wrote:

Calling out bullshit believe it or not does get the job done faster. Otherwise you've got millionaire dev managers who haven't written a line of code in 10 years saying "Wow this Github thing sure is neat. let's hack the upvote/downvote emoji and hold a poll so I don't have to make a decision." As if that's going to resolve anything after man-months of work and six hours of triage meetings internally. It's false social signalling and frankly the kind of b.s. engineers used to call out that brought us quality like the Saturn V rocket. And .NET which was the best language packaging and standard library of anything in this space long before all this online idiocy started.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/corefx/issues/11100#issuecomment-267604666, or mute the thread https://github.com/notifications/unsubscribe-auth/AAavtEigDK5EqlA4LQVlxB_lfamMgHanks5rIp9-gaJpZM4JsdDX .

chadwackerman commented 7 years ago

Or maybe you found the guy who learned how to unblock his entire team by getting Microsoft to finally act on an issue that was ignored for months.

Microsoft has completely duplicate bug lists and priorities internally. Github is a bunch of social nonsense. They wouldn't take a pull request for this issue anyway so this turns into a customer support thread. The devs did a bad job here and it's okay for them to hear it.

ciel commented 7 years ago

There's a stark difference in the devs hearing that and being an insufferable asshole. If insulting remarks is the only way you can convey a point, then maybe you should stick to berating the people you're at least paying to put up with you.

On Dec 16, 2016 8:34 AM, "chadwackerman" notifications@github.com wrote:

Or maybe you found the guy who learned how to unblock his entire team by getting Microsoft to finally act on an issue that was ignored for months.

Microsoft has completely duplicate bug lists and priorities internally. Github is a bunch of social nonsense. They wouldn't take a pull request for this issue anyway so this turns into a customer support thread. The devs did a bad job here and it's okay for them to hear it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/corefx/issues/11100#issuecomment-267606461, or mute the thread https://github.com/notifications/unsubscribe-auth/AAavtHTdxym7pvR9KRomyIT14FVaILLFks5rIqF8gaJpZM4JsdDX .

chadwackerman commented 7 years ago

I'll respond only because Microsoft uses "sort by GitHub number_of_comments" to assist in what @karelz so politely leaked as "figure out funding." You're welcome to use the GitHub emoji scale to validate your social values as well.

Dude came in and said SemVer doesn't matter. It's our duty as engineers to point out how absurd that is. It's a classic case where we need a more senior manager who has actual experience, or a more junior guy who actually knows how stuff impacts the real world. Middle managers playing mayor on GitHub is the real problem here. Now feel free to click thumbs down so we can all get on with our day, thank you.

dluxfordhpf commented 7 years ago

Yeah, this bug has sucked, and I am surprised to see a main path sev1 released. But IMHO the real pain from this is due to the design of NuGet: any NuGet package a project uses has to be manually referenced by all consumers of that project. That is needless duplication and bad design, it sets you up for failure. The synchronization wizard is useless when he basic design is bad. NUGET is why this bug is so painful. Otherwise we would fume, put the nasty workaround in Util, and be able to forget about it. But now we have to be careful in all 18 or so consuming projects, growing every month. THAT is why this bug is so painful – because of NuGet, the fix cannot be isolated to one project, you have to touch everything, and keep on doing it.

Also, I do echo the sentiment that Desktop/Windows .NET is what’s important. I am happy to hear about Microsoft .NET coming to Linux, but the money is on Windows, that is where we should be having the best experience, and I want that running the best.

A continual gripe my team has is “why do we have to download all these packages?” We create a console or ASP.NET project and everything it needs is on the box. You create something more modern and 5 billion downloads.

OK, I’m getting a bit far a-field. Thanks for your time and attention and please let me know if we can help you test/evaluate/check docs for typos/whatever you need we are willing to help.

jahmai-ca commented 7 years ago

The reason why this issue (and dotnet/runtime#17786) have caused so much pain for us is because we moved our cloud services from Azure web roles to Service Fabric services and had to move to netcore/netstandard, but still running in full framework apps.

At first I did the binding redirect workaround which seemed to work for a time, but our devs were constantly breaking something by accidentally reverting an app.config, updating a nuget package and fighting AutoGenerateBindingRedirects (which disabling was it's own nightmare).

Finally, I solved this by forcing the use of HttpClientHandler everywhere. That involved changing our own code, raising issues in third party libraries and even using hacks like reflection and duplicating 3rd party code to work around the issue.

So whatever the solution is, if the new package doesn't support the newer HttpClient / HttpClientHandler then we won't take it. That's not a huge issue because right now things are working. However, the "real fix" needs to come soon after, since I don't want to be blocked on updating even more 3rd party packages that might be moving their code to netstandard and introducing this problem.

karelz commented 7 years ago

@dluxfordhpf ... It would be nice to see a good description of the internal problem so we understand the proposed solutions ...

I tried to describe the problem here: https://github.com/dotnet/corefx/issues/11100#issuecomment-267394198. In nutshell:

@chadwackerman ... taking a poll as to what "version 5.0" means is a total waste of everyone's time. It's GitHub socializing not engineering.

It's not socializing. If you look at the votes, it shows there is not unanimous opinion on the matter. So even if you don't trust CoreFX team (with 10+ years experience with .NET customers) opinion which says "people don't pay attention to major version as we wish", I hope it helps to see that even some MVPs (@onovotny) share that opinion. While other MVPs disagree (@robertmclaws).

@chadwackerman ... I'd ship a 5.0 version TODAY that fixes this and ship a 6.0 ..

It is option [2] I laid out above (with different version numbers). It is something we are exploring in parallel. Given the impact of the solution, it is not clear "yeah - that's obviously the right thing to do, let's go for it".

@robertmclaws ... Microsoft might not think that version numbers matter ... When you signal that change, you then get the freedom to do whatever you want to fix it ...

I didn't try to say we do not believe in the right versioning. We just believe that significant portion of developers doesn't care and wants everything to be fully compatible always - i.e. we cannot do whatever we want even in major version bumps. That is based on our team's years of experience with .NET servicing and shipping NuGet packages.

@gulbanana ... This combo really should be brought under test, even though it's true that not every possibility can/should be tested ...

Agreed, now that we know it is important combo we should add coverage for it. And we should poke at the area a bit more to see if we are missing other similar combos around Http NuGett package.

@chadwackerman ... Add the top 100 packages to your unit test project ...

Funny enough, that is similar to what I have proposed a year ago when we shipped bad Meta-package for UWP due to lack of the right test coverage. It just isn't that easy. To catch the interesting bugs (like this one), you have to also exercise the code. Often times you have to exercise code from both combos in the same test (not in this particular case). Overall pretty darn difficult in the test infra. If you have suggestions and ideas we might have missed, let us know -- let's just spin off a new issue for that topic.

@chadwackerman ... and let "the community" sort this stuff out instead ...

We (I can speak for CoreFX and CLR teams) are NOT outsourcing testing of products to community via GitHub. We hold high bar on quality of what we ship.

(accidental hotkey sent the reply before I finished it ... to be continued ...)

karelz commented 7 years ago

@chadwackerman ... Microsoft has completely duplicate bug lists and priorities internally ...

Not true at least on CoreFX & CoreCLR teams - GitHub is primary bug database for all .NET Core. Other non-open source products ("full" .NET Framework and .NET Native) use primarily internal bug databases.

@chadwackerman ... They wouldn't take a pull request for this issue anyway ...

We would not take a pull request which is not able to address/explain all the concerns above (incl. those the person submitting it doesn't personally agree with). Are there open source projects which would take a quick hack without thinking through all its impact? Maybe if 20%+ of its customers is on the floor ... that is not the case here (yet).

@chadwackerman ... Microsoft uses "sort by GitHub number_of_comments" to assist in what @karelz so politely leaked as "figure out funding." ...

First, number_of_comments is not a metric we pay attention to (unless someone "manually" notices that it goes out of the roof). And it definitely has nothing to do with "funding". We pay attention to anything that has impact on customers -- either number of votes, or voiced customer dissatisfaction (on GitHub, Twitter, blog posts comments, anywhere else). This bug got on my radar as "significant customer dissatisfaction" (another MS employee on this thread raised it up as such internally), that's why I stepped in. We are funding it as much as we can at this moment. The only higher priority would be a security issue or 20%+ of our customers is on the floor without a workaround. BTW: Throwing in insults is not going to help speeding it up or influence the decision making.

@dluxfordhpf ... please let me know if we can help you test/evaluate/check docs for typos/whatever you need we are willing to help ...

Thanks, we appreciate your offer. We will definitely welcome help when we have bits ready (and validated on the repros we have), to validate more end-to-end scenarios. We will want to avoid a situation when we fix just some end-to-end scenarios, while leaving others broken in the same or another way.

@jahmai ... but our devs were constantly breaking something by accidentally reverting an app.config, updating a nuget package and fighting AutoGenerateBindingRedirects (which disabling was it's own nightmare). ...

Thanks! That is good example on clarifying the impact of the issue on dev daily work we need to hear and understand. It helps us understand that the combo with VS tooling makes this a true nightmare to work with. We will take it into consideration when finalizing the release plan and dates (which I will start as soon as we settle on solution).

@jahmai ... if the new package doesn't support the newer HttpClient / HttpClientHandler then we won't take it ...

Again, good feedback to hear. We are still trying to flush out the whole end-to-end fixes & release story, before we fully buy into one solution. Releasing a hack without understanding the impact on other scenarios (like yours) would be a desperate move from us -- we would consider it only, if we don't know how to fix it, or if the proper fix is extremely costly. We are not there yet.

karelz commented 7 years ago

Please let's keep the discussion technical from now on.

Quick update: Solution [3] "Sprinkle Security attributes in System.Net.Http" as described above seems to be costly (6w+), so we re-pivoted its internal implementation details: We are considering to use original Http implementation from 4.0 version of the package + implement the 8 new APIs on top of it as best as we can (some may be technically breaking with workarounds). http2 support and other "new WinHttp stack" goodies won't be available by default, whoever wants them needs to call special constructor (technically breaking change, but hopefully not many folks depend on the new 4.1/4.3 under-the-hood goodies). The cost seems to be significantly lower so far, but we are still chasing down and costing 2 lose ends (APIs), before we settle on the final plan. We will have to make some "interesting" compatibility decisions at least for 2 APIs, but they should not slow down the time to the release of the fix.

BTW: Solution [2] "Undo OOB decision" turns out to have more impact than we originally guessed (e.g. it would break .NET Standard 1.6, creating more long-term mess and explanations). We are keeping it as last resort option at this moment, if the above turns out to be too costly or otherwise unreasonable.

gulbanana commented 7 years ago

if you're currently looking at the edge cases around this issue, this could be worth checking out: https://github.com/gulbanana/repro-netstandard-httpclient

this solution demonstrates that currently in vs2017rc, using netfx and netstandard results in clashing versions of system.net.http. i'm not sure how that relates to the OOB thing.

chadwackerman commented 7 years ago

I appreciate (and frankly admire) @karelz candor here but I'm going to sound the versioning horn one final time.

if you don't trust CoreFX team (with 10+ years experience with .NET customers) opinion which says "people don't pay attention to major version" ... based on our team's years of experience with .NET servicing and shipping NuGet packages ...

I'm not sure precisely which logical fallacy this is but you can't claim a deep nuanced experienced team understanding of versioning in the middle of a massive versioning problem like this.

Hopefully we can at least agree that NOT bumping the major version number and having a breaking change is the worst of all worlds. Yet here we are. With talk of "budget" and "6+ weeks of dev time"... to slap on some attributes. For an outdated trust system that never really worked and that I don't use. Because of neglected issues in the compiler.

Amazon Web Services shipped full .NET Core support for all their services last summer. Their recent update lowers the bar from netstandard 1.5 to 1.3. Meanwhile the Azure team can't even get blobs working.

You guys shipped bits that globally break https web requests, silently during builds and loudly at runtime, and still don't have a plan for solving it four months later. I'll drop this for now because obviously you're working on it -- but if the bigger issues here aren't keeping you up at night...

dluxfordhpf commented 7 years ago

Thanks so much for your candidness and your explanations.

karelz commented 7 years ago

@chadwackerman ... you can't claim a deep nuanced experienced team understanding of versioning in the middle of a massive versioning problem like this. ... Hopefully we can at least agree that NOT bumping the major version number and having a breaking change is the worst of all worlds. Yet here we are. ...

You're making 2 large assumptions:

  1. Assumption: The breaking change is known (aka understood to be breaking) before shipping.
    • That's not the case for this issue -- as is common for class of "unintended breaking changes". Yeah, things sometimes slip through :(. The important metrics are IMO: fast reaction (I agree we didn't do that one right on this issue until now) and decreasing/low frequency of such mistakes.
  2. Assumption: Versioning experts are capable to review every single change (which they are not) and that people in general never make mistakes (sci-fi).
    • In this particular case, the System.Net.Http API plan (the new 8 APIs for Desktop) was reviewed high-level before shipping also by versioning experts. They provided suggestions and guidance on options. Unfortunately, the level of breaking was not recognized upfront by anyone (neither area experts nor versioning experts), therefore the chosen solution lead to these troubles.

I also want to point out that by dismissing our experience in versioning you're basically saying "if you (.NET team) ever make a big mistake (this bug), you are not allowed to call yourself expert even on past history and customer patterns (related area)". That's IMO big hammer and not realistic. People/teams make occasional mistakes (like this one). That doesn't make them unfit to be experts in the nearby areas (or even in the particular area). The key thing is if they can learn from their mistakes and do better in future.

@chadwackerman ... With talk of "budget" and "6+ weeks of dev time"... to slap on some attributes. ...

The cost is not from "slapping attributes", that's the easy part. The costly part is from the open question on option [3]:

https://github.com/dotnet/corefx/issues/11100#issuecomment-267394198: Open question: Can we make the internal dependencies of WebRequestHandler work with new WinHttp-based System.Net.Http?

Turns out, that's quite a lot of work, the internal dependencies are just too nasty :(

robertmclaws commented 7 years ago

I really appreciate all of the discussion on this. Seriously. It's awesome. Thanks for being open on this.

At the end of the day, though, here is the dotnet/corefx#1 issue: You guys rewrote a major part of the stack. The part that is central to almost everything. And you don't have even-remotely-adequate test coverage. It shouldn't have needed experts. It should have had a clearly-documented test case that was written at the time WebRequestHandler was written (or the decision was made not to port to Core), and it should have broken when you guys started monkeying with it.

After > 5 years of shipping .NET 4.0-related stuff, there really is no excuse for not having integration test coverage SOMEWHERE that would have caught this. If you find yourself making an excuse for it, you are not demanding excellence from your team.

If there had been:

...this wouldn't have happened.

If the team:

...this wouldn't have happened.

It's a combination of failures that broke .NET at-large, on a scale that I dont ever recall happening before. But you've also broken a large part of the ecosystem, and is becoming increasingly difficult to fix.

There is a very real and significant cost to this. We had to shut off our CI Server weeks ago and deploy manually, verifying the web.config manually every time. We deploy at least once a day. Hundreds of dollars a week in additional man-hours, not to mention the delays from having to spend that time doing something else.

Again, this is not even taking into account the countless hours dealing with existing flaws in the HttpClient architecture: that isn't disposing of instances properly, keeping TCP connections open, and caching DNS entries too long.

So, you're going to catch some flack for this. And deservedly so.

And based on how "expensive" the fix is, I hope you're putting a tiger-team of several of your best people on it to fix it right, and not putting it on one person's shoulders.

StingyJack commented 7 years ago

... The beatings will continue until morale improves.

Let them fix the mistake they admitted to and not be tied up in tact.

robertmclaws commented 7 years ago

Just so we're clear, I'm not trying to beat a dead horse with my last post. But I've seen too many excuses from Microsoft as of late. "Naming is hard." "Versioning is hard." "People make mistakes." This is the mentality of weakness, and not of a team striving for excellence. I really admire @karelz for coming out here and discussing it. But any MSFT employee needs to stop justifying what happened, and let people vent without feeling the need to spend time correcting the record. This isn't some super-edge-case exception when you use a DateTime or something. It's a colossal eff-up due to more than one person not demanding excellence in their work, and it should be treated as such. The only valid response is: "We effed up. We broke .NET. We're not going to rest until it is fixed." Because that would have been the response 5 years ago when Gu was running the show.

chadwackerman commented 7 years ago

I'm sympathetic to @karelz who only recently took on his position after a long time working on .NET. And he certainly shouldn't have to defend his team. I don't fault the workers here; it's obviously a management problem.

But perhaps it shouldn't be a shock to see circular logic and the kind of double-talk that's widespread at certain layers of Microsoft management not play well in front of the real-world customers who actually use the stuff.

In .NET Core Microsoft SQL Server lost the ability to write unsigned values. Again a neglected issue that's at the top of UserVoice sites going back to 2014. Enterprises do not have the luxury of changing their database schema (especially something as tweaky as unsigned) because a couple program managers can't get on the same page after three years. Meanwhile Redis and SQLite (whatever that is) work just fine, and Scott Hanselman rocks tradeshow demos as if this stuff actually works. There's definitely a widening gap between internal and external realities here and the issues need to be (politely) vented.

dluxfordhpf commented 7 years ago

Naming IS hard. Versioning IS hard. Microsoft has a unique perspective, where they deal with customers from all kinds of industries, ranging from the bleeding edge to the dieing vine. Concepts for development approach that seem “obvious” in gaming are foreign in CRM systems. How you approach problems, what is important and what is not important, vary. And then you add advancing technology, different approaches to problems: DAO vs ADO.NET vs LINQ vs EF to use a common easy comparison. Finally, competition in the Internet server realm, whole new hardware models with turnkey tablets, and the Balmer virus. Whole new ways of thinking and modeling problems with different limitations and advantages.

At one point I worked for a…large famous company. One product group released a product with an updated shared component that interpreted it’s call stack differently depending on how it was initialized. They never tested it with our product, and it made our product go boom every time. Obvious, but they shipped it anyway. We called this the “friendly fire” incident.

I missed a bug once where if we installed to the D drive, on uninstall under certain conditions the product could…delete all files on the HD. We had to burn $40k of pressed CDs.

People aren’t perfect and we do make mistakes. Blame, however, is a useless concept. It makes us feel powerful through destructive concepts like shame and humiliation. Humility and forgiveness are more powerful. That’s how we do better, learn better ways, and sometimes simply why something is important in the first place.

Yes the problem makes me angry too, but it is getting fixed. And you know what – open source problems like this languish for decades before anyway bothers to even try. Just try to Kerberos a Linux system or look into GSSAPI. InitializeSecurityContext/AcceptSecurityContext is SO much easier. Money matters.

robertmclaws commented 7 years ago

At one point I worked for a…large famous company. One product group released a product with an updated shared component that interpreted it’s call stack differently depending on how it was initialized. They never tested it with our product, and it made our product go boom every time. Obvious, but they shipped it anyway. We called this the “friendly fire” incident.

Those were two different products. .NET is a single product. And I'm not trying to blame any one person. I blame the process. It feels like because Microsoft went OSS, they threw the processes that helped them ship the most stable development framework in history out the window. I feel like this would have been caught if it had shipped in .NET 4.6.3 instead of on NuGet.

chadwackerman commented 7 years ago

The clumsy transition to OSS certainly isn't helping matters. I don't blame OSS but Microsoft certainly seems to be proud to make all the obvious mistakes.

I use words like "quality" to describe software but they now use words like "engagement." Running the instrumented compiler that phones home an extra twenty times a day because I have to manually edit .config files is not "additional engagement." But this is the myth of the entire internet these days.

Consider the BashOnWindows team that decided to have a bunch of people with zero Linux experience write a Linux usermode shim live on GitHub. The work is a basic but tedious check off the boxes engineering exercise and there's absolutely no reason to involve community feedback for feature prioritization. But away they went.

Six months into it "the community" had to tell them it couldn't handle files that end in a period. This is not software engineering; it's some sort of weird marketing exercise. Managers who are implicit in the scheme deserve Feedback. And some of it may be unpleasant to hear.

Edit to add: similar deal with the Bash debacle. Shipping broken bits live, weird ties to other components (could only be installed as part of Windows Update), and of course Scott Hanselman somehow doing live demos despite the fact that it couldn't run tar let alone a compiler.

jahmai-ca commented 7 years ago

Some of the rhetoric here smacks of "the good old days" of Microsoft, which is ironic because all the changes we're witnessing now are a direct response to customer demand for Microsoft to evolve.

We were sick of closed source and draconian reverse engineering (de-compiling) rules, so we got references sources and now open source.

We were sick of .net Framework bugs that affected every application installed on a machine and took a long time to fix and required enterprise distribution and IT departments to qualify every .net Framework version update before it could happen, so now we're moving toward shipping nuget packages with our apps so we can push the fix as soon as it is available.

We were sick of Microsoft Connect shutting down every other bug as "working as designed" or taking 6 months to even schedule for a release, so now we have Github projects managed much more closely by the team who actually writes the code, and the whole community can easily give their 2c on every community raised work item.

We were sick of Microsoft taking every good idea in the community and making a proprietary clone that didn't play well with non-Microsoft tools, so now we get to use NPM, Gulp, NodeJS, etc

We were sick of C# only being viable for Windows, and projects like Mono struggling to even keep par, while having to work around bugs or lack of functionality with #ifdef everywhere, so now we have Xamarin propelling Mono development with reference sources and allowing us to code latest C# language and share the same code base on .net, UWP, iOS, Android and netcore.

All of this is happening at once because it has. We're not going to wait for Microsoft to play catch up while it ships perfect stable pieces of functionality that only solves half our problems at any given time.

My issue isn't that all this change is happening at once. Nor that there are issues like this making an appearance. The real issue for me is that it took months for this issue to even be recognised as a major issue and it's taking even more months to figure out how to fix it.

chadwackerman commented 7 years ago

Three weeks and no update. Happy New Year everyone.

Edit to add this quote from @karelz since understatement seems to trigger a slightly different group of special snowflakes here:

I will post updates (hopefully with more technical details) when we have them, at least on weekly basis.

pollax commented 7 years ago

@chadwackerman seriously, do you understand that your bad attitude won't help this issue getting fixed faster? And apart from that, you are making a fool out of yourself here. Please improve your tone.

robertmclaws commented 7 years ago

What tone? He made a statement, which has the benefit of being accurate, then wished everyone a Happy New Year. If you took it negatively, that's your issue.

And regarding that first statement: this is a major breaking issue that doesn't show up until runtime. Five years ago ScottGu or Rob Howard would have had a team on this 24-7 till a fix shipped. Now it's just "you know, we'll get around to it."

You know what will make people happy? Fix the problem. Otherwise, there are some pissed customers out there, and some of them are on this thread. They have every right to be pissed. So find something else to do with your indignation, @pollax. You have not contributed anything meaningful to this thread, and no one anointed you the GitHub Thought Police. You also haven't wasted > $5K of your company's money on this problem.

pollax commented 7 years ago

Ok, maybe I read to much into it and because of the tone he used in previous comments, I read this last one with a slightly ironic tone. Sorry about that. Regarding the issue; I hear you. I'm troubled by it myself (otherwise I wouldn't be watching this issue so closely) so please don't make assumptions about what I have/haven't wasted in terms of money/resources on this. But as a developer, I know that the worst thing is to have someone staring at you when trying to fix a major issue.

dluxfordhpf commented 7 years ago

I concur with your read of sarcasm and agree, it is not productive. I also concur with the feeling of “when is it gonna be fixed,” though. Unfortunately this main-path CAT1 issue just didn’t attention until many people got quite vocal about it. I hope there is some process improvement internally to resolve this; it would suck for us all if the only way we can get fixes is via curse words.

niemyjski commented 7 years ago

I'm also getting a report of this happening on full framework with our Exceptionless nuget package. Any solid fixes or work arounds?

erdtsieck commented 7 years ago

I also get this self hosting ASP.NET Core from a Windows Service running full framework 4.6.2. dependency NETStandard.Library 1.6.1 forces me to use System.Net.Http 4.3.0.

enyim commented 7 years ago

dependency NETStandard.Library 1.6.1 forces me to use System.Net.Http 4.3.0.

and this is my biggest problem with this whole situation

more and more packages are taking a dependency on the meta-package, forcing me to upgrade every dependency I have (or fight an uphill battle of downgrading/ignoring specific ones)

this goes contrary the previous promise of "mix and match" of system dependencies

I had to downgrade my system to 4.5.2 from 4.6.1 (& losing significant time), because every second package brought in .net 1.6 and its buggy System.Net.Http

If these were only 3rd party packages, well, so be it, I can go around them, pester the maintainers to use more granular dependencies, but most of my deps were from MS, and I'm expecting MS to use granular dependencies, not just slap .net 1.6 into the nuget file

karelz commented 7 years ago

I planned to send an update for last few days, so here it is:

@CIPop's work was preempted by a security issue. His work was picked up by @davidsh. @davidsh plans to submit PR early this week (i.e. any day now).

Here are details on our execution plan & status:

High-level plan: A. Replace WinHttpHandler with HttpClientHandler in net46 build of CoreFX B. Implement 8 new APIs on HttpClientHandler we introduced in 4.1.0.0 OOB package

Execution plan:

EDIT 2017/1/12 - the execution plan was moved to the top-most post https://github.com/dotnet/corefx/issues/11100#issue-173058293

karelz commented 7 years ago

@niemyjski Any solid fixes or work arounds?

There's a workaround to downgrade the offending package to 4.0.0.0 (see https://github.com/dotnet/corefx/issues/11100#issuecomment-246469893), unfortunately per comments above, any edit to NuGet dependencies will bring it back up -- which is the key reason why this issue is so painful.

I hope there is some process improvement internally to resolve this; it would suck for us all if the only way we can get fixes is via curse words.

I have some ideas (and questions) about the process improvements here, to prevent future important impactful issues to go unnoticed for so long. However, I am intentionally deferring that discussion AFTER we have a fix shipped.

dluxfordhpf commented 7 years ago

Woot!!!

georgiosd commented 7 years ago

Well done guys @karelz

gulbanana commented 7 years ago

Thanks for the update. PlatformNotSupported would be better than nothing since they're new apis that legacy software doesn't rely upon.

gulbanana commented 7 years ago

It sounds like post-fix, unification through binding redirects will still be required, but a redirect to the newer rather than the older assembly, which nuget can generate correctly?

davidsh commented 7 years ago

Now that PR dotnet/corefx#15036 is checked in, this problem should go away w.r.t. net46. System.Net.Http.dll is now using built-in HTTP stack from .NET Framework. This means it has 100% compatibility with WebRequestHandler.

Please try out the latest packages on the MyGet dev feed. You'll want to use the latest built System.Net.Http.dll packages which as of today is: https://dotnet.myget.org/feed/dotnet-core/package/nuget/System.Net.Http

version 4.4.0-beta-24913-02.

You can use dev feed packages via changing your NuGet package source feeds either with command line tools or Visual Studio.

Instructions:

Command-Line: Add the following in nuget.config, under element:


<add key="myget.org dotnet-core" value="https://dotnet.myget.org/F/dotnet-core/api/v3/index.json" />

Visual Studio: In VS, Tools->Options->Nuget Package Manager->Package Sources -> Add, under Source, use this url, https://dotnet.myget.org/F/dotnet-core/api/v3/index.json

In either case, make sure you list the MyGet dev feed as the first feed in the list.

davidsh commented 7 years ago

And to summarize the fix in dotnet/corefx#15036, we ended up with 100% app-compat with the original .NET Framework System.Net.Http 4.0 API surface and 100% app-compat when using WebRequestHandler.

In terms of the level of support for the newer properties added to HttpClientHandler (that are part of .NET Core 1.1 and beyond), the following summarizes the expected behavior of these newer properties when running on the 'net46' target:

1) public bool CheckCertificateRevocationList Throws PlatformNotSupportedException

2) public X509CertificateCollection ClientCertificates Implemented

3) public ICredentials DefaultProxyCredentials Implemented

4) public int MaxConnectionsPerServer Implemented against the current ServicePoint logic

5) public int MaxResponseHeadersLength Implemented

6) public IDictionary<string, object> Properties Implemented

7) public Func<HttpResponseMessage, X509Certificate2, X509Chain, SslPolicyErrors, bool> ServerCertificateCustomValidationCallback Implemented except that first parameter is always null due to current inability to map internal HttpWebRequest to HttpRequestMessage

8) public SslProtocols SslProtocols Throws PlatformNotSupportedException

dluxfordhpf commented 7 years ago

Woo hoo! Thanks guys!

karelz commented 7 years ago

Did anyone get a chance to validate @davidsh's bits? We would really welcome some end-to-end validation on the 7-ish scenarios that were hinted on this thread.

We were able to validate @onovotny's simplified repro so far. Would be great to get couple of more confirmations on repros we do not have locally. Thanks!

Once we have that, we will move towards porting the change into 1.1 branch and release a patch -- hopefully next week.

dluxfordhpf commented 7 years ago

I will get some test done this week. I just got pulled into an escalation so I can’t start for 1-2 days, at a guess.

karelz commented 7 years ago

@dluxfordhpf thanks! Appreciate your help.