GSA / https

The HTTPS-Only Standard for federal domains (M-15-13), and implementation guidance.
https://https.cio.gov

Please consider the impacts of banning HTTP #107

Closed jhourcle closed 9 years ago

jhourcle commented 9 years ago

On the website for the HTTPS-Only "Standard", there is a statement that 'there is no such thing as insensitive web traffic' -- yet there is. Due to many institutions having policies against FTP and peer-to-peer protocols, HTTP has become the de facto standard in sharing scientific data.

Much of this traffic is regularly scheduled bulk downloads through wget and other automated retrieval tools. Forcing these transfers to go to HTTPS would cause an undue strain on limited resources that have become even more constrained over the past few years.


Some data transfers are handled through rsync or scp with the High Performance extensions to SSH, but this requires coordination between each pair of institutions and additional paperwork to track accounts. That paperwork requires collecting Personally Identifiable Information (PII) that is not necessary when distributing data over protocols that do not require authentication. It has restricted our ability to openly share scientific data with international partners, and has impacted our sharing with an NSF partner that had a foreign-born system administrator.

In 2013, the White House Office of Science & Technology Policy issued a memo on "Increasing Access to the Results of Federally Funded Scientific Research". This document requires many agencies to create plans to make their scientific data available to the public in digital formats. Even if agencies were to lift their restrictions on peer-to-peer protocols, those protocols don't scale to track the 100M+ files that larger projects manage.

The qualities that improve the privacy of HTTPS connections are a hindrance in bandwidth management. Caching proxies can be used with HTTP to prevent multiple users from a given site each having to download a file directly from the source. This is especially important for Near Real Time (NRT) data in which many sites poll for data, or when remote websites directly embed image links.

This is also important for times when something is announced in the news. We have had times when we've had to remove 'branding' type images from our webservers to reduce bandwidth. Even with the crush of international traffic, we have been able to withstand torrents of requests that were multiple orders of magnitude higher than our typical traffic. It is rare for us to know in advance when our data will be newsworthy, or what data specifically will be of interest.


The effect on those without computers at home

This increased bandwidth is not only problematic for those serving the data, but may also be an issue for those trying to use the data. Schools and libraries will frequently install proxy servers both to minimize their bandwidth consumption through caching and to prevent students and patrons from going to inappropriate sites. In some cases, this filtering is mandated by state or local laws. To comply with these laws, some institutions block HTTPS entirely.

Larger libraries may have a means to request unrestricted access, but they typically require a separate request each time, to prevent patrons from viewing such materials within eyeshot of children.

As such, requiring HTTPS may make it harder for such institutions to provide the same level of service while complying with their local laws. To enable scanning of HTTPS, they could configure their proxies to act as a man-in-the-middle, removing the privacy that citizens would otherwise have expected.


Restricting the use of HTTP will require changing a number of scientific analysis tools.

The overall funding cutbacks of the past few years have led to decreased funding for maintaining scientific analysis software. A good portion of it is stable and not under active development.

Many of these packages retrieve data using HTTP. Should that access be removed, someone will have to adjust the packages to retrieve data using HTTPS or some other protocol. Additional work may be required later to deal with any future patches to the SSL libraries, whereas older HTTP protocols can still be supported by modern web servers.

This may be even more of a problem for software used on currently running missions. In one case that I was involved with, one of our sources for schedule information changed over to require SSL. We attempted to get the necessary software running on the machine that was used for commanding, but after days of effort, we gave up. Instead, I have a cron job running on another system that retrieves the schedules over HTTPS, and the commanding system then picks up the file from our local server using HTTP.
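
A minimal sketch of that workaround (the URL and paths below are placeholders, not the actual mission systems): a script run from cron fetches the schedule over HTTPS and stages it in a directory that the local plain-HTTP server already exposes, so the commanding machine never has to speak TLS.

```python
#!/usr/bin/env python3
# Illustrative sketch only -- hypothetical URL and paths, not the real systems.
# Run from cron on a machine that *can* do HTTPS; the commanding machine then
# fetches the result over plain HTTP from the local web server as before.
import shutil
import tempfile
import urllib.request

SCHEDULE_URL = "https://schedules.example.gov/latest/schedule.txt"   # placeholder
LOCAL_DOCROOT_COPY = "/var/www/html/schedules/schedule.txt"          # served over plain HTTP

def main():
    # Download to a temporary file first, so the HTTP-visible copy is never
    # left half-written if the transfer fails partway through.
    with urllib.request.urlopen(SCHEDULE_URL, timeout=60) as resp, \
         tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(resp, tmp)
        staging_path = tmp.name
    shutil.move(staging_path, LOCAL_DOCROOT_COPY)

if __name__ == "__main__":
    main()
```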

For other missions that have to go through change control, re-certifying their workflows to use HTTPS could be a rather significant cost, both now and in the future as SSL patches come out.


HTTPS is not a security improvement for the hosts.

Although HTTPS makes it more difficult for an ISP to sniff someone's web browsing, this feature should be properly weighed against all of the security issues, as it can actually increase the attack surface and cause other problems:

Many of the issues are specifically related to SSL3, but most servers still enable this by default. It is quite possible that if forced to meet a deadline, sysadmins may be rushed and use the defaults, making their systems less safe.

Should there be a remote exploit, this could invalidate any of the privacy improvements that may have been gained from switching to HTTPS.

In the past, we have taken servers offline as a preventative measure against zero-day exploits that we could not conclusively prove we were immune to or could mitigate; the alternative, should an exploit occur or be suspected of having occurred, is multiple weeks of rebuilding the server from the ground up and re-certifying it for use. As such, anything that increases the attack surface can decrease the availability of the services that we provide to the public.


HTTPS is frequently implemented improperly

We also have the case of poorly implemented HTTPS systems in the US federal government. I have stopped keeping track of the number of certificates that I have encountered that are self-signed, signed by an unknown CA, or expired. If certificates are not maintained, we risk desensitizing the public to the issue and training them to automatically trust the certificates without the appropriate level of scrutiny.

If webservers are not properly managed and insecure ciphers removed as exploits are found against them, HTTPS may only offer the illusion of privacy while leaving citizens vulnerable to monitoring from ISPs or from other participants on unswitched networks such as wifi.

Just yesterday, I was given instructions for our new voicemail system, which stated after giving the HTTPS URL to access it:

Note: You may receive a certificate error. Select "continue" to proceed to the site.

With the relaxing of rules regarding disclosure of information to third parties, government websites were allowed to use services hosted externally. If these services don't support HTTPS, we risk serving 'mixed' content back to the user -- which means we end up training them to ignore the warning if they want access to our content.


In conclusion:

Websites that use authentication or have personally identifiable information about users of their systems should use HTTPS. There may be other sites in which it would be appropriate for them to use HTTPS, but there are still situations for which HTTP is a better choice.

In summary:

  1. Non-sensitive web traffic does exist.
  2. Moving to HTTPS has a non-trivial cost.
  3. HTTPS will reduce the availability of government information and services.
  4. HTTPS increases the risk for the maintainers of the servers.
  5. HTTPS is already implemented and broken on many federal websites.
  6. HTTPS may only offer an illusion of privacy while still being insecure.
  7. This proposal risks increasing the 'digital divide' for citizens who access the internet through schools, libraries or other filtered access points.
  8. This proposal risks training citizens to ignore security warnings from badly configured or maintained websites.

Although HTTPS may be better for citizen privacy, it actually increases the risks for the maintainer of the servers and can require significant changes in network architecture to mitigate those risks. There is a non-trivial cost in banning HTTP, which could adversely affect the distribution of information to the public.

Joe Hourclé (joseph.a.hourcle@nasa.gov) Webserver & Database Administrator Solar Data Analysis Center Goddard Space Flight Center

dstufft commented 9 years ago
noncombatant commented 9 years ago

I'm afraid almost everything in your comment is misguided, factually wrong, or betrays an ignorance of better ways to do things.

For example, why not use anonymous, tightly-restricted rsync? http://serverfault.com/questions/343668/rsync-with-ssh-keygen-to-ssh-user-with-limited-commands-and-specifc-directory

Protonk commented 9 years ago

If Wikipedia can manage the transition, I'm pretty sure NASA can too.

ncoghlan commented 9 years ago

Before the techies start weighing in too heavily here, I suggest reading http://www.wired.com/2015/04/megan-smith-civic-tech/

The US government IT infrastructure is vast, and much of it is a long way behind the times, as much of the tech industry has been more interested in either railroading government by fighting it in the courts or else charging for software upgrades rather than including them as a standard part of sustaining engineering subscriptions.

Institutional infrastructure engineers quite rightly care about the experience of their users, and it's going to take time to modernise client and networking infrastructure sufficiently to reasonably enforce the use of HTTPS everywhere for government services.

noncombatant commented 9 years ago

Yes, but that is not an argument against starting immediately to make the change. (No time like the present!) It's pretty reasonable to read jhourcle as decrying the change outright.

ncoghlan commented 9 years ago

Right, jhourcle's actual disagreement appears to be with this policy statement on https.cio.gov: "An HTTPS-Only standard will eliminate inconsistent, subjective decision-making regarding which content or browsing activity is sensitive in nature"

That policy statement is entirely correct - we've learned from experience that folks providing a service (whether commercial or public sector) tend to err on the side of declaring their service "not sensitive", and it's the end users rather than the service provider that suffer the consequences when that assessment turns out to be incorrect.

https.cio.gov sensibly changes the default to be "assume your service is sensitive, and if you don't want to secure it accordingly, make the case that it's not sensitive". The service providers can then make the decision on whether it's more appropriate for them to either just secure the service, or else make the case for an exemption from the policy.

konklone commented 9 years ago

@jhourcle Thank you for a detailed, thoughtful comment that speaks to your professional experience working in the federal government. As a fellow government employee responsible for getting applications deployed and working, I know HTTPS can represent additional engineering tradeoffs and ways for things to go wrong. I also know that, as you point out, there's a lot of support software out there that needs some work to adapt to the way the web is moving. The scene needs to get a lot easier. I hope the work here will contribute to that, and I personally think your comment is going to be very helpful in that regard.

jhourcle commented 9 years ago

My objection is that this proposal bans HTTP, which I believe will cause a number of problems that I've already outlined.

To respond to those who have commented:

@dstufft :

  1. I would propose sharing checksums over a secure side channel, while continuing to use HTTP for the data payload. This allows you to validate the content without needing to encrypt the full payload, giving you integrity without sacrificing availability (see the sketch below).
  2. I'm all for an easier way of managing the problems. I'm still against banning HTTP, though (or banning any protocol outright).
  3. And who is going to fund the thousands of school districts and library systems to make this change? It's much more likely that they won't do anything, blocking citizens' access to federal information until they can upgrade their systems. (Note: I both volunteer at my local library and am well aware of the local funding issues, as I was an elected official until last January, when I decided not to run for re-election.)
  4. I believe you are overly optimistic in thinking that everyone would be running up-to-date clients.
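
A rough illustration of point 1 (a sketch only; the URLs and checksum-file format are assumptions, not an existing service): fetch a small checksum file over HTTPS, pull the bulk payload over plain HTTP, and verify before use.

```python
#!/usr/bin/env python3
# Sketch of checksums-over-HTTPS with the payload over HTTP.
# The URLs are placeholders; the point is only that integrity can be verified
# without encrypting the multi-gigabyte payload itself.
import hashlib
import urllib.request

CHECKSUM_URL = "https://data.example.gov/archive/file.fits.sha256"   # small, secure channel
PAYLOAD_URL = "http://data.example.gov/archive/file.fits"            # bulk, plain HTTP

def fetch(url):
    with urllib.request.urlopen(url, timeout=300) as resp:
        return resp.read()

def main():
    expected = fetch(CHECKSUM_URL).decode().split()[0]   # assumes "<hexdigest>  <name>" format
    payload = fetch(PAYLOAD_URL)
    if hashlib.sha256(payload).hexdigest() != expected:
        raise RuntimeError("checksum mismatch: payload corrupted or tampered with in transit")
    with open("file.fits", "wb") as out:
        out.write(payload)

if __name__ == "__main__":
    main()
```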

@noncombatant : If you find one thing wrong with my statements (which reflect the situation as I understand it -- an understanding which, as a human's, will always be less than the whole), it does not invalidate any of my other statements.

You've cited one item that you claim is wrong, and yet you point me to a proposal that requires an O(n^2) solution of the sort that I specifically stated we're trying to avoid. You link to something that discusses ssh keys, which our office considers to be a login, and which therefore requires significant paperwork. Rsync specifically has a problem with the file format that we use (FITS, the Flexible Image Transport System), which has a metadata block at the beginning of the file. Should the metadata grow past a multiple of 2880 bytes, everything gets shifted in the file and rsync sees it as the whole file changing. See section IV subsection E of [Distributing Solar Data: Minimizing Wasted Bandwidth](http://dx.doi.org/10.5281/zenodo.16950) for an explanation of how to deal with the issue using HTTP. Of course, that approach would get blocked if rule id 958291 were enabled in mod_security.
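
For readers unfamiliar with FITS: headers are written in fixed 2880-byte blocks, so adding a single keyword can push the header past a block boundary and shift the entire data section. A tiny sketch of the arithmetic (illustrative only, not the archive's actual code):

```python
# FITS headers are padded out to 2880-byte blocks; the data section starts at
# the next block boundary, so growing the header past a boundary shifts all of
# the data that follows by a full block.
BLOCK = 2880

def data_offset(header_bytes: int) -> int:
    """Byte offset at which the data section starts for a header of this size."""
    return -(-header_bytes // BLOCK) * BLOCK   # ceiling division, then scale

print(data_offset(2800))   # 2880 -- header fits in one block
print(data_offset(2880))   # 2880 -- exactly full
print(data_offset(2881))   # 5760 -- one more byte shifts the data by 2880 bytes
```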

@Protonk : I don't believe that Wikipedia is actively supporting 20+ year old missions, and running older systems specifically to support them. We were still running VAXes in our group until a few years ago (others still might be). When we worked on a javascript-heavy redesign of a search interface, we tested it on ~5 year old hardware and 3+ year old web browsers so that we could be confident that it'd run well for scientists from emerging countries and school children in antiquated computer labs (and that it would still function without javascript).

@ncoghlan : Agreed on the old infrastructure (see above), but I've also been stung by too many top-down 'solutions' that get forced down our throats by people who don't actually understand what our needs are. I've seen it happen both in government and when I worked for a university, where someone convinces upper management to buy services / products from their company, and it's a waste -- just a hole to pour money into. At the university, a 'peer review' of a webserver cluster that we were ~3 weeks from putting into production turned into them teaching us CMM (which got aborted when I showed up to the training w/ my use cases), then a 'vision document' as if we were trying to compete with ISPs, then a spec for a two-node cluster running 2 databases, 3 different webserver products, and 3 text-preprocessors, and then a contract to build said server. What we got was a bunch of grey-market hardware with no drives in it ... after wasting a good man-year of time as they dragged things out so they could suck out as much money as they could.

On that whole 'top-down' approach -- my boss sent me to the meeting for the roll-out of data.gov version 2. During the session on APIs, they asked for a show of hands of who was management vs. IT. I was the only IT person in the room. Way too many of these 'solutions' are decided upon without consulting the people who are going to have to implement & support them. We've had our lead sysadmin talk about quitting because of how much of her time is spent doing paperwork, which would likely lead to all of the rest of our group quitting as well. More unfunded mandates mean more work without an increase in staff ... which leads to burnout, quitting, and major loss of tacit knowledge.

... and I referenced the statement that I took offense to, as I believe this statement is being used to prop up this push to ban HTTP. From the 'Why HTTPS for Everything?' page ( https://https.cio.gov/everything/ ) :

Today, there is no such thing as insensitive web traffic, and public services should not depend on the benevolence of network operators.

You also seem to think that there's a way to be exempted from this policy. I've seen nothing that would indicate that, and my complaint is specifically because I think that there should be a waiver process. We're already filing waivers annually for all of our servers not running HTTPS, as we have to document each control in the CIS benchmarks that we're not in compliance with.

@konklone : I'm actually less concerned about getting things deployed and working than about long-term maintenance. It's pretty typical to get support initially, but then resources are re-assigned once things are rolled out, leaving departments and projects to deal with the increased workload on their own. Consider it like technical debt -- the up-front costs may be insignificant compared to the recurring costs.

I'm also concerned not just for my systems, but about all of the ripple effects: how many things are going to break, how they're going to break, when someone's going to notice they're broken, how much effort it will be to fix them, etc. I'm currently relying on a homebrew HTTP/1.0 client in IDL because I need to examine the Content-Disposition header so that I can write out files with useful names. (It speaks HTTP/1.0 because its support for chunked encoding was flawed, so when lots of sites started banning HTTP/1.0 due to the CIS recommendations, our stuff started failing; the maintainer started claiming HTTP/1.1 to get around it, but if a server did use chunked encoding, the files were corrupted. We think we have it fixed, but rolling the fix out is slow because projects have to re-certify their data pipelines.)
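
For what it's worth, the filename logic that drives this is small; here is a hedged Python illustration (not the IDL client in question, and the URL is a placeholder) of saving a download under the name suggested by Content-Disposition:

```python
#!/usr/bin/env python3
# Illustration only: name the output file from the Content-Disposition header,
# falling back to the last path segment of the URL.
import os
import re
import urllib.parse
import urllib.request

URL = "http://data.example.gov/cgi-bin/export?id=12345"   # placeholder

def suggested_filename(resp, url):
    cd = resp.headers.get("Content-Disposition", "")
    # Naive parse of: attachment; filename="aia.lev1.171A.fits"
    match = re.search(r'filename="?([^";]+)"?', cd)
    if match:
        return os.path.basename(match.group(1))           # drop any path components
    return os.path.basename(urllib.parse.urlsplit(url).path) or "download.dat"

with urllib.request.urlopen(URL) as resp:
    name = suggested_filename(resp, URL)
    with open(name, "wb") as out:
        out.write(resp.read())
```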

Many of these science tools were written at the beginning of missions that have since ended, and there is no funding to maintain them. I don't know of a single test suite that would validate whether they're still behaving properly should we modify them, other than those from the project pipelines ... and those typically aren't portable, and wouldn't necessarily cover each package fully (which is itself an integrity problem, I know).

...

So, in conclusion, physicists may be obnoxious, but IT folks are often guilty of the same thing when they forget about the possibilities of edge cases and try to lump everything into one convenient group.

...

And please don't get motherly about the solutions that you're supporting, or you'll end up being dismissive of anything that might contradict your world view. I've gone out of my way to explain where I believe there will be problems if we have to move everything to HTTPS. Being dismissive of people's concerns will not gain you any support in the long run.

ps. The only reason I even knew about this was because of a post on Slashdot a month ago ... I've heard NOTHING through official channels. It took the stupid instructions on using our voicemail to finally prompt me into writing up my complaints (and then going through my ATR and team members to make sure that I wasn't mis-representing anything or leaking too much info)

(and um ... I didn't clear this response through them)

larrysalibra commented 9 years ago

If whatever tool or app you're using really can't use https, you can set up a local proxy server that strips https in less time than it took you to write this post. The rest of the world's privacy isn't worth sacrificing because you can't be bothered to spend a few minutes reading the nginx documentation for how to set up a reverse proxy, or one of the many similar tools. You can even continue to cache it locally if you so desire.
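
For the sake of concreteness, a toy version of that local shim (assumptions: a single upstream host, Python's standard library instead of nginx; illustrative, not production-grade): legacy HTTP-only tools point at localhost, and each request is re-issued upstream over HTTPS.

```python
#!/usr/bin/env python3
# Toy localhost HTTP-to-HTTPS forwarder (sketch only, not hardened).
# Legacy tools request http://127.0.0.1:8080/...; the upstream host is fetched
# over HTTPS and the body is relayed back unchanged.
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

UPSTREAM = "https://data.example.gov"   # placeholder upstream site

class Forwarder(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            with urllib.request.urlopen(UPSTREAM + self.path, timeout=300) as resp:
                body = resp.read()
                self.send_response(resp.status)
                self.send_header("Content-Type",
                                 resp.headers.get("Content-Type", "application/octet-stream"))
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
        except Exception:        # crude error handling, fine for a sketch
            self.send_error(502, "upstream fetch failed")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Forwarder).serve_forever()
```

A real deployment would of course use nginx or a caching proxy as suggested above; the point is only that the shape of the shim is small.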

I used to work in NIH-funded research at a VA hospital - never ceased to amaze me how people like @jhourcle would spend more effort avoiding a change than the effort making the change required.

@jhourcle writes:

Schools and libraries will frequently install proxy servers to both minimize their bandwidth consuption through caching, but also to prevent students and patrons from going to inappropriate sites. In some cases, this filtering is mandated by state or local laws. To comply with these laws, some institutions block HTTPS entirely.

Good. Hopefully citizens of those states will see how ridiculous the internet censorship laws they passed are and will pressure their representatives to overturn them.

@ncoghlan writes:

Right, jhourcle's actual disagreement appears to be with this policy statement on https.cio.gov: "An HTTPS-Only standard will eliminate inconsistent, subjective decision-making regarding which content or browsing activity is sensitive in nature"

This is by far the most insightful part of the CIO's policy suggestion.

alex commented 9 years ago

@larrysalibra Please keep it civil.

haneefmubarak commented 9 years ago

The qualities that improve the privacy of HTTPS connections are a hindrance in bandwidth management. Caching proxies can be used with HTTP to prevent multiple users from a given site each having to download a file directly from the source. This is especially important for Near Real Time (NRT) data in which many sites poll for data, or when remote websites directly embed image links.

This is also important for times when something makes is announced in the news. We have had times when we've had to remove 'branding' type images from our webservers to reduce bandwidth. Even with the crush of international traffic, we have been able to withstand torrents of requests that were multiple orders of magnitude higher than our typical traffic. It is rare for us to know in advance when our data will be newsworthy, nor what data specifically will be of interest.

There are in fact CDNs that are fully capable of caching content, such as CloudFlare and Incapsula. These services operate by recognizing which parts of the page tend not to change and caching those portions intelligently across requests. As a result, your web servers are hit with way fewer requests than the number of people actually loading your site(s). This works because these sites are essentially massive, intelligent caching reverse proxies (that are distributed across multiple data centers) with HTTPS support (plus other stuff of course).

[Disclaimer: I have no affiliation with CloudFlare, Incapsula, or the Squid Cache project other than my use of CloudFlare and Squid]

I personally use CloudFlare for my rarely updated blog (which does in fact use HTTPS while running on a home server with a 1 MB/s uplink), and can honestly say that their service works effectively. During traffic spikes, my site still loads quickly without overloading my server.

As for NRT data, while I can't say that I've ever had RT data, I believe that CloudFlare's enterprise plan (mind you, I haven't used it personally, so this is conjecture - I've never really had a need for a paid plan so far) supports a cache expiry of 30s, which should allow academic websites delivering NRT data to deliver it relatively quickly while still being afforded the benefits of a caching proxy.
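
The underlying mechanism is just short-lived cache-control on the origin: responses are marked cacheable for ~30 seconds, so a CDN or institutional proxy absorbs repeated polls while the data stays fresh enough for NRT use. A hedged sketch (hypothetical endpoint, standard library only):

```python
#!/usr/bin/env python3
# Sketch of an origin that lets any CDN or caching proxy serve near-real-time
# data for up to 30 seconds before coming back for a fresh copy.
from http.server import BaseHTTPRequestHandler, HTTPServer

class NRTHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"latest_observation": "2015-04-20T17:45:00Z"}'   # placeholder payload
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # max-age covers browsers, s-maxage covers shared caches (CDN / proxy).
        self.send_header("Cache-Control", "public, max-age=30, s-maxage=30")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), NRTHandler).serve_forever()
```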

Alternatively, for institutions that prefer to maintain their own proxy, Squid appears to be able to reverse proxy HTTPS websites too.

infinity0 commented 9 years ago

To expand on @dstufft 's first point, HTTP, as commonly used in web browsers, allows javascript injection attacks against your users. Even if your website doesn't have any "sensitive information", the attacker gains the ability to abuse your users for their own purposes. This is how the recent massive attack on Github was carried out - someone, ahem, attacked plaintext HTTP advertising traffic on seemingly innocent / non-sensitive websites to turn those users into a massive botnet for a DDoS against Github.

@jhourcle how do you suppose "a secure side channel [for] checksums" would be implemented, and why would it be less costly than simply using HTTPS?

haneefmubarak commented 9 years ago

To further expand upon what @dstufft and @infinity0 have said, adding a "secure channel [for] checksums" means creating a new protocol or a protocol extension, which will involve lots of time, money, and human resources to both create and deploy.

@jhourcle you ask who would pay for the systems in schools, libraries, and workplaces to be fixed. Well, the problem here really is that HTTPS is being blocked on certain networks - removing such a simple block should be rather easy and cheap (in time, money, and human resources) to implement. Excessive-violence and pornography filters that do not require blocking HTTPS exist and are not expensive (cheap enough that even individuals, such as parents, are able to purchase them for personal use). Additionally, using older clients in and of itself can present open exploit vectors, as browsers such as Firefox and Chrome continuously improve their security as new issues are found.

As for ensuring that libraries and schools do allow HTTPS traffic, a federal mandate requiring schools and libraries that provide internet access to ensure full, unfiltered access to federal websites and resources could work. This would be easy to implement from a technical perspective, as said schools and libraries could simply whitelist all *.gov domains to allow HTTPS traffic.

immibis commented 9 years ago

What's the reasoning for not serving both HTTP and HTTPS?

lorenzogatti commented 9 years ago

Not fixing SSL vulnerabilities, letting certificates lapse, and not implementing SSL at all are technical debt that administrators like Mr Hourclé have gotten used to considering acceptable; but ISPs cannot be trusted any more, and the time has come to pay the debt or be left behind.

alex commented 9 years ago

@lorenzogatti Please be respectful. We can have a conversation about this topic without resorting to personal attacks.

jhourcle commented 9 years ago

I am still amazed at how simple other people's lives must be, if there is one solution that works for everything that they do.

So, a few points:

  1. I never said that the side channel wasn't HTTPS. It could also be XMPP over TLS for NRT data, or it could just be signed and served over HTTP. You have two possible approaches, as I see it. The first is that all of the finding is done over a secure channel, and then, as a result of the ordering phase, you're given the checksum + the HTTP URL for retrieval. The result of the ordering could be metalink, or BagIt with a fetch.txt file.

    The second option (which I suspect has some issues) would be to have the unsecured file contain a reference back to an HTTPS URL of where to get the signature to validate the file. This would be easy within FITS, or you could use XMP to keep it attached to the file. You could also pass the information in HTTP headers via Web Linking if you defined an appropriate relationship type (see the sketch after this list). The drawback to the web linking approach is that you lose the relationship after the file's been downloaded; basically the same arguments that we made in Appendix A of "Achieving human and machine accessibility of cited data in scholarly publications".

  2. I suspect that most people here have not dealt with IT in schools and libraries. I've known some great, knowledgeable people over the years, and most are now either dead, retired, or have moved on to industries that pay better or value their contributions more. With local government wages having been fairly flat since the Great Recession, I'm guessing that the pockets of great people still out there are overworked, as getting all of your work done on time means there's no reason to hire someone else to take some of the load off you so you can decompress once in a while. More likely you have organizations that have little in-house talent and just contract everything out, or that have a revolving door for talented people, or that end up collecting all of the dregs.
  3. People keep assuming that I'm serving HTML. Yes, we do have web pages on some of our servers, but the vast majority of the information that I serve is not HTML. I have a number of servers that contain non-browsable repositories of data, and am hosting search APIs so that people can find data of interest. A number of my web servers serve no (X)HTML. I'm not currently aware of any exploits to insert javascript into FITS files or SOAP services. The only attack vector that I can think of would require a MITM to replace the files with 404 errors or similar ... but the majority of the clients can't render HTML, so it wouldn't get you very far.
  4. As for the recommendation to use a CDN -- as I stated in my original post, I have no idea what data will be of interest until it happens. If you had followed the links that I included, you'd know that one collection is growing by roughly 1TB/day. If you know of any CDNs that can host ~3PB of data at a rate that won't dramatically increase our budget, I'd be glad to hear it ... Amazon Glacier's pricing is $0.01/GB, which works out to about $10k/PB/month. At 1TB/day outbound, that's another ~$2.7k/month. And all for 3+ hr waits when someone requests data, which would significantly impact our availability. (And we'd likely have to keep 1-2 months in higher-availability storage, plus whatever space is needed for staging the tape loads so that we can run the processing necessary to re-attach the scientific headers.)
  5. People who suggest running a local HTTP-to-HTTPS proxy fail to realize that this would mean that everyone who accesses our data would have to install and maintain that software. This would magnify the costs while shifting them to the thousands of scientists and others who consume our data.
  6. We can't serve both HTTP and HTTPS because of the proposal to ban HTTP on US government servers.
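
A sketch of the Web Linking variant mentioned in the second option above (the relation name, URLs, and checksum format are all assumptions rather than a registered rel type or existing service): the plain-HTTP response advertises an HTTPS URL for its digest in a Link header, and the client verifies the payload against it.

```python
#!/usr/bin/env python3
# Sketch of the Web Linking idea: the HTTP response carries a Link header that
# points at an HTTPS-hosted checksum. Relation name and URLs are hypothetical.
import hashlib
import re
import urllib.request

PAYLOAD_URL = "http://data.example.gov/archive/file.fits"   # placeholder

def checksum_link(resp):
    # Naive parse of: Link: <https://.../file.fits.sha256>; rel="checksum"
    for value in resp.headers.get_all("Link") or []:
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="?checksum"?', value)
        if match:
            return match.group(1)
    return None

with urllib.request.urlopen(PAYLOAD_URL) as resp:
    payload = resp.read()
    digest_url = checksum_link(resp)

if digest_url is None:
    raise RuntimeError("server did not advertise a checksum link")

with urllib.request.urlopen(digest_url) as resp:             # the HTTPS side channel
    expected = resp.read().decode().split()[0]

if hashlib.sha256(payload).hexdigest() != expected:
    raise RuntimeError("payload failed verification")
```

As noted above, the relationship is lost once the file is saved locally, which is exactly the drawback described.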

(And yet again, no one approved the posting of this response ... and I think there's also a policy against posting things to public forums ... so it's a good thing that I'm doing this from home. And I'm going to stop looking at this now ... I have two talks that I need to finish for next week, and if I keep responding to people who are dismissive of my concerns, this is going to turn into a real time-suck for me.)

canweriotnow commented 9 years ago

So basically... those who would sacrifice transport-layer liberty for transport-layer security deserve neither?

DavidJFelix commented 9 years ago

Things are hard. We shouldn't do them.

cayblood commented 9 years ago

The dismissiveness of many of the responses to @jhourcle here is lamentable. He doesn't seem to me to be primarily complaining that https-only makes things difficult. Rather, he seems to be saying that https is not always the best choice when you consider the nature of the data being shared and the consumers of said data.

hayesr commented 9 years ago

As an example, we're really feeling this right now in school districts. We have old equipment and lower-bandwidth connections. Proxying allowed us to have 30 computers per classroom; with all the kids accessing the same sites at the same time, it was the perfect proxy model. Now we're scrambling to redesign our networks. Of course we'll make it through, but we don't necessarily have the personnel or flexibility that other institutions have.

Also, we can no longer filter Youtube videos or Google Images based on content. (Unless we want to MITM all of our devices.) This ends up meaning less access to the Internet for younger kids.

TLDR; Won't somebody think of the children :interrobang:

cayblood commented 9 years ago

@hayesr do you mean you're just banning all under 13 kids from using the Internet?

hayesr commented 9 years ago

@cayblood No, at least not yet. Google will let us force SafeSearch, which works if an image or url is indexed with "naughty words" but sometimes they are not. We are hoping that teachers are searching for content before they display it to the class; this is a bit of a ticking time bomb.

What happens is something gets through and then there is a brief panic. Principals call IT and expect us to do something, we attempt to flag content using Google's mechanisms and then wait.

YouTube is a mixed bag: we can force SafeSearch, but then the majority of videos are unavailable. Some of our schools try to involve students in creating original content. That's definitely something we want to encourage; however, it seems all new videos are "unsafe" by default. So we've opened that up -- again, a potential time bomb.

DanielJoyce commented 9 years ago

@hayesr : This won't work?

http://wiki.squid-cache.org/Features/HTTPS and https://www.howtoforge.com/filtering-https-traffic-with-squid ?

You'd need a proxy set up.

hayesr commented 9 years ago

@DanielJoyce We use Squid already for our non-ssl traffic, but to proxy the SSL traffic we'd have to be a Man-in-the-Middle and break end-to-end encryption. Plus, I think we'd have to install a certificate on every device that ever wanted to use our network. Besides my distaste for breaking SSL, the cert installation is impractical for us.

I'm seeing airlines break SSL and the resulting backlash, which leads me to believe that browsers will continue to find better ways of preventing these techniques.

In general we want to be good Internet citizens and stay current. In the long run that will be better for everyone. It just means we have to adapt a little faster than we can right now.

noncombatant commented 9 years ago

@hayesr: Using a MITM proxy to filter traffic for client devices that you legitimately own is supported behavior: https://www.chromium.org/Home/chromium-security/security-faq#TOC-How-does-key-pinning-interact-with-local-proxies-and-filters-

You can install your MITM proxy's issuing cert with a Group Policy Object or shell/batch script.

If you do not legitimately own the client device, then yeah it's not cool to try to MITM it. That is where the backlash came from on that airline issue.

hayesr commented 9 years ago

@noncombatant Gotcha. It's still somewhat impractical for us. We have a mix of Windows, Apple, Linux and soon Chromebooks. So we need a Windoze solution, and an iPad solution, and on and on. We'd also like teachers and other visitors to be able to bring their personal devices on campus without having to touch them. We have disk imaging and mobile device management tools for the broad stuff, but it's the onesie-twosie stuff that would kill us. Currently we're at a ratio of about 1500 devices per technician.

lorenzogatti commented 9 years ago

@alex: I'm not attacking @jhourcle personally; I sincerely hope that he gets through HTTPS implementation successfully without being defeated by unreasonable impositions and lack of resources.

I respect his excessively conservative viewpoint because it is evidently the consequence of a rigid organization and of ample experience with bad users, high expectations and low budgets in the public sector rather than of laziness or incompetence. As a system administrator in a difficult place like NASA @jhourcle is more than entitled to discount the benefits and worry about the disruption and be afraid of not meeting demands; I agree with almost all of his technical analysis, but I see opportunity for improvement and educative suffering in the public interest where he sees risks and trouble.

The urgent need for improved Internet safety justifies the harshest sink-or-swim adoption of HTTPS; teaching end users by forcing them to adopt technical measures is more important than their convenience, and stressing organizations to find out which ones fail at IT is necessary. On the positive side, conflict between public library surfers and censorship proxies is good, shaking out SSL bugs is good, demanding more careful certificate handling is good, improving network connectivity is good, increasing public IT budgets is good.

hayesr commented 9 years ago

(to be clear) On the school IT side there is DNS filtering (eg OpenDNS), so we can still block all overt pornography, hate sites, etc. but it's a blacklist model.

alphapapa commented 9 years ago

The urgent need for improved Internet safety justifies the harshest sink-or-swim adoption of HTTPS; teaching end users by forcing them to adopt technical measures is more important than their convenience, and stressing organizations to find out which ones fail at IT is necessary. On the positive side, conflict between public library surfers and censorship proxies is good, shaking out SSL bugs is good, demanding more careful certificate handling is good, improving network connectivity is good, increasing public IT budgets is good.

It's easy to make all these proclamations about what forced changes are justified, who should be taught what, what organizations should be stressed, what conflict should be precipitated, what bugs should be shaken out, what procedures should be demanded, and what budgets should be increased--when you are not responsible for doing said things.

This whole matter is based upon an unfounded presupposition which has been declared by a technically illiterate policy wonk in Washington to be universally the case: that "there is no such thing as insensitive web traffic." Yet this is patently untrue. One of the most obvious, simple examples is that of government-funded astronomical data which is freely available to researchers. Yet this use-case, which is simple enough to be understood by anyone, is being dismissed with a rude, "Oh well, you'll manage. I know you don't have the manpower or budget or resources, but you'll manage. Don't be such a luddite progress obstructor."

Easy to say when it doesn't concern you. Hats off to @jhourcle for taking the time to document this and bang his head against the wa...I mean, try to reason with people. Good luck in your work.

RyanCopley commented 9 years ago

@alex You must get offended very easily.

nchammas commented 9 years ago

@RyanCopley Your comment is completely off topic and serves only to provoke. Keep the conversation on topic.

ghost commented 9 years ago

They are destroying the internet.

immibis commented 9 years ago

@shevegen That could mean two completely opposite opinions, depending on who you mean by "they"

HK12 commented 9 years ago

Hi hayesr, Please log the issue with forcesafesearch for YouTube blocking almost everything at the YouTube Help Forum discussion (link below). Perhaps if many of us complain, they will look at it seriously. https://productforums.google.com/forum/?utm_medium=email&utm_source=footer#!topic/youtube/32tHQqeTsIk;context-place=topicsearch/author$3Ame$20Youtube$20hiding

dxgldotorg commented 6 years ago

Hi. Sorry to bump this, but there are a few questions regarding this: First, is there any language in CIPA that requires schools and libraries to break or block HTTPS? Second, even if there were such a law, wouldn't it be trivial to simply whitelist .gov domains? I do not believe any federal .gov domains use IP addresses shared with any service that could be inappropriate or illegal. State sites, on the other hand, may use shared hosting due to low traffic and cost-cutting, and some states (and their websites) do facilitate activities that, while permissible under state law, are forbidden under federal law, such as the provision of controlled substances.

rschulman commented 6 years ago

@WilliamFeely A plain reading of CIPA does not say anything about breaking or blocking HTTPS. It only requires a "technology protection measure" on "any of its computers" that protects against the content listed.

To my knowledge, the question of whether a middlebox is the only way to create such protection has never been tested in a court, but from my own reading a client-side filter that runs on the school or library's computers that provides the same protections would also be compliant with the law. Where I think many schools and libraries get concerned is with people or students who bring their own devices onto their networks and whether they are responsible for filters for those devices as well.

I'm a lawyer, but not your lawyer, and please don't rely on this for legal advice!

dxgldotorg commented 6 years ago

When you go BYOD, can you expect students to install a MITM certificate in exchange for network access? As already mentioned, even a MITM box (which, by the way, has caused trouble for the development of the TLS 1.3 standard) should still have the capability to pass through approved sites like government servers and bank websites (to avoid breaches that could create big liabilities) without MITM'ing them.