apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Make page ready for pure HTTPS [LUCENE-9227] #10267

Closed asfimport closed 4 years ago

asfimport commented 4 years ago

The web page can currently be visited using HTTPS, but this produces warnings:

Once those 2 problems are fixed (I grepped for HTTP and still found many links with HTTP, but looks like no images or scripts or css anymore), I'd like to add a permanent redirect http://lucene.apache.org/ -> https://lucene.apache.org to the htaccess template file.


Migrated from LUCENE-9227 by Uwe Schindler (@uschindler), resolved Feb 28 2020 Parent: #10030 Linked issues:

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

At least we may need to disable LucidWorks Searchhub until the HTTPS issue is solved, otherwise we violate GDPR.

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I fixed at least Sematext search to use HTTPS. Anybody from Lucidworks who knows what's the status on HTTPS?

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

@ctargett: Do you know the status of LucidFind / SearchHub and HTTPS? We need to change the Lucene homepage's search input so that it never sends any user-entered text without HTTPS (to comply with GDPR). The LucidFind / SearchHub search engine referenced on the Lucene/Solr homepage does not work with HTTPS (timeout connecting on port 443).

asfimport commented 4 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

Many literal http links were moved to https in https://github.com/apache/lucene-site/pull/12/commits/9f10a556b2e424aefc66426076dcb20ab2344e25

asfimport commented 4 years ago

Cassandra Targett (@ctargett) (migrated from JIRA)

Sorry, I'm traveling this week and got behind on mails. I've asked inside Lucidworks who manages that service, and hopefully I'll get an answer soon. Can we give it a day? If you want to disable it sooner, that's fine.

asfimport commented 4 years ago

Cassandra Targett (@ctargett) (migrated from JIRA)

Got an answer faster than I thought I would - we aren't going to be able to implement HTTPS there for a little while at least, so if either of you want to go ahead and remove the LucidFind option, that's probably the best bet in order to stay in compliance with GDPR.

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi,

thanks for the quick answer. I was hoping that you can just install LetsEncrypt on the server to make it encrypted. To me it looks like the service is no longer fully maintained, so maybe you just have no time to take care of it.

The problem is that, as we want to switch the Lucene webpage fully to HTTPS (like most websites nowadays), anybody who selects the LucidFind option would get a security warning in Firefox and a warning in Chrome after a query is entered. Currently the search engine is randomly preselected. I'd remove the randomization and just keep the OSC one as the default. People who select the LucidFind one would get a security warning, but they have to explicitly select it.

Regarding GDPR: Sending "user entered" form data using unencrypted form submission is still a hot issue in the EU. Contact forms have to be encrypted (if you don't do that you quickly get sued by competitors); but with other forms like search forms, lawyers are still discussing. On the other hand, for a website that uses HTTPS with HSTS headers (which teach the browser to switch to HTTPS permanently), sending user data unencrypted over the wire is a bit strange.

So for now I would set the default to OSC without randomness, and yours is only used if explicitly selected.

After that I would proceed with enabling the permanent HTTPS redirect, starting with an HSTS header of short lifetime (1 hour). If all goes fine, I will raise the HSTS lifetime to 30 days. If we still get no complaints, I will change it to one year (the recommended default). After that every browser will remember that lucene.apache.org has to be accessed encrypted (this is important to prevent anybody from intercepting the connection while it has not been upgraded to HTTPS yet). So everything except the very first access is then secured. Users coming back later will always use HTTPS automatically without explicitly entering the full URL with HTTPS.
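To make the staged rollout concrete, here is a minimal Python sketch that turns the three lifetimes mentioned above (1 hour, 30 days, 1 year) into `max-age` values in seconds. The stage names and the `hsts_header` helper are purely illustrative assumptions and are not part of the site's configuration.

```python
from datetime import timedelta

# Rollout stages as described in the comment above.
# The stage names are hypothetical labels chosen for this sketch.
STAGES = {
    "initial": timedelta(hours=1),
    "intermediate": timedelta(days=30),
    "final": timedelta(days=365),
}


def hsts_header(lifetime: timedelta) -> str:
    """Build a Strict-Transport-Security header value for a given lifetime."""
    return f"max-age={int(lifetime.total_seconds())}"


if __name__ == "__main__":
    for name, lifetime in STAGES.items():
        print(f"{name}: Strict-Transport-Security: {hsts_header(lifetime)}")
```

Starting small matters because a browser that has seen the header cannot be told over HTTP to forget it; a short `max-age` limits how long a misconfiguration would lock visitors out.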

Any comments? If anybody has another Lucene-Centric search (e.g. @mikemccand Lucene Search) speak up, we can include it into the search box.

I'd proceed with this on the weekend, implementing HTTPS with increasing HSTS header lifetimes as described before.

asfimport commented 4 years ago

Cassandra Targett (@ctargett) (migrated from JIRA)

I was hoping that you can just install LetsEncrypt on the server to make it encrypted. To me it looks like the service is no longer fully maintained, so maybe you just have no time to take care of it.

That's correct, really. Responsibility for it has shifted several times over the years and is currently not assigned to anyone who has time to work on it. The index is quite out-of-date it seems also, so results would be misleading. There is some hope this year there will be some resources to improve things, but that's not a guarantee.

If it is easier to simply remove it, that's really fine with us. I'll leave it up to you.

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I committed the following to htaccess.template:

<If "%{HTTPS} == 'on'">
  Header always set Strict-Transport-Security "max-age=300"
</If>
<Else>
  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</Else>

This is IMHO the most consistent way to express this. There are shorter ways, but the if/else statements are easier to read:

Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:40:37 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html

Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo
HTTP/1.1 301 Moved Permanently
Date: Mon, 24 Feb 2020 19:44:03 GMT
Server: Apache
Location: https://lucene.staged.apache.org/test?hallo
Content-Type: text/html; charset=iso-8859-1

I plan to merge this to master quite soon, so please test it! I will keep the STS header at 300 seconds for a while and then raise it to one year if there are no complaints.
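For anyone who wants to repeat the two curl checks above without eyeballing the output, the following Python sketch parses the text printed by `curl -I` and checks the two properties that matter here: the HTTPS response carries an HSTS `max-age`, and the HTTP response is a 301 to the same host, path and query on HTTPS. The function names are hypothetical, and the sample responses are shortened copies of the output shown in this comment.

```python
from typing import Dict, Optional, Tuple


def parse_head_response(raw: str) -> Tuple[int, Dict[str, str]]:
    """Parse `curl -I` output into (status code, lower-cased header dict)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    status = int(lines[0].split()[1])  # "HTTP/1.1 301 Moved Permanently" -> 301
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def hsts_max_age(headers: Dict[str, str]) -> Optional[int]:
    """Return the max-age of the HSTS header in seconds, or None if absent."""
    value = headers.get("strict-transport-security")
    if value is None:
        return None
    for directive in value.split(";"):
        key, _, val = directive.strip().partition("=")
        if key.lower() == "max-age":
            return int(val.strip('"'))
    return None


def redirects_to_https(status: int, headers: Dict[str, str], url: str) -> bool:
    """Check for a 301 that only swaps the scheme of `url` to https."""
    expected = "https://" + url.split("://", 1)[1]
    return status == 301 and headers.get("location") == expected


# Shortened copies of the two responses shown above.
HTTPS_RESPONSE = """HTTP/1.1 200 OK
Server: Apache
Strict-Transport-Security: max-age=300
Content-Type: text/html
"""

HTTP_RESPONSE = """HTTP/1.1 301 Moved Permanently
Server: Apache
Location: https://lucene.staged.apache.org/test?hallo
Content-Type: text/html; charset=iso-8859-1
"""

if __name__ == "__main__":
    status, headers = parse_head_response(HTTPS_RESPONSE)
    print("HSTS max-age:", hsts_max_age(headers))
    status, headers = parse_head_response(HTTP_RESPONSE)
    print("redirect ok:", redirects_to_https(
        status, headers, "http://lucene.staged.apache.org/test?hallo"))
```

The parser is deliberately simple: it assumes one header per line and does not handle folded or repeated headers.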

asfimport commented 4 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

Tested with browser and curl. The redirect works, but I know nothing about STS :)

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Tested with browser and curl. The redirect works, but I know nothing about STS

Thanks. STS is Strict Transport Security (https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security). It sends a special HTTP header that instructs the browser to always use HTTPS for a domain. This lowers the risk that somebody intercepts the initial HTTP connection to the webserver (users normally enter only the domain name, which makes the browser use HTTP and get redirected to HTTPS). As the redirect is not secured, a bad guy could remove the redirect and serve a (modified) page. With HSTS the browser will (except for the very first access) use HTTPS for as long as the policy is valid, even when links use HTTP or the user enters the domain name without a protocol. Basically, once you have sent this header you can no longer switch off HTTPS until its lifetime has expired. The recommendation is to send one year or more, but I initially added 300 seconds for testing.

It's now deployed also in production. I will raise to one year next weekend.
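The browser behaviour described above can be modelled in a few lines of Python. This is a toy sketch, not how any real browser is implemented: the `HstsStore` class and its methods are hypothetical, it ignores `includeSubDomains`, preload lists and port handling, and it only keeps a per-host expiry time.

```python
import time
from urllib.parse import urlsplit


class HstsStore:
    """Toy model of a browser's HSTS cache: host -> expiry timestamp."""

    def __init__(self, clock=time.time):
        self._expiry = {}
        self._clock = clock  # injectable so the expiry logic can be tested

    def record(self, url: str, sts_header: str) -> None:
        """Remember the policy, but only if it arrived over HTTPS."""
        parts = urlsplit(url)
        if parts.scheme != "https" or parts.hostname is None:
            return  # a header received over plain HTTP is ignored
        for directive in sts_header.split(";"):
            key, _, val = directive.strip().partition("=")
            if key.lower() == "max-age":
                max_age = int(val.strip('"'))
                if max_age == 0:
                    self._expiry.pop(parts.hostname, None)  # policy removed
                else:
                    self._expiry[parts.hostname] = self._clock() + max_age

    def upgrade(self, url: str) -> str:
        """Rewrite http:// to https:// while a policy for the host is valid."""
        parts = urlsplit(url)
        expiry = self._expiry.get(parts.hostname)
        if parts.scheme == "http" and expiry is not None and self._clock() < expiry:
            return parts._replace(scheme="https").geturl()
        return url


if __name__ == "__main__":
    store = HstsStore()
    print(store.upgrade("http://lucene.apache.org/"))  # first visit: unchanged
    store.record("https://lucene.apache.org/", "max-age=300")
    print(store.upgrade("http://lucene.apache.org/"))  # now rewritten to https
```

The first `upgrade` call shows why the very first access stays unprotected: the store has no entry for the host until a response over HTTPS has delivered the header.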

asfimport commented 4 years ago

Cassandra Targett (@ctargett) (migrated from JIRA)

It turns out that the team at Lucidworks were actually hoping you would want to remove the LucidFind option because they want to turn it off. The resources I mentioned were for something else (my mistake) and they don't anticipate anyone being able to get around to a long list of updates needed for that service any time this year.

They've now asked me to make sure it is removed so they can just go ahead and shut down the service.

Not really being up on how this works, it looks to me that it wouldn't be too hard to edit the search templates to remove it, but being creaky with JS, I don't know if I could do it myself. Uwe or Jan, if neither of you have time/desire to do it, just let me know and I'll file a sub-task and ask someone here at Lucidworks to do it asap.

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

OK, I will remove it. This involves some work, as I have to remove the dropdown box to select the search provider! Actually the JavaScript can go away completely then. There is also another strange search.js that I have no idea what it does (other than producing a JS error). I will figure that out later today.

If anybody knows another search engine for the "Lucene/Solr" content, speak up! I can inject the URLs quite fast instead of removing the dropdown list completely. Is @mikemccand's Lucene Search engine based on pure Lucene still available and maintained (with HTTPS)?

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I removed LucidFind and adapted the styling after removing the dropdown. All JavaScript was removed and the font sizes were adjusted.

asfimport commented 4 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Is @mikemccand's Lucene Search engine based on pure Lucene still available and maintained (with HTTPS)?

The only Lucene based search engine I'm currently personally running in production is jirasearch.mikemccandless.com, for searching all Lucene/Solr/Tika/Infra Jira issues ... it tries to showcase/dog food a number of fun pure Lucene features ... blog post here. But, alas, it's still HTTP only :(

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Thanks @mikemccand. I removed the dropdown on the Lucene website already, so we have now only one search engine (OSC) left over. I am fine with that, also looks better than before.

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I raised the HSTS max-age and the caching of JS/CSS assets to 30 days.

If all goes fine we can finally raise both to 1 year. I will do this at the end of March (I will add an appointment to my calendar).

asfimport commented 4 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I set the issue status to fixed, as all tasks are finally done. Raising max-age of HSTS is a separate task: #10296