clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Upgrade to Solr 6 #87

Closed twagoo closed 7 years ago

twagoo commented 7 years ago

The 4.x branch of Solr has not seen any release since early 2015. The current minor version is 6.6, and 5.x at least still seems to be maintained. So if we can, we should upgrade to version 6 or at least 5.

This might require migrating the definitions and configurations, and should be tested thoroughly.

Of the new features, the GraphQueryParser available as of Solr 6 could be interesting in relation to handling metadata hierarchies.
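
To give an idea of what such a query could look like, here is a minimal sketch that builds the request parameters for a `{!graph}` query. The helper name and the field names `parent_id` and `id` are hypothetical and not taken from the VLO schema. Which field belongs in `from` and which in `to` depends on how the hierarchy is indexed, so that should be checked against the Solr reference guide.

```python
from urllib.parse import urlencode


def build_graph_query(root_query, from_field, to_field, max_depth=None):
    """Build Solr request parameters for a {!graph} traversal query.

    root_query selects the starting documents; from_field and to_field
    name the fields that hold the edges (hypothetical names in this sketch).
    """
    local_params = f"from={from_field} to={to_field}"
    if max_depth is not None:
        # maxDepth limits how many edges are followed from the root documents.
        local_params += f" maxDepth={max_depth}"
    return {"q": f"{{!graph {local_params}}}{root_query}"}


# Hypothetical example: start at one record and follow at most two levels.
params = build_graph_query("id:root-record", "parent_id", "id", max_depth=2)
query_string = urlencode(params)  # ready to append to a /select URL
```

The function only assembles the query string, so it can be tried without a running Solr instance.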

twagoo commented 7 years ago

Note: Solr 5 is no longer distributed as a WAR, but has to run as a self-contained service. This will probably require some restructuring of the way the VLO project is organised, run and deployed (see page 532 of this guide).

twagoo commented 7 years ago

I started experimenting with a dockerised Solr instance with pre-loaded data to make the development workflow a bit more streamlined. I will use the latest Solr docker image as a basis, which is Solr 6, trying to keep as much of the existing configuration unchanged as possible. The result (if any!) can probably be used as input to resolving this issue. I will provide updates as comments to this issue.

teckart commented 7 years ago

We should synchronize our work then. I have a running Solr 6 configuration and a working importer here (except a minor problem with the suggester configuration). Currently I am trying to get the vlo-web-app running, where the record pages are throwing some NPEs, probably because of some Solr configuration problem.

twagoo commented 7 years ago

Oh great! I have not gotten to the stage of adapting/migrating the configuration or schema. Actually I have not even gotten to the stage where I have automated the configuration of the Solr instance and the creation of a core. So we have not duplicated work yet, I think :)

Can you push/share the current state of your work?

twagoo commented 7 years ago

By the way, the error that you are getting may be due to a missing "mlt" request handler in case you have not merged the changes in the solr configuration for #61 yet. See 483dce86b9e0e649ef43f22d11411cd7eaf8e11e and a couple of downstream changes.
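
A quick way to check this without starting Solr might be to list the request handlers declared in `solrconfig.xml`. The sketch below is only an illustration: the sample configuration is a made-up, heavily trimmed stand-in, and the handler name `/mlt` is an assumption (it is not confirmed here how the handler is named in the VLO configuration).

```python
import xml.etree.ElementTree as ET


def defined_request_handlers(solrconfig_xml):
    """Return the set of request handler names declared in a solrconfig.xml string."""
    root = ET.fromstring(solrconfig_xml)
    return {h.get("name") for h in root.iter("requestHandler") if h.get("name")}


# Hypothetical, heavily trimmed configuration used only for illustration.
SAMPLE_CONFIG = """
<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler"/>
</config>
"""

handlers = defined_request_handlers(SAMPLE_CONFIG)
has_mlt = "/mlt" in handlers  # assumed handler name
```

For a real check, the content of `conf/solrconfig.xml` would be passed in instead of `SAMPLE_CONFIG`.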

teckart commented 7 years ago

Sure - I will push my issue branch tomorrow (don't have access to it right now). It currently only contains the importer/web-app changes. I will have to think about the Solr configuration (because of the new deployment procedures) - for the time being I may just zip and send it to you via mail.

Regarding "mlt": yes that was one of the problems: "Problem accessing /solr/collection1/mlt. Reason:" But there was also a second one, complaining about an undefined field. I will have a look tomorrow...

twagoo commented 7 years ago

@teckart if you could send me the Solr configuration that works (more or less) with Solr 6 that would be great!

teckart commented 7 years ago

Branch issue87 is mostly functional (importer+web app) and based on Solr 6.6. Known issues are:

A Solr configuration (used in standalone Solr 6.6 server) can be found here.

twagoo commented 7 years ago

Thanks! I tried to 'inject' this into the docker image for Solr 6.6 and the (or at least some) issue with the suggesters appears to be fatal:

2017-09-14 13:13:23.403 ERROR (qtp1205044462-18) [   ] o.a.s.s.HttpSolrCall null:org.apache.solr.core.SolrCoreInitializationException: SolrCore 'collection1' is not available due to init failure: Error in configuration: textSuggest is not defined in the schema

The server starts but when I try to query, I get a 500 response with a message "not available due to init failure". Are you not experiencing this?
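
One way to narrow this down could be to check whether the schema file that Solr actually loaded declares `textSuggest` at all. The thread does not say whether `textSuggest` is a field or a field type, so the sketch below looks for both. The sample schema is invented for illustration and is not the VLO schema.

```python
import xml.etree.ElementTree as ET


def schema_defines(schema_xml, name):
    """Report whether `name` is declared as a field, dynamic field or field type."""
    root = ET.fromstring(schema_xml)
    declared = {
        element.get("name")
        for tag in ("field", "dynamicField", "fieldType")
        for element in root.iter(tag)
    }
    return name in declared


# Hypothetical, trimmed schema used only for illustration.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="textSuggest" class="solr.TextField"/>
  <field name="id" type="string" indexed="true" stored="true"/>
</schema>
"""

found = schema_defines(SAMPLE_SCHEMA, "textSuggest")
```

Running the same check against both `conf/managed-schema` and `conf/schema.xml` would show whether the two files disagree.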

teckart commented 7 years ago

I didn't have this problem and textSuggest should be correctly defined in the schema. Maybe it helps to delete conf/managed-schema and rename schema.xml.bak to schema.xml. Solr should then transform schema.xml to a new managed schema at startup.
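
Those two steps can be sketched as a small script. This is only an illustration of the suggestion above: the function name is hypothetical, it is presumably best run while Solr is stopped, and the demo works on a throwaway temporary directory rather than a real core.

```python
import tempfile
from pathlib import Path


def restore_classic_schema(conf_dir):
    """Delete managed-schema and rename schema.xml.bak to schema.xml in conf_dir."""
    conf = Path(conf_dir)
    backup = conf / "schema.xml.bak"
    if not backup.is_file():
        raise FileNotFoundError(f"{backup} not found; nothing to restore")
    managed = conf / "managed-schema"
    if managed.exists():
        managed.unlink()
    # replace() overwrites an existing schema.xml on every platform.
    return backup.replace(conf / "schema.xml")


# Demo on a temporary directory with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    demo_conf = Path(tmp)
    (demo_conf / "managed-schema").write_text("<schema/>")
    (demo_conf / "schema.xml.bak").write_text("<schema name='demo'/>")
    restored = restore_classic_schema(demo_conf)
    restored_text = restored.read_text()
    managed_removed = not (demo_conf / "managed-schema").exists()
    backup_removed = not (demo_conf / "schema.xml.bak").exists()
```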

twagoo commented 7 years ago

> Maybe it helps to delete conf/managed-schema and rename schema.xml.bak to schema.xml.

This seems to have done the trick! I managed to run an import and connect the web app to it.

Perhaps restructuring the configuration directory structure to match the one generated by Solr when creating a new core also helped. The resulting structure is

collection1
├── conf
│   ├── currency.xml
│   ├── email_url_types.txt
│   ├── lang
│   │   └── (...)
│   ├── managed-schema.bak
│   ├── params.json
│   ├── protwords.txt
│   ├── schema.xml
│   ├── schema.xml.bak
│   ├── solrconfig.xml
│   ├── stopwords.txt
│   ├── synonyms.txt
│   ├── update-script.js
│   └── velocity
│       └── (...)
└── core.properties
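
When injecting such a directory into a Docker image, a small sanity check of the layout can catch copy mistakes early. The sketch below only tests for three files from the tree above. Treating exactly these as required is an assumption of the sketch, and the demo builds a deliberately incomplete core in a temporary directory.

```python
import tempfile
from pathlib import Path

# Files from the tree above that this sketch treats as required (an assumption).
REQUIRED = ("core.properties", "conf/solrconfig.xml", "conf/schema.xml")


def missing_core_files(core_dir, required=REQUIRED):
    """Return the required relative paths that do not exist under core_dir."""
    core = Path(core_dir)
    return [rel for rel in required if not (core / rel).is_file()]


# Demo: a core directory that lacks conf/schema.xml.
with tempfile.TemporaryDirectory() as tmp:
    core = Path(tmp) / "collection1"
    (core / "conf").mkdir(parents=True)
    (core / "core.properties").touch()
    (core / "conf" / "solrconfig.xml").touch()
    missing = missing_core_files(core)
```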

I would also like to propose renaming the core to something more descriptive, like vlo or vlo-index.
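
With core discovery, the core name can be set through the `name` property in `core.properties`, so a rename could be sketched roughly as below. The helper is hypothetical and the sample file content is invented. Anything that addresses the core by URL (such as `/solr/collection1/...`) would presumably need to be updated as well.

```python
def set_core_name(properties_text, new_name):
    """Return core.properties content with the `name` property set to new_name."""
    lines = []
    replaced = False
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "name":
            lines.append(f"name={new_name}")
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        # No name property yet: add one explicitly.
        lines.append(f"name={new_name}")
    return "\n".join(lines) + "\n"


# Invented sample content; the real core.properties may look different.
updated = set_core_name("name=collection1\nconfig=solrconfig.xml\n", "vlo-index")
```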

twagoo commented 7 years ago

Update: using the provided directory structure also works.

twagoo commented 7 years ago

This just in 😬

20 September 2017, Apache Solr™ 7.0.0 available. The Lucene PMC is pleased to announce the release of Apache Solr 7.0.0.

https://lucene.apache.org/solr/news.html

teckart commented 7 years ago

Wow, perfect timing - I will have a look at what changes are necessary.

twagoo commented 7 years ago

Just a quick confirmation: after a brief test, both the sitemap generator and the statistics tool appear to work fine with Solr 6, unless @teckart has any indication that there are issues with these :)

teckart commented 7 years ago

Branch issue87-solr7 contains a version update to Solr 7.0.0 with corresponding code changes (but some test errors in vlo-web-app).

twagoo commented 7 years ago

As of 180549219c8f69e35435972abda3aaeeaebb3b0f, VLO working with Solr 7 has been merged into the development branch.