LibreCat / Catmandu-Store-Elasticsearch

https://metacpan.org/release/Catmandu-Store-Elasticsearch

Support for Elasticsearch 7.x #30

Open mmcinnes-beyondtechnology opened 3 years ago

mmcinnes-beyondtechnology commented 3 years ago

Elasticsearch 7.x moves away from document types, so requests built by version 1.0202 fail against an ES7 server.

The following command results in a failure (below):

catmandu delete Elasticsearch --client '7_0::Direct' --index_name es-index-name --bag es-index-name --query 'field: "value"'
[Wed Feb 24 02:38:19 2021] # Request to: https://localhost:9200
curl -H "Content-type: application/json" -XPOST 'http://localhost:9200/es-index-name/es-index-name/_delete_by_query?pretty=true' -d '
{
   "query" : {
      "query_string" : {
         "query" : "field: \"value\""
      }
   }
}

The same URL structure is built for all calls, which assumes that both the index and the document type must be passed to ES (types were deprecated in v7). See https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html
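A minimal sketch of the difference, written in Python purely for illustration (it is not Catmandu's Perl code, and the helper name `delete_by_query_path` is hypothetical): 6.x servers take the typed path `/{index}/{type}/_delete_by_query`, which is what the command above produced, while on 7.x the type segment is deprecated and the typeless `/{index}/_delete_by_query` is the recommended form.

```python
from typing import Optional


def delete_by_query_path(index: str, doc_type: Optional[str], server_major: int) -> str:
    """Return the _delete_by_query path for a given server major version.

    Elasticsearch 6.x and earlier accept /{index}/{type}/_delete_by_query.
    On 7.x the type segment is deprecated (and removed in 8.x), so this
    hypothetical helper drops it for any server_major >= 7.
    """
    if server_major >= 7 or doc_type is None:
        return f"/{index}/_delete_by_query"
    return f"/{index}/{doc_type}/_delete_by_query"


# Typed path, as generated in the report above
print(delete_by_query_path("es-index-name", "es-index-name", 6))
# Typeless path for an Elasticsearch 7.x server
print(delete_by_query_path("es-index-name", "es-index-name", 7))
```

Keying the decision on the server's major version, rather than on whether a type name was configured, keeps an old `bag`/type setting from leaking into 7.x URLs.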

nics commented 3 years ago

Hi, we are aware of this issue and will release an update with es7+ support soon.

kosson commented 3 years ago

Running

catmandu import JSON to search --bag catalogcolectiv < dublincore.json

got me into a lot of problems with the root mapping definition:
Oops! [Request] ** [http://localhost:9200]-[400] [mapper_parsing_exception] Root mapping definition has unsupported parameters:  [catalogcolectiv : {}], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /home/nicolaie/perl5/lib/perl5/Catmandu/Store/ElasticSearch/Bag.pm line 36. With vars: {'request' => {'method' => 'PUT','serialize' => 'std','path' => '/catalogcolectiv','ignore' => [],'body' => {'settings' => {},'mappings' => {'catalogcolectiv' => {}}},'qs' => {},'mime_type' => 'application/json'},'body' => {'status' => 400,'error' => {'caused_by' => {'reason' => 'Root mapping definition has unsupported parameters:  [catalogcolectiv : {}]','type' => 'mapper_parsing_exception'},'type' => 'mapper_parsing_exception','reason' => 'Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters:  [catalogcolectiv : {}]','root_cause' => [{'type' => 'mapper_parsing_exception','reason' => 'Root mapping definition has unsupported parameters:  [catalogcolectiv : {}]'}]}},'status_code' => 400}

Is there any monkey patch for this?

The YAML file is:

store:
    search:
       package: ElasticSearch
       options:
         client: 7_0::Direct
         index_name: catalogcolectiv
         bags:
             mappings:
               properties:
                 title:
                   type: text
                 creator:
                   type: text
                 publisher:
                   type: text
                 description:
                   type: text
                 identifier:
                   type: keyword
             cql_mapping:
               indexes:
                   title:
                       op:
                           'any': true
                           'all': true
                           '=':   true
                           '<>':  true
                           'exact':
                               field: [ 'mytitle.exact' , 'myalttitle.exact' ]
                       field: mytitle
                       sort: true
                       cb: [ 'Biblio::Search' , 'normalize_title' ]
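For comparison, a sketch of the two create-index bodies, assuming the `properties` block from the YAML above is what should end up in the index (the trace shows the bag being sent as `mappings => {catalogcolectiv => {}}`, i.e. a type key with an empty mapping). It is illustrative Python, and `create_index_body` is a hypothetical helper. Nesting the mapping under a type name is the 6.x layout; a 7.x server reads that type name as an unknown root-level parameter, which is consistent with the `Root mapping definition has unsupported parameters` error.

```python
import json

# Field types taken from the "properties" section of the YAML above.
PROPERTIES = {
    "title": {"type": "text"},
    "creator": {"type": "text"},
    "publisher": {"type": "text"},
    "description": {"type": "text"},
    "identifier": {"type": "keyword"},
}


def create_index_body(properties, type_name=None):
    """Build the body of a PUT /{index} (create index) request.

    With type_name set, the mapping is nested under that type (6.x layout).
    Without it, the mapping sits directly under "mappings" (7.x layout).
    """
    mapping = {"properties": properties}
    if type_name is not None:
        # 6.x style: {"mappings": {"<type>": {"properties": ...}}}
        mapping = {type_name: mapping}
    return {"settings": {}, "mappings": mapping}


# Typed body: rejected by a 7.x server unless include_type_name=true is sent
print(json.dumps(create_index_body(PROPERTIES, type_name="catalogcolectiv"), indent=2))
# Typeless body for 7.x
print(json.dumps(create_index_body(PROPERTIES), indent=2))
```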

Data sample:

[
{
  "identifier": "094.1(498); 811.135.1'36-112",
  "_id": null,
  "date": "1805",
  "format": "110 p. ; 19 cm",
  "publisher": "Typis Regiae Universitatis Pestanae",
  "subject": ["carte românească veche", "lingvistică", "limba română", "gramatică", "gramatică istorică", "morfologie"],
  "creator": ["Şincai, Gheorghe"],
  "title": "Elementa Linguae Daco - Romanae sive Valachicae emendata, facilitata, et in meliorem ordinem redacta per Georgium Sinkay de Eadem, AA. LL. Philosophiae, & SS. Theologiae Doctorem, Scholarum Nationalium Valachicarum in Magno Transylvaniae Principatu primum, atque emeritum Directorem, nunc penes Regiam Universitatis Pestanae Typographiam Typi Correctorem. Budae, Typis Regiae Universitatis Pestanae. 1805. Georgium Sinkai [Carte tipărită]",
  "description": "Colecţii Speciale - BRV BRV101 687a BRV BRV101"
}, {
  "date": "1737",
  "format": "[8] p., 48 p. ; 23 cm",
  "_id": null,
  "identifier": "271.3(094)(498.4); 094.1(498.4)",
  "description": "Colecţii Speciale - Transilvanice T00156 448 T1 T00156",
  "title": "Ortus, Progressus, Vicissitudines, Excisio, et Restauratio, olim custodiae, nunc ab anno M. DCC. XXIX. Provinciae Transylvaniae Ord. Min. S. P. N. Francisci Strict Observ. Tituli S. Regis Stephani ex Gravissimis, Fideque dignis Authoritatibus clara, ac succinta methodo compilatus, primum in Urbe Orbis capite revisus, castigatus, approbatus, et excusus Typis Reverendae Camerae Apostolicae. eX Vrbe septICoLLe ReDVX oMIne FaVsto Hac secunda editione sub Gratiosissimus auspiciis [...] domini Joannis Haller ... [Györffi Pál] ; Editor: Joannis Haller [Carte tipărită]",
  "creator": ["Györffi Pal"],
  "subject": ["istoria Transilvaniei", "Transilvanice", "ordine călugăreşti", "franciscani", "biserica catolica", "ordine religioase"],
  "publisher": "Typis Ven. Conventus Csikiensis"
}
]
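As an illustration of the typeless request shape on 7.x, here is a hedged Python sketch that serializes records like the sample above into a `_bulk` body. The action line carries only `_index` (no `_type`); the hypothetical `bulk_ndjson` helper also moves `_id` out of the document source (it is a metadata field) and omits it when it is null so the server generates one. This is not a description of how Catmandu builds its own bulk requests.

```python
import json

# Two abbreviated records in the shape of the data sample above.
records = [
    {"_id": None, "identifier": "094.1(498); 811.135.1'36-112", "date": "1805"},
    {"_id": None, "identifier": "271.3(094)(498.4); 094.1(498.4)", "date": "1737"},
]


def bulk_ndjson(index, docs):
    """Serialize docs as a typeless _bulk body (Elasticsearch 7.x style).

    Each document becomes an action line plus a source line. The action
    carries _index, and _id only when one is present; there is no _type.
    """
    lines = []
    for doc in docs:
        # _id is metadata, so keep it out of the document source
        source = {k: v for k, v in doc.items() if k != "_id"}
        action = {"_index": index}
        if doc.get("_id") is not None:
            action["_id"] = doc["_id"]
        lines.append(json.dumps({"index": action}, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


print(bulk_ndjson("catalogcolectiv", records), end="")
```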
nics commented 3 years ago

Hi @kosson @jorol @mmcinnes-beyondtechnology, I have finally added es7 support. Could you help out by testing the code in the es7 branch against a version 7 server?

You can load the unpublished code like so: catmandu -I /path/to/Catmandu-Store-ElasticSearch/lib import JSON to search --bag catalogcolectiv < dublincore.json

kosson commented 3 years ago

I will run a test and come back with the outcome. I will add this to the flow as the third branch here: https://mermaid-js.github.io/mermaid-live-editor/edit/#eyJjb2RlIjoiZ3JhcGggVEQ7XG4gICAgICAgIE1BUkMyMVtYTUwgw65uIEpTT04gOjogY2F0bWFuZHVdIC0tdHJhbnNmb3JtYXJlLS0-IEpTT05bU3RydWN0dXJhIHJlZmxlY3TEgyB0b3QgTUFSQzIxIDo6IGNhdG1hbmR1XTtcbiAgICAgICAgSlNPTiAtLXRyYW5zZm9ybWFyZS0tPiBKUVtSZW1vZGVsxINtIGRhdGVsZSBjdSBqcV07XG4gICAgICAgIEpTT04gLS3Drm5jxINyY2FyZSBkYXRlLS0-IE1vbmdvREJbKERhdGFiYXNlKV07XG4gICAgICAgIE1vbmdvREIgLS1jYXRtYW5kdS0tPiBKU09OMVtKU09OIGN1IGlkIHVuaWNdO1xuICAgICAgICBKUSAtLT4gT3BlblJlZmluZTtcbiAgICAgICAgSlNPTjEgLS0-IE9wZW5SZWZpbmU7XG4gICAgICAgIE9wZW5SZWZpbmUgLS3Drm1ib2fEg8ibaXJlLS0-IFJERjtcbiAgICAiLCJtZXJtYWlkIjoie1xuICBcInRoZW1lXCI6IFwiZGVmYXVsdFwiXG59IiwidXBkYXRlRWRpdG9yIjpmYWxzZSwiYXV0b1N5bmMiOnRydWUsInVwZGF0ZURpYWdyYW0iOnRydWV9

Then I will complete the exercise here: https://github.com/kosson/sva21/blob/main/ghid/documentatie.md

I put those links here so you can see that this is part of a more complex exercise. Also, if you know of other good examples of ETLs based on Catmandu, please point me to them!

kosson commented 3 years ago

No luck here. Command: catmandu -I /perl5/lib/perl5/Catmandu/Store/ElasticSearch/lib import JSON to search --bag catalogcolectiv < dcreformated.json

Output:

[DEPRECATION] Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.14/security-minimal-setup.html to enable security. - In request: {method => "HEAD",path => "/",timeout => 2}
[DEPRECATION] Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.14/security-minimal-setup.html to enable security. - In request: {body => undef,ignore => [],method => "HEAD",path => "/catalogcolectiv",qs => {},serialize => "std"}
[DEPRECATION] Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.14/security-minimal-setup.html to enable security. - In request: {body => {mappings => {catalogcolectiv => {}},settings => {}},ignore => [],method => "PUT",mime_type => "application/json",path => "/catalogcolectiv",qs => {},serialize => "std"}
Oops! [Request] ** [http://localhost:9200]-[400] [mapper_parsing_exception] Root mapping definition has unsupported parameters: [catalogcolectiv : {}], called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /home/nicolaie/perl5/lib/perl5/Catmandu/Store/ElasticSearch/Bag.pm line 36.
With vars: {'request' => {'path' => '/catalogcolectiv','serialize' => 'std','mime_type' => 'application/json','body' => {'mappings' => {'catalogcolectiv' => {}},'settings' => {}},'method' => 'PUT','qs' => {},'ignore' => []},'body' => {'error' => {'type' => 'mapper_parsing_exception','caused_by' => {'type' => 'mapper_parsing_exception','reason' => 'Root mapping definition has unsupported parameters: [catalogcolectiv : {}]'},'reason' => 'Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters: [catalogcolectiv : {}]','root_cause' => [{'type' => 'mapper_parsing_exception','reason' => 'Root mapping definition has unsupported parameters: [catalogcolectiv : {}]'}]},'status' => 400},'status_code' => 400}

jorol commented 3 years ago

I've installed Elasticsearch 7.10.2 and your branch 'es7' on a test machine. I got the following errors while testing:

$ catmandu import JSON to search --bag catalogcolectiv < dublincore.json
Oops! [Request] ** [http://localhost:9200]-[400] [illegal_argument_exception] The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true., called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /home/kim/perl5/perlbrew/perls/perl-5.30.3/lib/site_perl/5.30.3/Catmandu/Store/ElasticSearch/Bag.pm line 36. With vars: {'body' => {'status' => 400,'error' => {'type' => 'illegal_argument_exception','root_cause' => [{'type' => 'illegal_argument_exception','reason' => 'The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true.'}],'reason' => 'The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true.'}},'request' => {'mime_type' => 'application/json','path' => '/catalogcolectiv','serialize' => 'std','ignore' => [],'body' => {'mappings' => {'_doc' => {}},'settings' => {}},'method' => 'PUT','qs' => {}},'status_code' => 400}
$ catmandu import JSON --line_delimited 1 to ElasticSearch --client '7_0::Direct' --bag kbart < kbart.jsonl 
Oops! [Request] ** [http://localhost:9200]-[400] [illegal_argument_exception] The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true., called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /home/kim/perl5/perlbrew/perls/perl-5.30.3/lib/site_perl/5.30.3/Catmandu/Store/ElasticSearch/Bag.pm line 36. With vars: {'request' => {'ignore' => [],'mime_type' => 'application/json','qs' => {},'body' => {'mappings' => {'_doc' => {}},'settings' => {}},'path' => '/kbart','method' => 'PUT','serialize' => 'std'},'status_code' => 400,'body' => {'error' => {'root_cause' => [{'reason' => 'The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true.','type' => 'illegal_argument_exception'}],'reason' => 'The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true.','type' => 'illegal_argument_exception'},'status' => 400}}
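Both traces show the es7 branch sending `mappings => {'_doc' => {}}` with an empty query string (`'qs' => {}`). On a 7.x server there are two ways to make such a request valid, sketched below in Python with a hypothetical `create_index_request` helper: drop the `_doc` wrapper and put the mapping directly under `mappings`, or keep the wrapper and add `include_type_name=true` to the query string. The second option is deprecated in 7.x and no longer supported in 8.x, so the typeless form is the safer choice.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def create_index_request(index: str, mapping: dict, type_name: Optional[str] = None) -> Tuple[str, dict]:
    """Return (url_path, body) for PUT /{index} on an Elasticsearch 7.x server.

    type_name=None: typeless form, mapping placed directly under "mappings".
    type_name set (e.g. "_doc"): typed form, which 7.x only accepts when
    include_type_name=true is in the query string (deprecated; gone in 8.x).
    """
    body = {"settings": {}, "mappings": mapping}
    path = f"/{index}"
    if type_name is not None:
        body["mappings"] = {type_name: mapping}
        path += "?" + urlencode({"include_type_name": "true"})
    return path, body


# Option A: typeless request (preferred on 7.x)
print(create_index_request("kbart", {}))
# Option B: typed request with the include_type_name flag
print(create_index_request("kbart", {}, type_name="_doc"))
```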