HKS3 / HKS3GeoSearch

Koha plugin to display a map of the geographical location of a book

Figure out how to include authority data with bibliographic data during ES indexing #4

Open: domm opened this issue 3 months ago

domm commented 3 months ago

In Koha Chat we were told that it should be possible to merge authority data with bibliographic data into the Elasticsearch document during indexing.

tadzik commented 2 months ago

Koha currently only does this for the "see also" fields. It's hardcoded, and hidden behind the IncludeSeeFromInSearches and IncludeSeeAlsoFromInSearches preferences. The code handling these (Koha::Filter::MARC::EmbedSeeFromHeadings) is tailored specifically to these two.

According to cait:

> It's off by default.

> It probably makes the index bigger, and maybe the search results more confusing, because you don't see why a record was found. I think the geographic data is much smaller than all the different name forms of "Goethe" that you would not want to show in a record. Different use cases.

> All that to say... if you develop some magic, I'd make it separate.

To me, that sounds like "build a new thing instead of extending the existing one, and it'll probably be safe to be on-by-default". I'd still try to keep it at least a little bit generic, since I imagine it being useful for other things as well. But for our case, simple logic like this would probably suffice: if a record contains a 651 (SUBJECT ADDED ENTRY--GEOGRAPHIC NAME), look up the authority it references and treat its geographic fields as if we had found them in the biblio record itself.

I'll build a prototype of this and see how it works out.
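The lookup described above (a 651 in the biblio, the authority it links to, that authority's geographic fields) could be sketched roughly like this. It is a standalone illustration, not Koha code, and it assumes: records are modelled as plain hashrefs instead of MARC::Record objects, the linked authority is found through the authid in $9 (which is how the patch further down resolves it), and the coordinates live in authority field 034. `fields_to_index` and the in-memory `%authorities` lookup are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for MARC records (real Koha code would use
# MARC::Record): a record is a list of fields, each field a hashref with a
# tag and an ordered list of [code, value] subfield pairs.
my %authorities = (
    42 => [
        { tag => '151', subfields => [ [ a => 'Wien' ] ] },
        {   tag       => '034',
            subfields => [
                [ d => 'E0161100' ], [ e => 'E0163500' ],
                [ f => 'N0481900' ], [ g => 'N0480700' ],
            ],
        },
    ],
);

# First value of subfield $code in $field, or undef if it is missing.
sub subfield_value {
    my ( $field, $code ) = @_;
    for my $pair ( @{ $field->{subfields} } ) {
        return $pair->[1] if $pair->[0] eq $code;
    }
    return undef;
}

# Fields to index for a biblio: its own fields, plus the 034 fields of every
# authority that one of its 651 fields links to through $9.
sub fields_to_index {
    my ( $biblio, $lookup_authority ) = @_;
    my @fields = @$biblio;
    for my $field ( grep { $_->{tag} eq '651' } @$biblio ) {
        my $authid    = subfield_value( $field, '9' ) or next;    # unlinked 651
        my $authority = $lookup_authority->($authid)  or next;    # dangling link
        push @fields, grep { $_->{tag} eq '034' } @$authority;
    }
    return \@fields;
}

my $biblio = [
    { tag => '245', subfields => [ [ a => 'Geologische Karte' ] ] },
    { tag => '651', subfields => [ [ a => 'Wien' ], [ 9 => 42 ] ] },
];
my $indexed = fields_to_index( $biblio, sub { $authorities{ $_[0] } } );
print join( ' ', map { $_->{tag} } @$indexed ), "\n";    # prints "245 651 034"
```

Passing the authority lookup in as a callback keeps the sketch independent of how authorities are actually fetched (in Koha that would be a database lookup by authid).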

domm commented 2 months ago

> Koha currently only does this for the "see also" fields. It's hardcoded, and hidden behind the IncludeSeeFromInSearches and IncludeSeeAlsoFromInSearches preferences. The code handling these (Koha::Filter::MARC::EmbedSeeFromHeadings) is tailored specifically to these two.

That's what I had assumed.

> To me, that sounds like "build a new thing instead of extending the existing one, and it'll probably be safe to be on-by-default". I'd still try to keep it at least a little bit generic, since I imagine it being useful for other things as well. But for our case, simple logic like this would probably suffice: if a record contains a 651 (SUBJECT ADDED ENTRY--GEOGRAPHIC NAME), look up the authority it references and treat its geographic fields as if we had found them in the biblio record itself.

Generally, I agree.

In the current case (Geologische Bundesanstalt), we'll have to convert their data to MARC ourselves, so we can make sure that the geographic authorities will in fact be stored in 651. (FYI, these can be geonames (e.g. a city or region) or numbers pointing to a "Kartenblatt" (map sheet number) like https://www.bev.gv.at/Services/Produkte/Landkarten/OEK25V-UTM.html)

> I'll build a prototype of this and see how it works out.

For a prototype, a hardcoded mapping is enough. But I think it would make sense to plan the prototype so that we can later replace the hardcoded mapping with mappings coming from a config file. Though defining these mappings for, e.g., a polygon will be interesting :-)

Anyway, we could then also use this to, e.g., append some data to a bibliographic record, so maybe the mappings will also need to define a "method".

So something like:

{
  "651": [
    { "auth": "034s",    "elastic": "lat",         "method": "set_geopoint" },
    { "auth": "034t",    "elastic": "lon",         "method": "set_geopoint" },
    { "auth": "034defg", "elastic": "coordinates", "method": "set_georectangle" }
  ],
  "100": [
    { "auth": "123x", "elastic": "author.name", "method": "append" }
  ]
}

But again, this is probably too early for now.
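To check whether a mapping format like this holds up, here is a rough interpreter for its 651 part. It is a sketch under several assumptions: coordinates are read from authority field 034 (the field the patch below uses), where $d/$e/$f/$g hold the west/east/north/south bounding box and the first $s/$t are taken as a single point; values are in MARC's hdddmmss form or decimal degrees; and the behaviour of `set_geopoint`, `set_georectangle` and `append` is our guess at what the method names imply. `set_georectangle` emits an Elasticsearch `envelope` geo_shape, whose coordinates are `[[west, north], [east, south]]` as `[lon, lat]` pairs. `apply_mappings` and the other names are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();

# The 651 part of the proposed mapping, assuming the coordinates live in
# authority field 034.
my $mapping = JSON::PP::decode_json(<<'END_JSON');
{
  "651": [
    { "auth": "034s",    "elastic": "lat",         "method": "set_geopoint" },
    { "auth": "034t",    "elastic": "lon",         "method": "set_geopoint" },
    { "auth": "034defg", "elastic": "coordinates", "method": "set_georectangle" }
  ]
}
END_JSON

# MARC 034 coordinate -> decimal degrees. Accepts hdddmmss ("E0161100") and
# decimal degrees with an optional hemisphere letter or sign ("E016.18", "-16.18").
sub to_degrees {
    my ($value) = @_;
    if ( my ( $h, $d, $m, $s ) = $value =~ /^([NSEW])(\d{3})(\d{2})(\d{2})$/ ) {
        my $deg = $d + $m / 60 + $s / 3600;
        return ( $h eq 'S' || $h eq 'W' ) ? -$deg : $deg;
    }
    if ( my ( $h, $num ) = $value =~ /^([NSEW]?)([+-]?\d+(?:\.\d+)?)$/ ) {
        return ( $h eq 'S' || $h eq 'W' ) ? -$num : $num + 0;
    }
    die "Unrecognised coordinate: $value\n";
}

# First value of subfield $code, or undef (not an empty list) when missing,
# so that a list of values stays aligned with the requested codes.
sub subfield_value {
    my ( $field, $code ) = @_;
    for my $pair ( @{ $field->{subfields} } ) {
        return $pair->[1] if $pair->[0] eq $code;
    }
    return undef;
}

# Hypothetical semantics for each "method" named in the mapping.
my %methods = (
    # Store one coordinate, in decimal degrees, under the given key.
    set_geopoint => sub {
        my ( $doc, $key, $value ) = @_;
        $doc->{$key} = to_degrees($value) if defined $value;
    },
    # Build an envelope from W, E, N, S; skip it unless all four are present.
    set_georectangle => sub {
        my ( $doc, $key, @wens ) = @_;
        return if @wens < 4 || grep { !defined } @wens;
        my ( $w, $e, $n, $s ) = map { to_degrees($_) } @wens;
        $doc->{$key} = { type => 'envelope', coordinates => [ [ $w, $n ], [ $e, $s ] ] };
    },
    # Add the (defined) values to a list under the given key.
    append => sub {
        my ( $doc, $key, @values ) = @_;
        push @{ $doc->{$key} }, grep { defined } @values;
    },
);

# Apply the rules for a biblio tag to the fields of the linked authority.
sub apply_mappings {
    my ( $doc, $tag, $authority_fields ) = @_;
    for my $rule ( @{ $mapping->{$tag} // [] } ) {
        my ( $auth_tag, $codes ) = $rule->{auth} =~ /^(\d{3})(.+)$/ or next;
        my $method = $methods{ $rule->{method} } or die "Unknown method: $rule->{method}\n";
        for my $field ( grep { $_->{tag} eq $auth_tag } @$authority_fields ) {
            # One value (or undef) per subfield code, in the order listed in "auth".
            my @values = map { subfield_value( $field, $_ ) } split //, $codes;
            $method->( $doc, $rule->{elastic}, @values );
        }
    }
}

my $authority = [
    {   tag       => '034',
        subfields => [
            [ d => 'E0161100' ], [ e => 'E0163500' ], [ f => 'N0481900' ],
            [ g => 'N0480700' ], [ s => 'N0481230' ], [ t => 'E0162200' ],
        ],
    },
];
my %doc;
apply_mappings( \%doc, '651', $authority );
printf "lat=%.4f lon=%.4f\n", $doc{lat}, $doc{lon};    # lat=48.2083 lon=16.3667
```

Keeping the methods in a dispatch table means a config file can only name behaviour that already exists in code, which seems like a sensible boundary if the mappings later come from a file.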

tadzik commented 2 months ago

Quick and dirty, but this seems to work:

diff --git a/Koha/SearchEngine/Elasticsearch.pm b/Koha/SearchEngine/Elasticsearch.pm
index af9bdd97a1..08d100cf45 100644
--- a/Koha/SearchEngine/Elasticsearch.pm
+++ b/Koha/SearchEngine/Elasticsearch.pm
@@ -621,6 +621,10 @@ sub marc_records_to_documents {
                         $altscript = 1;
                     }
                 }
+                # Handle references to GEOGR_NAME authorities
+                if ($marcflavour eq 'marc21' && $tag eq '651') {
+                    $self->embed_geographic_name($field, $record_document, $data_fields_rules);
+                }

                 my $data_field_rules = $data_fields_rules->{$tag};
                 if ($data_field_rules) {
@@ -853,6 +857,46 @@ sub marc_records_to_documents {
     return \@record_documents;
 }
 
+# Index the coordinate fields (034) of the geographic-name authority that a
+# 651 field links to via $9, as if they were part of the biblio record.
+sub embed_geographic_name {
+    my ($self, $field, $record_document, $rules) = @_;
+
+    my $authid = $field->subfield('9');
+    return unless $authid;
+    my $authority = Koha::MetadataRecord::Authority->get_from_authid($authid);
+    return unless $authority;
+
+    my $tag = '034';
+
+    my $auth_marc = $authority->record;
+    my @coordinate_fields = $auth_marc->field($tag);
+
+    for my $coord_field (@coordinate_fields) {
+        my $data_field_rules = $rules->{$tag};
+        if ($data_field_rules) {
+            my $subfields_mappings = $data_field_rules->{subfields};
+            my $wildcard_mappings = $subfields_mappings->{'*'};
+            foreach my $subfield ($coord_field->subfields()) {
+                my ($code, $data) = @{$subfield};
+                my $mappings = $subfields_mappings->{$code} // [];
+                # Apply the '*' (any subfield) mappings as well
+                if ($wildcard_mappings) {
+                    $mappings = [@{$mappings}, @{$wildcard_mappings}];
+                }
+                if (@{$mappings}) {
+                    $self->_process_mappings($mappings, $data, $record_document, {
+                            data_source => 'subfield',
+                            code => $code,
+                            field => $coord_field
+                        }
+                    );
+                }
+            }
+        }
+    }
+}
+
 =head2 _marc_to_array($record)

     my @fields = _marc_to_array($record)
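The patch reads its rules as `$rules->{$tag}{subfields}{$code}`, plus a `'*'` entry for mappings that apply to every subfield of a tag. A small standalone sketch of how per-subfield and wildcard mappings combine, assuming the main indexing loop appends the wildcard mappings after the code-specific ones; the rule contents and the `target` key are hypothetical placeholders, not Koha's internal mapping format, and `mappings_for` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical rules in the shape the patch reads them: tag -> subfields ->
# subfield code -> list of mappings, with '*' holding mappings that apply to
# every subfield of the tag.
my $rules = {
    '034' => {
        subfields => {
            d   => [ { target => 'coordinates_west' } ],
            s   => [ { target => 'lat' } ],
            '*' => [ { target => 'all_coordinates' } ],
        },
    },
};

# Mappings for subfield $code of $tag: code-specific ones first, then the
# wildcard ones. Reads only, so unknown tags are not autovivified.
sub mappings_for {
    my ( $all_rules, $tag, $code ) = @_;
    my $tag_rules = $all_rules->{$tag} or return [];
    my $subfields_mappings = $tag_rules->{subfields} // {};
    my @mappings = @{ $subfields_mappings->{$code} // [] };
    push @mappings, @{ $subfields_mappings->{'*'} // [] };
    return \@mappings;
}

for my $code (qw(d s t)) {
    my @targets = map { $_->{target} } @{ mappings_for( $rules, '034', $code ) };
    print "$code: @targets\n";
}
# d: coordinates_west all_coordinates
# s: lat all_coordinates
# t: all_coordinates
```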
tadzik commented 2 months ago

Submitted as a bug with a patch to the Koha Bugzilla for further discussion: https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=37821