Center-for-Research-Libraries / vufind

CRL implementation of the VuFind frontend for FOLIO. A library resource discovery portal designed and developed for libraries by libraries.
GNU General Public License v2.0

Wikipedia links do not match LC headings #178

Open AndyElliottCRL opened 1 year ago

AndyElliottCRL commented 1 year ago

Complaint from a patron (who appears not to be a CRL member): in the CRL VuFind catalog, records for an author link to a Wikipedia page that is not for the same person.

Background:

Recommendation:

VuFind_issue_WIkipedia

AndyElliottCRL commented 1 year ago

@awood is not enthusiastic about just turning it off, so we get to dig in.

./module/VuFind/src/VuFind/Connection/Wikipedia.php

There is an existing public function getSourceIdentifier() for various objects. Would it be possible to use it to get an author's LC Authority 010/WorldCat ARN and pass that to Wikipedia? Currently we pass only urlencode($author) to Wikipedia.

public function get($author)
[...]
// Get information from Wikipedia API
$uri = 'http://' . $this->lang . '.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=php&list=allpages&titles=' . urlencode($author);
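To make the ambiguity concrete, here is a minimal Python sketch (not VuFind code) that mirrors the URI construction quoted above. The function name build_wikipedia_uri is hypothetical, and quote_plus is used as a rough stand-in for PHP's urlencode. The only author-specific input is the name string, so two different people who share a heading text would produce the same request.

```python
from urllib.parse import quote_plus


def build_wikipedia_uri(author: str, lang: str = "en") -> str:
    """Mirror the quoted PHP: the query is keyed on the author name only."""
    return (
        "http://" + lang + ".wikipedia.org/w/api.php"
        "?action=query&prop=revisions&rvprop=content&format=php"
        "&list=allpages&titles=" + quote_plus(author)
    )


if __name__ == "__main__":
    # No LC or VIAF identifier appears anywhere in the request.
    print(build_wikipedia_uri("Smith, John"))
```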


@nflorin notes that 2 million Wikipedia articles have an "Authority Control" box that includes VIAF and LC identifiers. It would be possible to scrape a Wikipedia download file, match FOLIO identifiers against those found in the articles, and build an LC/Wiki link table, then use that table in web code to create our own links. This is something of a project, so we hold the idea in reserve in case VuFind's native configuration is not suitable.
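A rough Python sketch of the link-table idea, under loud assumptions: it assumes the article wikitext carries explicit parameters such as {{Authority control|VIAF=...|LCCN=...}} (articles whose template has no inline parameters would yield nothing), and the LCCN normalization is simplified. All function names here are hypothetical.

```python
import re

# Assumption: identifiers are written inline in the template, e.g.
# {{Authority control|VIAF=111|LCCN=n/79/021164}}.
TEMPLATE_RE = re.compile(
    r"\{\{\s*Authority control\s*\|([^{}]*)\}\}", re.IGNORECASE
)


def extract_identifiers(wikitext: str) -> dict:
    """Return e.g. {'VIAF': '111', 'LCCN': 'n/79/021164'} if present."""
    match = TEMPLATE_RE.search(wikitext)
    if not match:
        return {}
    ids = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep and value.strip():
            ids[key.strip().upper()] = value.strip()
    return ids


def normalize_lccn(raw: str) -> str:
    """Simplified: drop slashes and whitespace, lowercase the prefix."""
    return re.sub(r"[\s/]", "", raw).lower()


def build_link_table(articles) -> dict:
    """Map normalized LCCN -> article title from (title, wikitext) pairs."""
    table = {}
    for title, wikitext in articles:
        lccn = extract_identifiers(wikitext).get("LCCN")
        if lccn:
            table[normalize_lccn(lccn)] = title
    return table
```

The resulting table could then be joined against the LCCNs we hold for authors, so the link is chosen by identifier rather than by name.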

AndyElliottCRL commented 1 year ago

Others have struggled with the same problem, since 2012, such as Open Library Foundation 629, "Improvement to results retrieved by Wikipedia". Found OLF 169 from 2012, points to

Amusingly enough, it looks like the strategy here was to ... get the Wikipedia link from a VIAF lookup, as @nflorin had suggested. OLF 629 describes using VIAF. OLF 622, "Improvements to Authority Module (and Linked Data?)": [...]

Wikipedia: Even if users don't want to use the enhanced Authority Record view, they can take advantage of improvements to the Wikipedia code. The getAuthorInfo method now accepts a direct link as a parameter. This link is populated in the module by searching the Authority Index and retrieving the raw lccn number for the first matching record. This lccn number is then passed to VIAF which may or may not return a direct link to the Wikipedia article. Users can configure whether they will only accept a direct link to the Wikipedia article or whether they will accept the best guess result in case there is no direct link from VIAF.
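The decision flow described in that quote can be sketched in Python as follows. This is only an illustration of the logic as described, not the VuFind implementation: resolve_wikipedia_target, viaf_lookup, and allow_best_guess are hypothetical names, and the VIAF call is injected as a plain function so no real API is assumed.

```python
from typing import Callable, Optional


def resolve_wikipedia_target(
    author_name: str,
    lccn: Optional[str],
    viaf_lookup: Callable[[str], Optional[str]],
    allow_best_guess: bool,
) -> Optional[str]:
    """Prefer a direct link found via the LCCN; otherwise fall back or give up."""
    # Step 1: if the authority index gave us an LCCN, ask VIAF for a direct link.
    if lccn:
        direct_link = viaf_lookup(lccn)
        if direct_link:
            return direct_link
    # Step 2: no direct link. Use the name-based guess only if configured to.
    return author_name if allow_best_guess else None


if __name__ == "__main__":
    fake_viaf = {"n79021164": "Jane_Example_(author)"}.get
    print(resolve_wikipedia_target("Example, Jane", "n79021164", fake_viaf, False))
    print(resolve_wikipedia_target("Example, Jane", None, fake_viaf, False))
```

With allow_best_guess set to False, an author without a direct link simply gets no Wikipedia box, which is the behavior that would avoid the patron's complaint.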

Have not found where we trigger it to do the right thing.

AndyElliottCRL commented 1 year ago

Comments from a VuFind mailing list thread: the author has no time to fix this, and a user was pessimistic in 2015. Some background:

issue178vfmaillist image

AndyElliottCRL commented 1 year ago

A good starting point for digging is a SourceForge VuFind mailing list thread, which includes a reply from the main author in 2020:

You can index authority data to improve the accuracy of Wikipedia matching; without it, common names will sometimes be mismatched, since the system does not have access to more specific identifiers to clarify links – it simply matches on names. See the documentation here. To activate this feature, you need to download and index the VIAF authority records, plus you need to change the configuration line 453 here: to: top[] = AuthorInfo:true Alternatively, if you do not find the author information helpful, you can simply comment this line out to eliminate it.

Line 453 is different in our code base now. In the version linked from the quote, the line was vfissue178_line453recc, but it is now line 433 for us. This setting is commented out (inoperative) with a leading semicolon.

BUT WAIT, that seems totally unrelated to top[] = AuthorInfo:true. I think this is a mistake in the original pointer, and we are actually looking at line 484 in the new code. That is currently: top[] = AuthorInfo RESUME HERE
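Since the line number keeps shifting between versions (453, 433, 484), it may be easier to locate the setting by name. A small Python sketch, assuming the config is INI-style text where a leading semicolon comments a line out; find_author_info_lines is a hypothetical helper, and the sample text below is made up for illustration, not copied from our config.

```python
import re

# Matches "top[] = AuthorInfo..." with an optional leading ";" (commented out).
SETTING_RE = re.compile(r"^\s*(;?)\s*(top\[\]\s*=\s*AuthorInfo\S*)")


def find_author_info_lines(ini_text: str):
    """Return (line_number, is_active, setting_text) for each AuthorInfo line."""
    hits = []
    for number, line in enumerate(ini_text.splitlines(), start=1):
        match = SETTING_RE.match(line)
        if match:
            is_active = match.group(1) == ""
            hits.append((number, is_active, match.group(2).strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "[Example]\n"          # hypothetical section name
        ";top[] = AuthorInfo\n"
        "top[] = Other\n"
        "top[] = AuthorInfo:true\n"
    )
    for hit in find_author_info_lines(sample):
        print(hit)
```

Running this against the real config text would show every AuthorInfo line, whether it is active, and whether it already carries the :true suffix the 2020 reply recommends.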


Still to investigate:

AndyElliottCRL commented 1 year ago

With @awood 2023-03-29, what to do here.

No extensive work since VF may not last us for the long term.