gap-system / GapWWW

Source for the GAP website
https://www.gap-system.org

Transfer our bibliography data to zbmath, then replace our bibliography with a link to swmath #343

Open fingolfin opened 3 months ago

fingolfin commented 3 months ago

The link: https://zbmath.org/software/320

This list provides basically everything we have at https://www.gap-system.org/Doc/Bib/bib.html and even has additional nice features. And unlike MathSciNet, it is free to use.

While it has more publications overall than we do, it does miss some -- in some cases papers might not be indexed by them at all, but so far every case I found where a paper is in our list but not in theirs is a matter of missing metadata on their part, i.e., the "tag" "sw:gap" is missing on some papers for whatever reason.

I have contacted them and in principle I can send them lists of papers that are missing this tag and they'll add it (presumably after some validation, of course).

That leaves the problem as to how we get that list. Of course we can manually check things but there are thousands. So better to automate it. Here is how one could do that:

  1. get our data -- easy, just download https://www.gap-system.org/Doc/Bib/gap-publishednicer.bib
  2. get their data
    • I wrote a script to do so with their help and some manual tweaking, and have that .bib file (it is 1.8 MB so I am not attaching it but instead I'll add the crude script below)
  3. write a tool which parses the bib files (e.g. in Python and using https://bibtexparser.readthedocs.io/en/main/), then lists papers we have but they don't
    • this is easy for papers with a DOI and if both sides have the DOI, so let's drop those first
    • next compare using title, year, author(s)?
    • keep refining but at some point it will be more efficient to just let humans consider the lists...
  4. for the remaining papers, try to get their zbmath ID ... this could use their website, but it seems they have an API for that, with some Python bindings here: https://github.com/zbMATHOpen/zbRestApiClient
    • actually it may make sense to combine 3 with 4: if we can identify one of "our" papers using the zbmath API then it is easy to determine if it is in their list of "papers using GAP" or not...
  5. the final result would be two lists of papers
    • one with papers we have but they don't and which we successfully identified (we probably just need the list of ids here and can then send it to them)
    • papers they don't seem to have in the database at all
      • this will certainly include many theses!
      • how we deal with this we'll have to decide once we have that list.
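
The comparison in step 3 could be sketched roughly as follows. This is only a minimal sketch under assumptions: both .bib files are taken to be already parsed into lists of plain dicts (for example with bibtexparser) using lowercase field names `doi`, `title` and `year`. All function names are hypothetical, and the title normalization is deliberately crude, so the result would still need human review.

```python
import re
import unicodedata


def normalize_doi(doi):
    """Lowercase a DOI and strip a resolver prefix such as https://doi.org/."""
    doi = (doi or "").strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def normalize_title(title):
    """Reduce a title to lowercase ASCII letters and digits for rough comparison."""
    text = unicodedata.normalize("NFKD", title or "")
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    return re.sub(r"[^a-z0-9]+", "", text)


def entry_keys(entry):
    """Return the set of keys under which an entry can be matched."""
    keys = set()
    doi = normalize_doi(entry.get("doi"))
    if doi:
        keys.add(("doi", doi))
    title = normalize_title(entry.get("title"))
    if title:
        keys.add(("title", title, str(entry.get("year", "")).strip()))
    return keys


def missing_from(ours, theirs):
    """List entries of `ours` that match no entry of `theirs` by DOI or title+year."""
    known = set()
    for entry in theirs:
        known |= entry_keys(entry)
    return [entry for entry in ours if not (entry_keys(entry) & known)]
```

Calling `missing_from(our_entries, zbmath_entries)` would give the candidates for the two lists in step 5. Comparing authors as well, or looking the leftovers up via the zbmath API, is left out of this sketch.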

Script for getting zbmath data

#!/bin/sh
# Download the zbmath BibTeX export for software id 320 (GAP),
# 200 entries per request, for offsets 0, 200, ..., 3600.
echo > zbmath.bib
start=0
while [ "$start" -le 3600 ]; do
  curl "https://zbmath.org/bibtexoutput/?q=si%3A320&start=${start}&count=200" >> zbmath.bib
  start=$((start + 200))
done
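
If the download ends up inside the Python tool, the same paging could be done there without hard-coding the offsets. The following is a rough sketch: it reuses the URL and query parameters from the shell script above, but the stopping rule (a page containing no `@` is treated as the end of the results) is an assumption that has not been checked against the zbmath server, and the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://zbmath.org/bibtexoutput/"


def page_url(start, count=200, query="si:320"):
    """Build the URL for one page of the BibTeX export used in the shell script."""
    return BASE_URL + "?" + urlencode({"q": query, "start": start, "count": count})


def fetch(url):
    """Download one page and return it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def download_all(fetch_page=fetch, count=200, max_pages=100):
    """Concatenate pages until one contains no BibTeX entry (assumed end marker)."""
    chunks = []
    for page in range(max_pages):
        text = fetch_page(page_url(page * count, count))
        if "@" not in text:  # assumption: no entries means we are past the last page
            break
        chunks.append(text)
    return "\n".join(chunks)
```

Passing the fetch function in as a parameter keeps the paging logic testable without network access; `max_pages` is just a safety limit so a misbehaving server cannot cause an endless loop.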