colav-playground / advanced_user_tests


Obtain google scholar id from scholar source #9

Open restrepo opened 1 year ago

restrepo commented 1 year ago

Whenever comparing a work with the corresponding record in Google Scholar, extract the google_id for all the authors in the work.

Obtain the full names in the Google Scholar record from the more comprehensive BibTeX info.

Assign a quality rating to the type of normalized match according to the following hierarchy:

muzgash commented 10 months ago

The scholar ids and the bibtex (and authors) are in different variables within the resulting database from moai's process (shown below):

{
    _id: ObjectId("629115506f6a44dc7d69ac40"),
    author: 'Westermann, Olaf and Förch, Wiebke and Thornton, Philip and Körner, Jana and Cramer, Laura and Campbell, Bruce',
    profiles: { 'W Förch': 'HwyJZC0AAAAJ', 'P Thornton': 'Wx_me7EAAAAJ' },
    bibtex: '@article{westermann2018scaling,\n' +
      '  title={Scaling up agricultural interventions: Case studies of climate-smart agriculture},\n' +
      '  author={Westermann, Olaf and F{\\"o}rch, Wiebke and Thornton, Philip and K{\\"o}rner, Jana and Cramer, Laura and Campbell, Bruce},\n' +
      '  journal={Agricultural Systems},\n' +
      '  volume={165},\n' +
      '  pages={283--293},\n' +
      '  year={2018},\n' +
      '  publisher={Elsevier}\n' +
      '}\n'
  },
  {
    _id: ObjectId("629115506f6a44dc7d69a764"),
    author: 'Restrepo, Héctor F and Rondón, Martı́n and Rojas, Mar\\á X and Torres, Yolanda and Aschner, Pablo and Dennis, Rodolfo J',
    profiles: {
      'HF Restrepo': 'k1YkH44AAAAJ',
      'M Rondón': 'cDXnenAAAAAJ',
      'MX Rojas': 'hunwMEsAAAAJ'
    },
    bibtex: '@article{restrepo2010comparacion,\n' +
      "  title={Comparaci{\\'o}n de la funci{\\'o}n pulmonar de pacientes con diabetes mellitus tipo 2 sometidos a tratamiento de insulina inyectada versus tratamiento con hipoglucemiantes orales},\n" +
      "  author={Restrepo, H{\\'e}ctor F and Rond{\\'o}n, Mart{\\'\\i}n and Rojas, Mar{\\'\\i}a X and Torres, Yolanda and Aschner, Pablo and Dennis, Rodolfo J},\n" +
      "  journal={ActA M{\\'e}dicA coloMbiAnA},\n" +
      '  volume={35},\n' +
      '  number={3},\n' +
      '  pages={113--118},\n' +
      '  year={2010},\n' +
      "  publisher={Acta M{\\'e}dica Colombiana}\n" +
      '}\n'
  }

In the profiles variable, the keys (names) are always shortened to initials plus last name.

Right now I'm using thefuzz to relate authors field to profiles keys (check https://github.com/colav/Kahi_plugins/blob/main/Kahi_works/kahi_works/Kahi_works.py#L607C1-L641C40).

To implement this in the new code for the independent kahi_scholar_works plugin, I would need a more precise definition of the mechanism for relating authors to ids and for rating the quality of each assigned id, since the profiles keys do not carry the first names.
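For reference, the kind of fuzzy matching between the author field and the profiles keys can be sketched roughly as follows. This is only an illustration: it uses difflib from the standard library as a stand-in for thefuzz, and the helper name best_author_match and the 0.5 threshold are assumptions, not what the linked Kahi_works code does.

```python
from difflib import SequenceMatcher


def best_author_match(profile_key, authors, threshold=0.5):
    """Return (author, score) for the author most similar to profile_key.

    Returns (None, score) when the best score is below the threshold.
    The threshold value is an arbitrary assumption for this sketch.
    """
    def score(author):
        last, _, given = author.partition(",")
        # Reorder "Last, Given" to "Given Last" so it lines up with
        # profile keys such as "W Förch".
        candidate = f"{given.strip()} {last.strip()}".lower()
        return SequenceMatcher(None, profile_key.lower(), candidate).ratio()

    best = max(authors, key=score)
    best_score = score(best)
    return (best, best_score) if best_score >= threshold else (None, best_score)


author_field = (
    "Westermann, Olaf and Förch, Wiebke and Thornton, Philip and "
    "Körner, Jana and Cramer, Laura and Campbell, Bruce"
)
authors = author_field.split(" and ")
profiles = {"W Förch": "HwyJZC0AAAAJ", "P Thornton": "Wx_me7EAAAAJ"}

# Map each full author string to the scholar id of its best-matching key.
matched = {}
for key, scholar_id in profiles.items():
    author, _ = best_author_match(key, authors)
    if author is not None:
        matched[author] = scholar_id
```

The similarity score is the only signal here, which is why it cannot by itself say how trustworthy an assignment is when two authors share a last name and a first initial.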

restrepo commented 10 months ago

Possible methodology:

  1. Convert LaTeX to Unicode

author={Restrepo, H{\\'e}ctor F and Rond{\\'o}n, Mart{\\'\\i}n and Rojas, Mar{\\'\\i}a X and Torres, Yolanda and Aschner, Pablo and Dennis, Rodolfo J},

to

author={Restrepo, Héctor F and Rondón, Martín and Rojas, María X and Torres, Yolanda and Aschner, Pablo and Dennis, Rodolfo J},
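A minimal sketch of this conversion, assuming only the braced accent form that appears in these records ({\'e}, {\"o}, {\'\i}). The function name latex_to_unicode and the accent table are illustrative and far from exhaustive; a dedicated LaTeX decoding library would cover many more cases.

```python
import re
import unicodedata

# Combining marks for a few common LaTeX accent commands (not exhaustive).
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    '"': "\u0308",  # diaeresis
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
}

# Matches only the braced form: {\'e}, {\"o}, {\'\i}
ACCENT = re.compile(r"""\{\\(['`"^~])\s*(\\i|[A-Za-z])\}""")


def latex_to_unicode(text):
    """Replace braced LaTeX accent commands with precomposed Unicode letters."""
    def repl(match):
        mark = COMBINING[match.group(1)]
        base = match.group(2)
        if base == r"\i":
            base = "i"  # the dotless i takes the accent like a plain i
        # NFC composes base letter + combining mark into a single code point.
        return unicodedata.normalize("NFC", base + mark)

    return ACCENT.sub(repl, text)


raw = r"Restrepo, H{\'e}ctor F and Rond{\'o}n, Mart{\'\i}n and Rojas, Mar{\'\i}a X"
clean = latex_to_unicode(raw)
```

Note that the raw string above uses single backslashes, as in the BibTeX text itself; the doubled backslashes in the database dump are only the shell's string escaping.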

  2. Convert to the initials format used in profiles:

keys = {"HF Restrepo": "Restrepo, Héctor F", "M Rondón": "Rondón, Martín", "MX Rojas": "Rojas, María X", "Y Torres": "Torres, Yolanda", "P Aschner": "Aschner, Pablo", "RJ Dennis": "Dennis, Rodolfo J"}

and use the full names as the keys of the profiles dictionary:

new_profiles = dict([(keys[k], profiles.get(k)) for k in keys if profiles.get(k)])
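The two steps could be combined so that the keys dictionary is built from the author field instead of being written by hand. This is a sketch under assumptions: the names are already converted to Unicode, every name has the "Last, Given ..." form, and the helper names initials_key and relabel_profiles are hypothetical.

```python
import unicodedata


def initials_key(full_name):
    """Turn 'Restrepo, Héctor F' into 'HF Restrepo'.

    Assumes the 'Last, Given ...' form used in the BibTeX author field.
    """
    last, _, given = full_name.partition(",")
    initials = "".join(part[0].upper() for part in given.split())
    return f"{initials} {last.strip()}"


def relabel_profiles(author_field, profiles):
    """Return {full name: scholar id} for authors whose initials key is in profiles."""
    def nfc(text):
        # Normalize so composed and decomposed accents compare equal.
        return unicodedata.normalize("NFC", text)

    names = [name.strip() for name in author_field.split(" and ")]
    keys = {nfc(initials_key(name)): name for name in names}
    ids = {nfc(key): value for key, value in profiles.items()}
    return {keys[k]: ids[k] for k in keys if k in ids}


author = (
    "Restrepo, Héctor F and Rondón, Martín and Rojas, María X and "
    "Torres, Yolanda and Aschner, Pablo and Dennis, Rodolfo J"
)
profiles = {
    "HF Restrepo": "k1YkH44AAAAJ",
    "M Rondón": "cDXnenAAAAAJ",
    "MX Rojas": "hunwMEsAAAAJ",
}
new_profiles = relabel_profiles(author, profiles)
```

Because the lookup is exact, two authors in the same work who reduce to the same initials key would overwrite each other in keys; such collisions would need to be detected and given a lower match quality.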