Inist-CNRS / web-services

Web services at Inist-CNRS
https://services.istex.fr

[address-kit/affiliationcountry] Errors in some cases #147

Closed parmentf closed 1 month ago

parmentf commented 2 months ago

The affiliationcountry service returns an error when it is sent certain affiliations.

Case found: Researcher with grant at Bocconi University Milan.

To reproduce:

$ curl -X 'POST' \
  'https://address-kit.services.istex.fr/v1/affiliationcountry/affilcountry' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "id": 1,
    "value": "Researcher with grant at Bocconi University Milan"
  }
]' | fx
{
  "type": "Fatal run-time error",
  "scope": "statements",
  "date": "2024-08-01T08:23:39.171Z",
  "message": "item #1 [delegate] <Error: item #2 [expand] <Error: [exec] <Error: ./v1/affiliationcountry/detect_country.py exit with code 1>>>",
  "func": "delegate",
  "params": {
    "file": "/app/public/v1/affiliationcountry/affilcountry.ini",
    "server": null
  },
  "traceback": [
    "\t\t    at ChildProcess.<anonymous> (/app/node_modules/@ezs/spawn/lib/exec.js:50:28)",
    "\t\t    at Object.onceWrapper (node:events:628:26)",
    "\t\t    at ChildProcess.emit (node:events:513:28)",
    "\t\t    at Process.ChildProcess._handle.onexit (node:internal/child_process:293:12)",
    "\t    at Feed.warn (/app/node_modules/@ezs/core/lib/engine.js:169:25)",
    "\t    at Feed.f [as error] (/app/node_modules/once/once.js:25:25)",
    "\t    at Feed.stop (/app/node_modules/@ezs/core/lib/feed.js:111:10)",
    "\t    at ChildProcess.<anonymous> (/app/node_modules/@ezs/spawn/lib/exec.js:50:23)",
    "\t    at Object.onceWrapper (node:events:628:26)",
    "\t    at ChildProcess.emit (node:events:513:28)"
  ],
  "index": 1,
  "chunk": ":base64:WwogIHsKICAgICJpZCI6IDEsCiAgICAidmFsdWUiOiAiUmVzZWFyY2hlciB3aXRoIGdyYW50IGF0IEJvY2NvbmkgVW5pdmVyc2l0eSBNaWxhbiIKICB9Cl0="
}

Second case: Researcher with grant at Bocconi University .

$ curl -X 'POST' \
  'https://address-kit.services.istex.fr/v1/affiliationcountry/affilcountry' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "id": 1,
    "value": "Researcher with grant at Bocconi University ."
  }
]' | fx
{
  "type": "Fatal run-time error",
  "scope": "statements",
  "date": "2024-08-01T08:31:41.298Z",
  "message": "item #1 [delegate] <Error: item #2 [expand] <Error: [exec] <Error: ./v1/affiliationcountry/detect_country.py exit with code 1>>>",
  "func": "delegate",
  "params": {
    "file": "/app/public/v1/affiliationcountry/affilcountry.ini",
    "server": null
  },
  "traceback": [
    "\t\t    at ChildProcess.<anonymous> (/app/node_modules/@ezs/spawn/lib/exec.js:50:28)",
    "\t\t    at Object.onceWrapper (node:events:628:26)",
    "\t\t    at ChildProcess.emit (node:events:513:28)",
    "\t\t    at Process.ChildProcess._handle.onexit (node:internal/child_process:293:12)",
    "\t    at Feed.warn (/app/node_modules/@ezs/core/lib/engine.js:169:25)",
    "\t    at Feed.f [as error] (/app/node_modules/once/once.js:25:25)",
    "\t    at Feed.stop (/app/node_modules/@ezs/core/lib/feed.js:111:10)",
    "\t    at ChildProcess.<anonymous> (/app/node_modules/@ezs/spawn/lib/exec.js:50:23)",
    "\t    at Object.onceWrapper (node:events:628:26)",
    "\t    at ChildProcess.emit (node:events:513:28)"
  ],
  "index": 1,
  "chunk": ":base64:WwogIHsKICAgICJpZCI6IDEsCiAgICAidmFsdWUiOiAiUmVzZWFyY2hlciB3aXRoIGdyYW50IGF0IEJvY2NvbmkgVW5pdmVyc2l0eSAuIgogIH0KXQ=="
}
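
The chunk field in both error responses appears to be the submitted request body, base64-encoded behind a :base64: prefix. Here is a minimal Python sketch for decoding it when triaging this kind of error. The helper name decode_chunk is hypothetical, and the prefix handling is inferred only from the output shown above.

```python
import base64
import json

# Prefix observed in the error output above; assumed to mark base64 content.
PREFIX = ":base64:"


def decode_chunk(chunk):
    """Return the parsed JSON payload carried by an error's "chunk" field."""
    if not chunk.startswith(PREFIX):
        raise ValueError("chunk does not start with the expected prefix")
    raw = base64.b64decode(chunk[len(PREFIX):])
    return json.loads(raw.decode("utf-8"))


# "chunk" value copied from the second error response.
chunk = (
    ":base64:WwogIHsKICAgICJpZCI6IDEsCiAgICAidmFsdWUiOiAiUmVzZWFyY2hlciB3aXRo"
    "IGdyYW50IGF0IEJvY2NvbmkgVW5pdmVyc2l0eSAuIgogIH0KXQ=="
)
print(decode_chunk(chunk))
# [{'id': 1, 'value': 'Researcher with grant at Bocconi University .'}]
```

Decoding confirms that the failing item is exactly the string ending in a space followed by a period.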

Apparently the code removes the content of the parentheses and ends up with a space followed by a period, which crashes the program. The same problem occurs with commas.

Offending line: https://github.com/Inist-CNRS/web-services/blob/2bcaae97ad1dbcfd6e5539517477c35b723d9488/services/address-kit/v1/affiliationcountry/detect_country.py#L82

Suggestion:

 a = re.sub(r' *\([a-zA-Z0-9 ,:\–]+\)', '', a)
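
To illustrate why the leading ` *` in the suggested pattern matters, here is a small sketch. PAREN_ONLY is a hypothetical stand-in for a pattern that removes only the parenthesised text; it is not copied from detect_country.py, whose actual line is not quoted in this issue.

```python
import re

# Pattern from the suggestion: the leading " *" also consumes the spaces
# that precede the opening parenthesis.
PAREN_WITH_LEADING_SPACE = re.compile(r" *\([a-zA-Z0-9 ,:\–]+\)")

# Hypothetical comparison pattern (an assumption, not the real code):
# it removes the parenthesised text but leaves the preceding space behind.
PAREN_ONLY = re.compile(r"\([a-zA-Z0-9 ,:\–]+\)")


def strip_parentheses(text, pattern=PAREN_WITH_LEADING_SPACE):
    """Remove every parenthesised group matched by `pattern` from `text`."""
    return pattern.sub("", text)


sample = "Researcher with grant at Bocconi University (Milan)."
print(repr(strip_parentheses(sample, PAREN_ONLY)))
# 'Researcher with grant at Bocconi University .'
print(repr(strip_parentheses(sample)))
# 'Researcher with grant at Bocconi University.'
```

The first result is the second failing case reported above; the second no longer has the stray space before the period.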
revolj commented 2 months ago

Another affiliation that returns an error: University of Toulouse, INPT, INP‑PURPAN, 75 voie du T.O.E.C., FR‑31076 TOULOUSE. Email: regis.vezian@purpan.fr ; UMR 7194 HNHP. CERPT Avenue Léon Jean Grégory, FR‑66720 TAUTAVEL

⚠️ ERROR 👇

item #2 [expand] <Error: [expand] <Error: item #1 [URLConnect] >>

parmentf commented 2 months ago

Affiliations that contain more than just a simple address often cause problems (the first example was only the part of the affiliation that was enough to trigger an error).

Full affiliation:

Researcher with grant at Bocconi University (Milan). Member of the Tarello Institute for Legal Philosophy (Genoa). Address: Università Bocconi, Dipartimento di Studi giuridici “Angelo Sraffa”, Via Röntgen 1, 20136 Milano, Italia. E-mail: alessio.sardo@unibocconi.it

Affiliations containing a parenthesis seem to crash systematically (hence the suggestion in the description).

But these are not the only cases.

It would be good to add the cases we fix to the tests (but not necessarily to the examples in the .ini, which would only make the examples less clear).
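
A rough sketch of what such regression tests could look like, in pytest style. The function remove_parentheses is a hypothetical stand-in for the cleaning step (the real function names in detect_country.py are not shown in this thread), and the inputs are illustrative fragments derived from the affiliation quoted above.

```python
import re

# Pattern taken from the suggestion in the issue description.
PARENTHESES = re.compile(r" *\([a-zA-Z0-9 ,:\–]+\)")


def remove_parentheses(affiliation):
    """Hypothetical stand-in for the cleaning step in detect_country.py."""
    return PARENTHESES.sub("", affiliation)


# (input, expected output) pairs; illustrative, not taken from the repository.
CASES = [
    (
        "Researcher with grant at Bocconi University (Milan).",
        "Researcher with grant at Bocconi University.",
    ),
    (
        "Tarello Institute for Legal Philosophy (Genoa), Italy",
        "Tarello Institute for Legal Philosophy, Italy",
    ),
    # No parentheses: the text must come back unchanged.
    (
        "Via Röntgen 1, 20136 Milano, Italia",
        "Via Röntgen 1, 20136 Milano, Italia",
    ),
]


def test_remove_parentheses():
    for raw, expected in CASES:
        cleaned = remove_parentheses(raw)
        assert cleaned == expected
        # The failures reported in this issue involved " ." and " ,".
        assert " ." not in cleaned and " ," not in cleaned
```

Keeping these cases in the test suite rather than in the .ini examples matches the preference stated above.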

cuxac commented 1 month ago

A new version is available.