ecolabdata / ecospheres

Portal for data on the ecological transition and territorial cohesion
https://ecologie.data.gouv.fr

Dataset count differs between /datasets?organization=XXX and /organizations/XXX #485

Open streino opened 1 day ago

streino commented 1 day ago

https://ecologie.data.gouv.fr/datasets?page=1&organization=55842c2bc751df66bba453b9 lists only 1 dataset (unexpected):

[image]

Whereas https://ecologie.data.gouv.fr/organizations/55842c2bc751df66bba453b9 lists 10 (expected):

[image]

abulte commented 23 hours ago

The universe feed looks fine (Feeding 10 datasets from 'museum-national-dhistoire-naturelle').

Without the topic filter, the datasets API does return 10 datasets: https://www.data.gouv.fr/api/2/datasets/search/?page=1&organization=55842c2bc751df66bba453b9&page_size=21

Adding the topic filter (the request used by the front end) drops the result to 1 dataset: https://www.data.gouv.fr/api/2/datasets/search/?topic=65e9aa6cb5c809c30c70ee02&page=1&organization=55842c2bc751df66bba453b9&page_size=21

The topic filtered on this organization does return 10 datasets: https://www.data.gouv.fr/api/2/topics/65e9aa6cb5c809c30c70ee02/datasets/?organization=55842c2bc751df66bba453b9

More generally, for each organization, comparing the number of datasets the universe script tries to feed with what is in the topic (via /api/2/topics/{topic}/datasets/?organization={org}) shows no desynchronization. The topic itself therefore looks correct.

This points to a problem with the topic attribute in data.gouv.fr's dataset index.
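
For anyone who wants to reproduce the comparison, here is a minimal sketch of the three requests discussed in this comment. It is not part of feed-universe.py: the helper names (search_url, topic_datasets_url, fetch_total, compare_counts) are made up for illustration, and it assumes each paginated response exposes its result count in a "total" field.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://www.data.gouv.fr/api/2"


def search_url(org, topic=None, page_size=21):
    """Build the dataset search URL, optionally filtered by topic."""
    params = {}
    if topic:
        params["topic"] = topic
    params.update({"page": 1, "organization": org, "page_size": page_size})
    return f"{API}/datasets/search/?{urlencode(params)}"


def topic_datasets_url(topic, org):
    """Build the URL listing a topic's datasets for one organization."""
    return f"{API}/topics/{topic}/datasets/?{urlencode({'organization': org})}"


def fetch_total(url):
    """Fetch a paginated response and return its 'total' field.

    Assumption: the response is JSON and carries a 'total' count.
    """
    with urlopen(url, timeout=30) as response:
        return json.load(response)["total"]


def compare_counts(org, topic, fetch=fetch_total):
    """Return the dataset count seen by each of the three requests."""
    return {
        "search": fetch(search_url(org)),
        "search+topic": fetch(search_url(org, topic)),
        "topic": fetch(topic_datasets_url(topic, org)),
    }
```

Calling compare_counts("55842c2bc751df66bba453b9", "65e9aa6cb5c809c30c70ee02") performs the three HTTP requests. If "search" and "topic" agree while "search+topic" is lower, the discrepancy is on the search index side rather than in the topic. The fetch parameter is injectable so the logic can be exercised without network access.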

❯ python feed-universe.py --check universe-prod.yaml env-prod.yaml
Starting at Tue Nov 19 08:27:52 2024
*** CHECKING ***
Processing 178 organizations...
Updating topic 'univers-ecospheres'
Datasets for 'ademe' are in sync (108 datasets)
Datasets for 'agence-de-l-eau-rhone-mediterranee-et-corse' are in sync (13 datasets)
Datasets for 'agence-de-leau-seine-normandie-1' are in sync (27 datasets)
Datasets for 'agence-nationale-de-la-cohesion-des-territoires' are in sync (43 datasets)
Datasets for 'agence-nationale-de-lhabitat' are in sync (5 datasets)
Datasets for 'agence-nationale-pour-la-renovation-urbaine' are in sync (15 datasets)
Datasets for 'agence-ore-3' are in sync (53 datasets)
Datasets for 'airparif-1' are in sync (14 datasets)
Datasets for 'atmo-auvergne-rhone-alpes' are in sync (182 datasets)
Datasets for 'atmo-bourgogne-franche-comte' are in sync (6 datasets)
Datasets for 'atmo-grand-est' are in sync (174 datasets)
Datasets for 'atmosud' are in sync (16 datasets)
Datasets for 'bureau-de-recherches-geologiques-et-minieres' are in sync (48 datasets)
Datasets for 'centre-scientifique-et-technique-du-batiment' are in sync (3 datasets)
Datasets for 'cerema' are in sync (136 datasets)
Datasets for 'commission-de-regulation-de-lenergie' are in sync (15 datasets)
Datasets for 'datagrandest' are in sync (139 datasets)
Datasets for 'ddt-aisne' are in sync (17 datasets)
Datasets for 'ddt-ardennes' are in sync (29 datasets)
Datasets for 'ddt-aveyron' are in sync (114 datasets)
Datasets for 'ddt-cote-dor' are in sync (632 datasets)
Datasets for 'ddt-creuse' are in sync (34 datasets)
Datasets for 'ddt-de-la-loire' are in sync (1 datasets)
Datasets for 'ddt-de-la-meuse' are in sync (34 datasets)
Skipping empty organization 'ddt-de-la-moselle'
Datasets for 'ddt-de-lain' are in sync (129 datasets)
Datasets for 'ddt-de-lardeche' are in sync (7 datasets)
Datasets for 'ddt-de-meurthe-et-moselle' are in sync (782 datasets)
Datasets for 'ddt-deux-sevres' are in sync (106 datasets)
Skipping empty organization 'ddt-dordogne'
Datasets for 'ddt-drome' are in sync (416 datasets)
Datasets for 'ddt-haute-garonne' are in sync (94 datasets)
Datasets for 'ddt-haute-marne' are in sync (90 datasets)
Skipping empty organization 'ddt-haute-vienne'
Datasets for 'ddt-hautes-alpes' are in sync (39 datasets)
Datasets for 'ddt-hautes-pyrenees' are in sync (494 datasets)
Datasets for 'ddt-indre' are in sync (140 datasets)
Datasets for 'ddt-jura' are in sync (50 datasets)
Datasets for 'ddt-lot' are in sync (55 datasets)
Datasets for 'ddt-maine-et-loire' are in sync (139 datasets)
Datasets for 'ddt-mayenne' are in sync (45 datasets)
Datasets for 'ddt-nievre' are in sync (196 datasets)
Skipping empty organization 'ddt-oise'
Datasets for 'ddt-savoie' are in sync (228 datasets)
Datasets for 'ddt-tarn' are in sync (80 datasets)
Datasets for 'ddt-tarn-et-garonne' are in sync (325 datasets)
Datasets for 'ddt-territoire-de-belfort' are in sync (9 datasets)
Datasets for 'ddt-vaucluse' are in sync (46 datasets)
Datasets for 'ddt-vienne' are in sync (136 datasets)
Skipping empty organization 'ddt-yonne'
Datasets for 'ddt-yvelines' are in sync (46 datasets)
Datasets for 'ddtm-alpes-maritimes' are in sync (32 datasets)
Datasets for 'ddtm-aude' are in sync (58 datasets)
Datasets for 'ddtm-corse-du-sud' are in sync (183 datasets)
Datasets for 'ddtm-cotes-darmor' are in sync (20 datasets)
Datasets for 'ddtm-finistere' are in sync (109 datasets)
Datasets for 'ddtm-gard' are in sync (405 datasets)
Datasets for 'ddtm-gironde' are in sync (30 datasets)
Datasets for 'ddtm-haute-corse' are in sync (84 datasets)
Datasets for 'ddtm-herault' are in sync (102 datasets)
Skipping empty organization 'ddtm-manche'
Datasets for 'ddtm-morbihan' are in sync (26 datasets)
Skipping empty organization 'ddtm-nord'
Datasets for 'ddtm-pas-de-calais' are in sync (507 datasets)
Datasets for 'ddtm-pyrenees-orientales' are in sync (60 datasets)
Skipping empty organization 'ddtm-seine-maritime'
Datasets for 'ddtm-vendee' are in sync (133 datasets)
Datasets for 'deal-guadeloupe' are in sync (141 datasets)
Skipping empty organization 'deal-mayotte'
Skipping empty organization 'deal-reunion'
Datasets for 'direction-departementale-des-territoire-du-puy-de-dome' are in sync (230 datasets)
Datasets for 'direction-departementale-des-territoires-de-charente' are in sync (97 datasets)
Datasets for 'direction-departementale-des-territoires-de-haute-saone' are in sync (112 datasets)
Datasets for 'direction-departementale-des-territoires-de-haute-savoie' are in sync (593 datasets)
Datasets for 'direction-departementale-des-territoires-de-la-correze' are in sync (406 datasets)
Datasets for 'direction-departementale-des-territoires-de-la-haute-loire' are in sync (6 datasets)
Datasets for 'direction-departementale-des-territoires-de-la-lozere' are in sync (88 datasets)
Datasets for 'direction-departementale-des-territoires-de-la-marne' are in sync (192 datasets)
Datasets for 'direction-departementale-des-territoires-de-la-sarthe' are in sync (137 datasets)
Datasets for 'direction-departementale-des-territoires-de-lallier' are in sync (19 datasets)
Datasets for 'direction-departementale-des-territoires-de-lariege' are in sync (1034 datasets)
Datasets for 'direction-departementale-des-territoires-de-laube' are in sync (571 datasets)
Datasets for 'direction-departementale-des-territoires-de-lessonne' are in sync (281 datasets)
Datasets for 'direction-departementale-des-territoires-de-lisere' are in sync (323 datasets)
Datasets for 'direction-departementale-des-territoires-de-loir-et-cher' are in sync (274 datasets)
Datasets for 'direction-departementale-des-territoires-de-lorne' are in sync (606 datasets)
Skipping empty organization 'direction-departementale-des-territoires-de-lot-et-garonne'
Datasets for 'direction-departementale-des-territoires-de-saone-et-loire' are in sync (156 datasets)
Datasets for 'direction-departementale-des-territoires-de-seine-et-marne' are in sync (174 datasets)
Datasets for 'direction-departementale-des-territoires-des-alpes-de-haute-provence' are in sync (85 datasets)
Datasets for 'direction-departementale-des-territoires-des-vosges' are in sync (205 datasets)
Datasets for 'direction-departementale-des-territoires-deure-et-loir' are in sync (227 datasets)
Datasets for 'direction-departementale-des-territoires-dindre-et-loire' are in sync (111 datasets)
Datasets for 'direction-departementale-des-territoires-du-bas-rhin-67' are in sync (24 datasets)
Datasets for 'direction-departementale-des-territoires-du-cher' are in sync (1071 datasets)
Datasets for 'direction-departementale-des-territoires-du-doubs' are in sync (20 datasets)
Skipping empty organization 'direction-departementale-des-territoires-du-gers'
Datasets for 'direction-departementale-des-territoires-du-haut-rhin-68' are in sync (118 datasets)
Datasets for 'direction-departementale-des-territoires-du-loiret' are in sync (372 datasets)
Datasets for 'direction-departementale-des-territoires-du-rhone-1' are in sync (188 datasets)
Datasets for 'direction-departementale-des-territoires-du-val-doise' are in sync (255 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-de-charente-maritime' are in sync (607 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-de-la-somme' are in sync (300 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-de-leure' are in sync (196 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-de-loire-atlantique' are in sync (154 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-des-bouches-du-rhone' are in sync (279 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-des-landes' are in sync (244 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-des-pyrenees-atlantiques' are in sync (795 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-dille-et-vilaine' are in sync (222 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-du-calvados' are in sync (144 datasets)
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-du-var' are in sync (334 datasets)
Datasets for 'direction-departementale-et-des-territoires-du-cantal' are in sync (167 datasets)
Datasets for 'direction-regionale-de-lenvironnement-de-lamenagement-et-du-logement-de-normandie' are in sync (252 datasets)
Datasets for 'direction-regionale-de-lenvironnement-de-lamenagement-et-du-logement-du-centre-val-de-loire' are in sync (93 datasets)
Datasets for 'direction-regionale-de-lenvironnement-et-du-logement-bourgogne-franche-comte' are in sync (148 datasets)
Datasets for 'do-terr-geo-centre' are in sync (352 datasets)
Skipping empty organization 'dreal-corse'
Datasets for 'dreal-grand-est' are in sync (737 datasets)
Skipping empty organization 'dreal-hauts-de-france'
Skipping empty organization 'dreal-nouvelle-aquitaine'
Skipping empty organization 'dreal-pays-de-la-loire-direction-regionale-de-lenvironnement-de-lamenagement-et-du-logement-pays-de-la-loire'
Datasets for 'dreal-provence-alpes-cote-dazur' are in sync (275 datasets)
Datasets for 'edf-systemes-energetiques-insulaires' are in sync (62 datasets)
Datasets for 'electricite-de-france' are in sync (61 datasets)
Datasets for 'electricite-reseau-distribution-france' are in sync (79 datasets)
Datasets for 'engie-mobilites-electriques' are in sync (11 datasets)
Datasets for 'equipe-transport-data-gouv-fr' are in sync (9 datasets)
Datasets for 'federation-des-parcs-naturels-regionaux-de-france' are in sync (1 datasets)
Datasets for 'fluo-grand-est' are in sync (45 datasets)
Datasets for 'geo2france' are in sync (166 datasets)
Datasets for 'gip-bretagne-environnement' are in sync (87 datasets)
Datasets for 'grdf' are in sync (23 datasets)
Datasets for 'ile-de-france-mobilites' are in sync (80 datasets)
Datasets for 'institut-francais-de-recherche-pour-lexploitation-de-la-mer' are in sync (3 datasets)
Datasets for 'institut-national-de-l-information-geographique-et-forestiere' are in sync (40 datasets)
Datasets for 'irstea' are in sync (13 datasets)
Datasets for 'jvmalin' are in sync (11 datasets)
Datasets for 'ligair' are in sync (29 datasets)
Datasets for 'meteo-france' are in sync (90 datasets)
Datasets for 'ministere-de-l-egalite-des-territoires-et-du-logement' are in sync (24 datasets)
Datasets for 'ministere-de-la-transition-ecologique' are in sync (135 datasets)
Datasets for 'morbihan-energies' are in sync (12 datasets)
Datasets for 'museum-national-dhistoire-naturelle' are in sync (10 datasets)
Datasets for 'observatoire-de-recherche-montpellierain-de-lenvironnement' are in sync (144 datasets)
Datasets for 'office-national-des-forets' are in sync (11 datasets)
Datasets for 'office-public-de-lhabitat-des-landes' are in sync (75 datasets)
Datasets for 'oise-mobilite-syndicat-mixte-des-transports-collectifs-de-loise' are in sync (16 datasets)
Datasets for 'open-data-reseaux-energies-1' are in sync (202 datasets)
Skipping empty organization 'parc-geoparc-normandie-maine'
Datasets for 'parc-national-de-forets' are in sync (1 datasets)
Skipping empty organization 'parc-national-de-port-cros'
Datasets for 'parc-national-des-cevennes' are in sync (7 datasets)
Datasets for 'parc-national-des-ecrins' are in sync (4 datasets)
Skipping empty organization 'parc-naturel-centre'
Skipping empty organization 'parc-naturel-de-martinique'
Datasets for 'parc-naturel-regional-de-la-montagne-de-reims' are in sync (7 datasets)
Datasets for 'parc-naturel-regional-de-la-montagne-de-reims-1' are in sync (1 datasets)
Datasets for 'parc-naturel-regional-des-vosges-du-nord' are in sync (9 datasets)
Skipping empty organization 'parc-naturel-regional-du-marais-poitevin'
Datasets for 'parc-naturel-regional-du-morvan' are in sync (34 datasets)
Datasets for 'parc-naturel-regional-du-vercors' are in sync (6 datasets)
Skipping empty organization 'parc-naturel-regional-livradois-forez'
Datasets for 'parcs-nationaux-de-france' are in sync (1 datasets)
Datasets for 'parcs-naturels-regionaux-de-provence-alpes-cote-dazur' are in sync (127 datasets)
Skipping empty organization 'pole-national-de-donnees-de-biodiversite'
Datasets for 'regie-autonome-des-transports-parisiens-ratp' are in sync (32 datasets)
Datasets for 'reseau-de-transport-delectricite' are in sync (28 datasets)
Datasets for 'section-cadastre-topographie-de-la-polynesie-francaise' are in sync (12 datasets)
Datasets for 'shom' are in sync (73 datasets)
Datasets for 'sncf' are in sync (207 datasets)
Datasets for 'societe-du-grand-paris' are in sync (13 datasets)
Datasets for 'star' are in sync (45 datasets)
Datasets for 'syndicat-mixte-des-milieux-aquatiques-et-des-rivieres-de-laude' are in sync (15 datasets)
Datasets for 'syndicat-mixte-des-mobilites-de-laire-grenobloise' are in sync (34 datasets)
Datasets for 'syndicat-mixte-eptb-meurthe-madon' are in sync (20 datasets)
Datasets for 'systeme-d-information-sur-l-eau' are in sync (627 datasets)
Datasets for 'systeme-dinformation-sur-la-biodiversite' are in sync (30 datasets)
Datasets for 'twisto' are in sync (23 datasets)
Total count: 23469, elapsed: 235.40 s
Done at Tue Nov 19 08:31:47 2024

maudetes commented 23 hours ago

Thanks for the detailed ticket and investigation!

Looking in the index at the document for one of the missing datasets (curl <path>/udata-dataset/_doc/559a6bb1c751df57a5390bd3), we can see that the topics field is empty:

    "schema": null,
    "topics": [],
    "orga_sp": 4,
    "orga_followers": 3.258096538021482,
    "organization": "55842c2bc751df66bba453b9",

After explicitly re-running the indexing of this dataset, the topics field has the expected value:

    "topics": [
      "65e9aa6cb5c809c30c70ee02"
    ],

This means that an indexing operation that should have happened did not take place. In principle, the indexing cases are handled via pre_save.

To unblock the situation temporarily, I launched a reindexing of all the datasets in topic 65e9aa6cb5c809c30c70ee02. This does not fix the original problem, however, so I suggest leaving this ticket open for further investigation.
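
To illustrate the kind of check described above, here is a rough sketch that flags datasets belonging to a topic whose indexed document does not list that topic. It is only an illustration under assumptions: stale_topic_docs is a hypothetical helper, the index documents are passed in as plain dicts (already fetched by whatever means), and "dataset-ok" is a made-up ID used as a counter-example.

```python
def stale_topic_docs(topic_id, topic_dataset_ids, index_docs):
    """Return IDs of topic datasets whose indexed doc lacks the topic.

    topic_dataset_ids: dataset IDs the topic is known to contain.
    index_docs: mapping of dataset ID -> indexed document (dict).
    A dataset missing from the index altogether is also reported.
    """
    stale = []
    for dataset_id in topic_dataset_ids:
        doc = index_docs.get(dataset_id)
        topics = (doc or {}).get("topics") or []
        if topic_id not in topics:
            stale.append(dataset_id)
    return stale


TOPIC = "65e9aa6cb5c809c30c70ee02"

# First document mirrors the excerpt above (empty topics field);
# the second one is a made-up, correctly indexed document.
docs = {
    "559a6bb1c751df57a5390bd3": {
        "topics": [],
        "organization": "55842c2bc751df66bba453b9",
    },
    "dataset-ok": {
        "topics": [TOPIC],
        "organization": "55842c2bc751df66bba453b9",
    },
}

to_reindex = stale_topic_docs(TOPIC, list(docs), docs)
print(to_reindex)
```

The resulting list is what one would hand to a targeted reindexing, instead of reindexing every dataset in the topic.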

abulte commented 23 hours ago

Situation before (~during) the temporary fix:

❯ python feed-universe.py --check universe-prod.yaml env-prod.yaml
Starting at Tue Nov 19 09:13:23 2024
*** CHECKING ***
Processing 178 organizations...
Updating topic 'univers-ecospheres'
Datasets for 'cerema' are NOT in sync
  - topic datasets : 136
  - universe       : 136
  - search datasets: 125
Datasets for 'ddt-haute-marne' are NOT in sync
  - topic datasets : 90
  - universe       : 90
  - search datasets: 88
Datasets for 'ddtm-herault' are NOT in sync
  - topic datasets : 102
  - universe       : 102
  - search datasets: 94
Datasets for 'ddtm-morbihan' are NOT in sync
  - topic datasets : 26
  - universe       : 26
  - search datasets: 25
Datasets for 'direction-departementale-des-territoires-de-lisere' are NOT in sync
  - topic datasets : 323
  - universe       : 323
  - search datasets: 322
Datasets for 'direction-departementale-des-territoires-de-lorne' are NOT in sync
  - topic datasets : 606
  - universe       : 606
  - search datasets: 605
Datasets for 'direction-departementale-des-territoires-de-seine-et-marne' are NOT in sync
  - topic datasets : 174
  - universe       : 174
  - search datasets: 173
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-de-leure' are NOT in sync
  - topic datasets : 196
  - universe       : 196
  - search datasets: 194
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-des-bouches-du-rhone' are NOT in sync
  - topic datasets : 279
  - universe       : 279
  - search datasets: 278
Datasets for 'direction-departementale-des-territoires-et-de-la-mer-dille-et-vilaine' are NOT in sync
  - topic datasets : 222
  - universe       : 222
  - search datasets: 221
Datasets for 'do-terr-geo-centre' are NOT in sync
  - topic datasets : 352
  - universe       : 352
  - search datasets: 326
Datasets for 'dreal-provence-alpes-cote-dazur' are NOT in sync
  - topic datasets : 275
  - universe       : 275
  - search datasets: 270
Datasets for 'geo2france' are NOT in sync
  - topic datasets : 166
  - universe       : 166
  - search datasets: 165
Datasets for 'museum-national-dhistoire-naturelle' are NOT in sync
  - topic datasets : 10
  - universe       : 10
  - search datasets: 2
Total count: 23469, elapsed: 319.84 s
Done at Tue Nov 19 09:18:43 2024
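
For readers unfamiliar with the check, the three-way comparison reported in the log above could look roughly like the sketch below. This is not the actual feed-universe.py code: check_org is a hypothetical function, the dataset IDs are synthetic, and only the message layout is copied from the log.

```python
def check_org(slug, topic_ids, universe_ids, search_ids):
    """Compare the three views of an organization's datasets.

    topic_ids: datasets listed by the topic for this organization.
    universe_ids: datasets the universe script wants to feed.
    search_ids: datasets returned by search with the topic filter.
    Returns the report lines for this organization.
    """
    topic_ids, universe_ids, search_ids = (
        set(topic_ids), set(universe_ids), set(search_ids)
    )
    if topic_ids == universe_ids == search_ids:
        count = len(universe_ids)
        return [f"Datasets for '{slug}' are in sync ({count} datasets)"]
    return [
        f"Datasets for '{slug}' are NOT in sync",
        f"  - topic datasets : {len(topic_ids)}",
        f"  - universe       : {len(universe_ids)}",
        f"  - search datasets: {len(search_ids)}",
    ]


# Synthetic IDs reproducing the counts reported for the museum.
ten = {f"d{i}" for i in range(10)}
report = check_org(
    "museum-national-dhistoire-naturelle", ten, ten, {"d0", "d1"}
)
print("\n".join(report))
```

Comparing sets of IDs rather than bare counts also makes it possible to list exactly which datasets are missing from search (topic_ids - search_ids), which is what a targeted reindexing needs.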
abulte commented 22 hours ago

After the fix, we're in sync! Thanks @maudetes 🙏

❯ python feed-universe.py --check universe-prod.yaml env-prod.yaml
Starting at Tue Nov 19 09:54:34 2024
*** CHECKING ***
Processing 178 organizations...
Updating topic 'univers-ecospheres'
Total count: 23469, elapsed: 328.94 s
Done at Tue Nov 19 10:00:03 2024