dcsr-datumo / oeuvres-roud

Project specific ontologies and lists for the project "Oeuvres complètes de Gustave Roud"

import images #62

Closed elespdn closed 1 year ago

elespdn commented 2 years ago

Bulk import documentation: https://docs.dasch.swiss/2022.06.02/DSP-API/03-apis/api-v1/adding-resources/?h=bulk+import#bulk-import

The procedure described in the up-to-date documentation is the same one we followed before.

Images were imported from XML files at https://github.com/LaDHUL/oeuvres-roud/tree/master/bulkimport/OUTPUT_xml/import_images.

The XML files contain all the information needed to create resources:

To import the XML, I see several possibilities:

elespdn commented 2 years ago

Images are on the NAS SCANLETT (> Gustave Roud > E_scans > scansComplets)

elespdn commented 2 years ago

An alternative would be the procedure followed to import the mapping: https://github.com/LaDHUL/oeuvres-roud/blob/master/mapping/import-mapping.rest

elespdn commented 2 years ago

Decided: create XML files as described here https://docs.dasch.swiss/2022.06.02/DSP-TOOLS/dsp-tools-xmlupload/?h=bulk

elespdn commented 2 years ago

Fixed in https://github.com/LaDHUL/oeuvres-roud/pull/64/commits/aafd82116fdc13e31dbe9f8e84151c33b00f50e1



elespdn commented 2 years ago

Other errors occurred while importing articles.

These errors were due to a bug in parsing (splitting on '.'), now fixed.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'Beau-Site' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'Deux fragments d'un hommage à C' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'Fragment d une réponse à C' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'La Fondation C' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'Les Élégies romaines de Goethe traduites par J' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'Sur le Diégo de C' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'Joie' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': 'Expositions' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.
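For the record, a minimal sketch of how a split on '.' can produce the validation errors above. It assumes (this is not confirmed anywhere in this thread) that the integer value is read from the last '_'-separated token of the file name. The function names and the example file name are hypothetical.

```python
import os


def seqnum_naive(file_name: str) -> str:
    """Buggy variant: cuts at the first '.', so a dot in the title truncates the name."""
    return file_name.split(".")[0].split("_")[-1]


def seqnum_safe(file_name: str) -> int:
    """Safer variant: only the last '.' is treated as the extension separator."""
    stem, _ext = os.path.splitext(file_name)
    token = stem.rsplit("_", 1)[-1]
    if not (token.isascii() and token.isdigit()):
        # Catches values such as '3a' before they reach XML validation.
        raise ValueError(f"sequence number is not an integer: {token!r}")
    return int(token)


# Hypothetical file name with an abbreviation ("C.") in the title.
name = "pub_Author Name_La Fondation C. Example_Journal_1950-01_p1_1.tif"
print(repr(seqnum_naive(name)))  # a piece of the title instead of a number
print(seqnum_safe(name))         # the trailing sequence number
```

Checking the token before writing the XML also reports the problem per file, instead of failing later in schema validation.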

Corrected manually (it was a copy of 3):

The input data file cannot be uploaded due to the following validation error(s):
  Line 14: Element '{https://dasch.swiss/schema}integer': '3a' is not a valid value of the atomic type 'xs:integer'.
ERROR The input data file did not pass validation.

Fixed; the error was in the link to the existing publication:

The input data file is syntactically correct and passed validation.
Uploaded file /mnt/scanlettMounted/GustaveRoud/E_Scan/Scans_complets/Publications/pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01/pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p2_1.tif
ERROR while trying to create resource 'pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p2_1' (pub_RoudGustave_Dunecertainepureté_La_Guilde_du_Livre_1940-01_p2_1.tif).
The mapping of internal IDs to IRIs was written to id2iri_importpub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p2_1_mapping_20220818-225841.json
Could not upload the following resources: ['pub_RoudGustave_Dunecertainepureté_La_Guilde_du_Livre_1940-01_p2_1.tif']

The input data file is syntactically correct and passed validation.
Uploaded file /mnt/scanlettMounted/GustaveRoud/E_Scan/Scans_complets/Publications/pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01/pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p3_2.tif
ERROR while trying to create resource 'pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p3_2' (pub_RoudGustave_Dunecertainepureté_La_Guilde_du_Livre_194001_p3_2.tif).
The mapping of internal IDs to IRIs was written to id2iri_importpub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p3_2_mapping_20220819-232649.json
Could not upload the following resources: ['pub_RoudGustave_Dunecertainepureté_La_Guilde_du_Livre_194001_p3_2.tif']

The input data file is syntactically correct and passed validation.
Uploaded file /mnt/scanlettMounted/GustaveRoud/E_Scan/Scans_complets/Publications/pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01/pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p4_3.tif
ERROR while trying to create resource 'pub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p4_3' (pub_RoudGustave_Dunecertainepureté_La_Guilde_du_Livre_194001_p4_3.tif).
The mapping of internal IDs to IRIs was written to id2iri_importpub_Roud Gustave_D'une certaine pureté_La_Guilde_du_Livre_1940-01_p4_3_mapping_20220819-233649.json
Could not upload the following resources: ['pub_RoudGustave_Dunecertainepureté_La_Guilde_du_Livre_194001_p4_3.tif']

elespdn commented 2 years ago

More errors while importing articles

The script that creates the XML checks the correspondence between the label of the resource already in the database and the path of the file to be imported, which should include the label of the resource it belongs to. For example, it

The names of the files to be imported were created manually, by copy-pasting the label from the database while digitizing the document.
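A rough sketch of the check described above, not the actual script. It assumes the file name is the resource label followed by `_p<page>_<sequence number>`, a layout inferred from the file names quoted in this thread. `PAGE_SUFFIX`, `find_parent_label` and `labels_in_db` are hypothetical names.

```python
import re
from typing import Optional, Set

# Assumed file-name layout: <label>_p<page>_<sequence number>
PAGE_SUFFIX = re.compile(r"^(?P<label>.+)_p(?P<page>[^_]+)_(?P<seq>\d+)$")


def find_parent_label(file_stem: str, known_labels: Set[str]) -> Optional[str]:
    """Return the label of the resource the file belongs to, or None if no match."""
    match = PAGE_SUFFIX.match(file_stem)
    if match is None:
        return None
    label = match.group("label")
    # Exact comparison: any difference in spelling or spacing means no link.
    return label if label in known_labels else None


# In-memory stand-in for the labels exported from the database.
labels_in_db = {"pub_Roud Gustave_Mémoire_Inno-Reflets_1967"}

for stem in (
    "pub_Roud Gustave_Mémoire_Inno-Reflets_1967_p9_2",
    "pub_Roud Gustave_Hommage de Gustave Roud_13 12_1968-12_p7_2",
):
    print(stem)
    print(find_parent_label(stem, labels_in_db) or "MISSING LINK")
```

Because the comparison is exact, every manual deviation in a file name shows up as a missing link.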

There are all sorts of cases in which no correspondence is found.


Typo -> fixed in https://github.com/LaDHUL/oeuvres-roud/pull/64/commits/e5371c0f2554251e34bc2a444bb77cd7e922348e

pub_Roud Gustave_Sur les Châteaux en enfance de Catherine Colomb_La_Guilde_du_Livre_1945-08_p129_2
pub_Roud Gustave_Sur les Châteaux en enfance de Catherine Collomb_La_Guilde_du_Livre_1945-08
MISSING LINK
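Typos like this one could be flagged automatically by proposing the closest existing labels when no exact match is found. A minimal sketch using `difflib` from the Python standard library; the function name and the 0.9 cutoff are assumptions, and this is not what the linked commit does.

```python
import difflib
from typing import List


def suggest_labels(derived_label: str, known_labels: List[str], cutoff: float = 0.9) -> List[str]:
    """Return up to three existing labels that are nearly identical to the derived one."""
    return difflib.get_close_matches(derived_label, known_labels, n=3, cutoff=cutoff)


# Label derived from the file name (spelled "Colomb") and labels from the db export.
derived = "pub_Roud Gustave_Sur les Châteaux en enfance de Catherine Colomb_La_Guilde_du_Livre_1945-08"
known = [
    "pub_Roud Gustave_Sur les Châteaux en enfance de Catherine Collomb_La_Guilde_du_Livre_1945-08",
    "pub_Roud Gustave_Mémoire_Inno-Reflets_1967",
]

for candidate in suggest_labels(derived, known):
    print("Did you mean:", candidate)
```

A suggestion still needs a human decision about which spelling is right, so it should only be reported, not applied silently.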

Labels and file names (apostrophes, trailing spaces, quotation marks before and after, double spaces, square brackets) -> fixed in b691ce6, 24c5fef, 0fe37fc, d31bafe, 1cb9cb0

pub_Roud Gustave_D'un cahier d'instants_1312 Organe de l’Association romande du personnel de la librairie et de l’édition_1947-12_p15_2
pub_Roud Gustave_D'un cahier d'instants_13 12_1947-12
MISSING LINK

pub_Roud Gustave_Hommage de Gustave Roud_13 12_1968-12_p7_2
pub_Roud Gustave_Hommage de Gustave Roud_13 12_1968-12
MISSING LINK

pub_Roud Gustave_[Ouvre les yeux ferme les yeux]_La_Guilde_du_Livre_1950-12_p264_1
pub_Roud Gustave_[Ouvre les yeux ferme les yeux]_La_Guilde_du_Livre_1950-12
MISSING LINK

pub_Roud Gustave_Le Secret des Compagnons, par Henri Pourrat _Suisse_romande_1938-02_p253_1
pub_Roud Gustave_Le Secret des Compagnons, par Henri Pourrat _Suisse_romande_1938-02
MISSING LINK

pub_Hölderlin Friedrich_Grèce Âges de la vie_Lettres françaises_1967-05_p13_5
pub_Hölderlin Friedrich_Grèce Âges de la vie_Lettres françaises_1967-05
MISSING LINK
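One way to handle this class of mismatches is to normalize both the label and the file name before comparing them. The sketch below covers the cases listed above; the exact rules (for instance dropping square brackets rather than keeping them) are assumptions and may differ from what the commits do. `normalize` is a hypothetical helper.

```python
import re


def normalize(text: str) -> str:
    """Reduce a label or file name to a canonical form for comparison."""
    text = text.replace("\u2019", "'")                   # typographic -> straight apostrophe
    text = text.strip().strip("\"\u201c\u201d").strip()  # surrounding spaces and quotation marks
    text = re.sub(r"[\[\]]", "", text)                   # square brackets
    text = re.sub(r"\s+", " ", text)                     # double spaces
    text = re.sub(r" ?_ ?", "_", text)                   # stray spaces around '_' separators
    return text.strip()


label = "pub_Roud Gustave_Le Secret des Compagnons, par Henri Pourrat _Suisse_romande_1938-02"
print(normalize(label))
```

Both sides of the comparison have to go through the same function, otherwise normalizing only one of them creates new mismatches.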

Book sections instead of articles, so no match is found in the CSV (the database export covers articles only) -> fixed in https://github.com/LaDHUL/oeuvres-roud/pull/64/commits/9e2e230a8a6d0764af5c34f18955d775d76aef1d

pub_Roud Gustave_Mémoire_Inno-Reflets_1967_p9_2
pub_Roud Gustave_Mémoire_Inno-Reflets_1967
MISSING LINK

pub_Roud Gustave_Appel d'hiver_Poésie 1, La poésie française de Suisse_1973-05_p105_4
pub_Roud Gustave_Appel d'hiver_Poésie 1, La poésie française de Suisse_1973-05
MISSING LINK

? No match found even though there is a resource with the same label ('find' in the file works) -> fixed manually in https://github.com/LaDHUL/oeuvres-roud/pull/64/commits/5db1ed2b7cfe5ef41a4d8579ef644f281811b6fd

pub_Roud Gustave_Appel d'hiver_Poésie 1, La poésie française de Suisse_1973-05_p105_4
pub_Roud Gustave_Appel d'hiver_Poésie 1, La poésie française de Suisse_1973-05
MISSING LINK

pub_Huttinger Edouard_La Peinture hollandaise_La_Guilde_du_Livre_1957-1_p34_3
pub_Huttinger Edouard_La Peinture hollandaise_La_Guilde_du_Livre_1957-1
MISSING LINK

pub_Roud Gustave_Le Secret des Compagnons, par Henri Pourrat _Suisse_romande_1938-02_p253_1
pub_Roud Gustave_Le Secret des Compagnons, par Henri Pourrat _Suisse_romande_1938-02
MISSING LINK
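The cause of these last cases was not identified in this thread. When two strings look identical but do not compare equal, a small diagnostic can narrow it down. The sketch below tests two possible causes, Unicode normalization form and whitespace; both are guesses, not findings, and `explain_mismatch` is a hypothetical helper.

```python
import unicodedata


def explain_mismatch(a: str, b: str) -> str:
    """Describe why two visually identical strings differ."""
    if a == b:
        return "identical"
    if unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b):
        return "differ only in Unicode normalization form"
    if " ".join(a.split()) == " ".join(b.split()):
        return "differ only in whitespace"
    return "different text"


precomposed = "Po\u00e9sie"  # 'é' as a single code point
decomposed = "Poe\u0301sie"  # 'e' followed by a combining acute accent
print(explain_mismatch(precomposed, decomposed))
print([hex(ord(ch)) for ch in decomposed])  # shows the code points actually stored
```

If the diagnostic returns "different text", printing the code points of both strings side by side usually shows where they diverge.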