digital-guard / preserv-BR

Digital Preservation of Brazilian metadata of donated maps.
http://git.digital-guard.org/preserv-BR
3 stars 1 forks source link

make load_optim_csv pg_datalake=dl03t_main #101

Closed crebollobr closed 2 years ago

crebollobr commented 2 years ago

make load_optim_csv pg_datalake=dl03t_main está gerando erro mas não consegui identificar o problema

/var/gits/_dg/preserv-BR/data/donatedPack.csv aparentemente não tem nenhum problema

Estava fazendo ingestão de Osasco/_pk0028.01 com um make_conf.yaml novo, renomeei para make_conf.yaml-LIXO para testar.


psql postgres://postgres@localhost/dl03t_main < /var/gits/_dg/preserv/src/optim-step2-ini.sql
                                      load_donor_pack
-------------------------------------------------------------------------------------------
  tmp_orig.fdw_donorbr was created!                                                       +
  source: /var/gits/_dg/preserv-BR/data/donor.csv  tmp_orig.fdw_donatedPackbr was created!+
  source: /var/gits/_dg/preserv-BR/data/donatedPack.csv
(1 row)

ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
CONTEXT:  SQL statement "
    -- popula optim.donated_PackFileVers a partir de optim.donated_PackTpl
    INSERT INTO optim.donated_PackFileVers (hashedfname, pack_id, pack_item, pack_item_accepted_date, kx_pack_item_version, user_resp)
    SELECT j->>'file'::text AS hashedfname, t.pack_id , (j->>'p')::int AS pack_item, accepted_date::date AS pack_item_accepted_date, lst_vers, lower(t.user_resp::text) AS user_resp
    FROM (
        SELECT pt.id AS pack_id, pt.user_resp, fpt.accepted_date, fpt.lst_vers, jsonb_array_elements((yamlfile_to_jsonb(optim.format_filepath(fpt.escopo, fpt.donor_id, fpt.pack_count)))->'files')::jsonb AS j
        FROM optim.donated_packtpl pt
        LEFT JOIN optim.donor d
        ON pt.donor_id = d.id
        LEFT JOIN tmp_orig.fdw_donatedpackbr fpt
        ON d.local_serial = fpt.donor_id AND pt.pk_count = fpt.pack_count
        WHERE pt.donor_id IN (
                SELECT id FROM optim.donor
                WHERE country_id = (
                    SELECT jurisd_base_id
                    FROM optim.jurisdiction
                    WHERE admin_level = 2 AND lower(abbrev) = lower('br')
                    )
                )
            AND ((yamlfile_to_jsonb(optim.format_filepath(fpt.escopo, fpt.donor_id, fpt.pack_count)))->'pkversion')::int = lst_vers
            AND file_exists(optim.format_filepath(fpt.escopo, fpt.donor_id, fpt.pack_count))
        ) AS t
    WHERE j->'file' IS NOT NULL -- verificar hash null
    ON CONFLICT (hashedfname)
    DO UPDATE
    SET pack_id=EXCLUDED.pack_id, pack_item=EXCLUDED.pack_item, pack_item_accepted_date=EXCLUDED.pack_item_accepted_date, user_resp=EXCLUDED.user_resp;
  "
PL/pgSQL function optim.insert_donor_pack(text) line 76 at EXECUTE
                                      load_donor_pack
0e1 commented 2 years ago

O problema está na duplicidade de sha256.

Os arquivos a seguir possuem file com sha256 iguais:

https://github.com/digital-guard/preserv-BR/blob/main/data/SP/Cabreuva/_pk0040.01/make_conf.yaml https://github.com/digital-guard/preserv-BR/blob/main/data/SP/SaoPaulo/_pk0033.01/make_conf.yaml

crebollobr commented 2 years ago

Corrigido