Description:
The assertions table currently contains many duplicate entries, overstating the number of citations. This task aims to remove duplicate rows, keeping only the most recent entry based on the updated field.
Tasks:
Identify duplicate rows based on the combination of obj_id, subj_id, repository_id, publisher_id, journal_id, accession_number, and source_id.
Keep the most recent entry for each set of duplicates and remove the others.
Query:
WITH ranked_assertions AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY obj_id, subj_id, repository_id, publisher_id, journal_id, accession_number, source_id ORDER BY updated DESC) AS rn
FROM assertions
where source_id = 'c66aafc0-cfd6-4bce-9235-661a4a7c6126'
)
delete obj_id, subj_id, repository_id, publisher_id, journal_id, accession_number FROM assertions
WHERE id IN (
SELECT id
FROM ranked_assertions
WHERE rn > 1
);
Description: The assertions table currently contains many duplicate entries, overstating the number of citations. This task aims to remove duplicate rows, keeping only the most recent entry based on the
updated
field.Tasks:
obj_id
,subj_id
,repository_id
,publisher_id
,journal_id
,accession_number
, andsource_id
.Query: