As part of the Data Citation Corpus data quality improvements, we need to remove rows from the assertions table that have non-citation relationship types. The goal is to clean the database of assertions that do not indicate a citation.
Tasks:
Identify rows with source_id 3644e65a-1696-4cdf-9868-64e7539598d2 (DataCite) and a relation_type_id not in the following list:
cites
is-cited-by
references
is-referenced-by
is-supplemented-by
is-supplement-to
Execute the query to remove these rows.
Query:
DELETE FROM assertions
WHERE source_id = '3644e65a-1696-4cdf-9868-64e7539598d2'
AND relation_type_id NOT IN (
'cites', 'is-cited-by', 'references',
'is-referenced-by', 'is-supplemented-by', 'is-supplement-to'
);
Validation:
Before executing the deletion, count the number of rows matching the criteria.
After execution, verify that these rows have been removed.
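The validation steps could be sketched as a single transaction, so that the deletion can be rolled back if the counts look wrong. This is only a sketch: it assumes the database supports standard BEGIN / COMMIT / ROLLBACK transactions (the engine is not stated in this issue), and the rows_to_delete and rows_remaining aliases are illustrative names.

```sql
-- Assumption: the database supports standard transactions.
BEGIN;

-- 1. Before: count the rows the DELETE is expected to remove.
SELECT COUNT(*) AS rows_to_delete
FROM assertions
WHERE source_id = '3644e65a-1696-4cdf-9868-64e7539598d2'
  AND relation_type_id NOT IN (
    'cites', 'is-cited-by', 'references',
    'is-referenced-by', 'is-supplemented-by', 'is-supplement-to'
  );

-- 2. Delete the same set of rows; the reported number of affected rows
--    should equal rows_to_delete from step 1.
DELETE FROM assertions
WHERE source_id = '3644e65a-1696-4cdf-9868-64e7539598d2'
  AND relation_type_id NOT IN (
    'cites', 'is-cited-by', 'references',
    'is-referenced-by', 'is-supplemented-by', 'is-supplement-to'
  );

-- 3. After: the same count should now return 0.
SELECT COUNT(*) AS rows_remaining
FROM assertions
WHERE source_id = '3644e65a-1696-4cdf-9868-64e7539598d2'
  AND relation_type_id NOT IN (
    'cites', 'is-cited-by', 'references',
    'is-referenced-by', 'is-supplemented-by', 'is-supplement-to'
  );

-- Commit only if the counts match expectations; otherwise run ROLLBACK instead.
COMMIT;
```

Note that in SQL a NOT IN comparison against a NULL value does not evaluate to true, so rows whose relation_type_id is NULL are matched by neither the counts nor the DELETE. If such rows exist and should also be removed, they would need a separate IS NULL condition.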