datacite / corpus-data-file

Code and steps used to generate the Data Citation Corpus dump file
MIT License

Remove Rows Associated with Clinical Trials Registries #15

Closed: ashwinisukale closed this issue 3 weeks ago

ashwinisukale commented 1 month ago

Description: Rows in the assertions table associated with ClinicalTrials.gov or the EU Clinical Trials Register (EUCTR) are not data citations and must be removed. This task cleans the database by deleting these inaccurate entries.

Tasks:

  1. Identify rows with repository_id fef75a3c-6e48-4170-be9d-415601efb689 (ClinicalTrials.gov) or 2638e611-ff6f-49db-9b3e-702ecd16176b (EUCTR).
  2. Execute the query to remove these rows.
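Step 1 can be checked with a per-repository count before anything is deleted. The sketch below is a minimal illustration, not the project's code. It assumes a simplified `assertions` table with a `repository_id` column and uses Python's built-in `sqlite3` module as a stand-in, because the issue does not say which database engine the corpus uses. The function name `count_by_repository` is hypothetical.

```python
import sqlite3

# Repository IDs quoted in the issue.
CLINICAL_TRIAL_REPOSITORY_IDS = (
    "fef75a3c-6e48-4170-be9d-415601efb689",  # ClinicalTrials.gov
    "2638e611-ff6f-49db-9b3e-702ecd16176b",  # EUCTR
)


def count_by_repository(conn: sqlite3.Connection) -> dict[str, int]:
    """Hypothetical helper: count assertions rows for each clinical-trial repository_id."""
    # One "?" placeholder per ID, so the IDs are bound as parameters, not interpolated.
    placeholders = ", ".join("?" for _ in CLINICAL_TRIAL_REPOSITORY_IDS)
    rows = conn.execute(
        "SELECT repository_id, COUNT(*) FROM assertions "
        f"WHERE repository_id IN ({placeholders}) "
        "GROUP BY repository_id",
        CLINICAL_TRIAL_REPOSITORY_IDS,
    ).fetchall()
    # Repositories with no matching rows are simply absent from the result.
    return dict(rows)
```

Grouping by `repository_id` shows how many rows each registry contributes, which makes an unexpected zero or an unusually large count easy to spot before running the delete.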

Query:

```sql
DELETE FROM assertions
WHERE repository_id IN (
    'fef75a3c-6e48-4170-be9d-415601efb689',
    '2638e611-ff6f-49db-9b3e-702ecd16176b'
);
```

Validation:

Before executing the deletion, count the rows matching these criteria. After execution, verify that no rows with either repository_id remain.
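The validation steps can be combined with the delete in a single transaction, so the change is rolled back if the counts do not line up. This is a minimal sketch under the same assumptions as above: a simplified `assertions` table, Python's `sqlite3` module standing in for the real database, and a hypothetical function name `delete_clinical_trial_rows`.

```python
import sqlite3

# Repository IDs quoted in the issue.
CLINICAL_TRIAL_REPOSITORY_IDS = (
    "fef75a3c-6e48-4170-be9d-415601efb689",  # ClinicalTrials.gov
    "2638e611-ff6f-49db-9b3e-702ecd16176b",  # EUCTR
)


def delete_clinical_trial_rows(conn: sqlite3.Connection) -> int:
    """Hypothetical helper: count, delete, and re-check matching rows.

    Returns the number of deleted rows. Raises RuntimeError (and the
    delete is rolled back) if the before/after checks fail.
    """
    placeholders = ", ".join("?" for _ in CLINICAL_TRIAL_REPOSITORY_IDS)
    where = f"WHERE repository_id IN ({placeholders})"
    ids = CLINICAL_TRIAL_REPOSITORY_IDS

    # "with conn" commits on normal exit and rolls back if an exception escapes.
    with conn:
        before = conn.execute(f"SELECT COUNT(*) FROM assertions {where}", ids).fetchone()[0]
        deleted = conn.execute(f"DELETE FROM assertions {where}", ids).rowcount
        after = conn.execute(f"SELECT COUNT(*) FROM assertions {where}", ids).fetchone()[0]
        if deleted != before or after != 0:
            raise RuntimeError(
                f"validation failed: counted {before}, deleted {deleted}, {after} remaining"
            )
    return deleted
```

Checking `deleted` against the pre-count catches a mismatch between the counting query and the delete query, and the post-count confirms that nothing matching either ID is left behind.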