datacite / corpus-data-file

Code and steps used to generate the Data Citation Corpus dump file
MIT License

Add script to validate accession numbers #21

Closed: kaysiz closed this 3 weeks ago

kaysiz commented 4 weeks ago

Purpose

We need a way to validate the accession numbers in CZI's assertions for the top 20 repositories.

closes: #18
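The PR text does not show the validation rules themselves. As a rough illustration of what per-repository accession-number validation can look like, here is a minimal Python sketch. The repository keys (`pdb`, `genbank`, `uniprot`), the `ACCESSION_PATTERNS` table and `is_valid_accession` are hypothetical and are not taken from the script, which identifies repositories by UUID. The patterns are simplified forms of publicly documented formats; the UniProt one follows the regular expression UniProt publishes for its accessions.

```python
import re

# HYPOTHETICAL mapping of repository name -> accession-number pattern.
# The real script's repositories and rules are not shown in the PR and may differ.
ACCESSION_PATTERNS = {
    # PDB identifiers: one digit followed by three alphanumerics, e.g. "1ABC".
    "pdb": re.compile(r"[0-9][A-Za-z0-9]{3}"),
    # Simplified GenBank nucleotide formats: 1 letter + 5 digits,
    # 2 letters + 6 digits, or 2 letters + 8 digits, optional ".version".
    "genbank": re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?"),
    # UniProtKB accession format as documented by UniProt.
    "uniprot": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
}


def is_valid_accession(repository: str, accession: str) -> bool:
    """Return True if `accession` fully matches the pattern for `repository`."""
    pattern = ACCESSION_PATTERNS.get(repository)
    if pattern is None:
        raise KeyError(f"no accession pattern defined for {repository!r}")
    # fullmatch rejects partial hits such as "1ABCX" for PDB.
    return pattern.fullmatch(accession.strip()) is not None
```

For example, `is_valid_accession("pdb", "1ABC")` returns `True`, while `is_valid_accession("pdb", "1ABCX")` returns `False`. Using `fullmatch` rather than `search` matters here: an assertion that merely contains a valid-looking substring should not pass.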

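Approach

The script processes the top 20 repositories one at a time: for each repository_id it fetches that repository's rows, writes them to a CSV file under accession_number_validation_data/, and reports how long that took. Output from a full run: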

Processing repository_id: 00363b65-f3ef-4fa9-8255-23ab269f4930
Number of rows fetched: 3755354
CSV file created: accession_number_validation_data/00363b65-f3ef-4fa9-8255-23ab269f4930.csv
Time taken for repository_id 00363b65-f3ef-4fa9-8255-23ab269f4930: 4 minutes 33 seconds

Processing repository_id: 87646104-e5ef-494b-b2f3-a46c9572e003
Number of rows fetched: 1729783
CSV file created: accession_number_validation_data/87646104-e5ef-494b-b2f3-a46c9572e003.csv
Time taken for repository_id 87646104-e5ef-494b-b2f3-a46c9572e003: 1 minutes 40 seconds

Processing repository_id: 6087b2e9-ecbf-4898-8047-5f484f1bce2f
Number of rows fetched: 890431
CSV file created: accession_number_validation_data/6087b2e9-ecbf-4898-8047-5f484f1bce2f.csv
Time taken for repository_id 6087b2e9-ecbf-4898-8047-5f484f1bce2f: 1 minutes 16 seconds

Processing repository_id: b2a4aa2b-db3f-456a-8e2b-7d935343385e
Number of rows fetched: 489706
CSV file created: accession_number_validation_data/b2a4aa2b-db3f-456a-8e2b-7d935343385e.csv
Time taken for repository_id b2a4aa2b-db3f-456a-8e2b-7d935343385e: 0 minutes 24 seconds

Processing repository_id: 1edec4bf-cfee-4296-8893-d1b0ca528f92
Number of rows fetched: 259548
CSV file created: accession_number_validation_data/1edec4bf-cfee-4296-8893-d1b0ca528f92.csv
Time taken for repository_id 1edec4bf-cfee-4296-8893-d1b0ca528f92: 0 minutes 19 seconds

Processing repository_id: 58d689da-7c8c-4ac1-90c9-69253d28f81f
Number of rows fetched: 257986
CSV file created: accession_number_validation_data/58d689da-7c8c-4ac1-90c9-69253d28f81f.csv
Time taken for repository_id 58d689da-7c8c-4ac1-90c9-69253d28f81f: 0 minutes 15 seconds

Processing repository_id: 5f36c68f-bb46-4a21-9b95-6bb87de12aa0
Number of rows fetched: 113611
CSV file created: accession_number_validation_data/5f36c68f-bb46-4a21-9b95-6bb87de12aa0.csv
Time taken for repository_id 5f36c68f-bb46-4a21-9b95-6bb87de12aa0: 0 minutes 8 seconds

Processing repository_id: 8d9c72f8-7b96-4b5c-86b0-b3f0dd7d0b0d
Number of rows fetched: 106217
CSV file created: accession_number_validation_data/8d9c72f8-7b96-4b5c-86b0-b3f0dd7d0b0d.csv
Time taken for repository_id 8d9c72f8-7b96-4b5c-86b0-b3f0dd7d0b0d: 0 minutes 7 seconds

Processing repository_id: 19ad31a7-e6d0-4547-ad14-1201d3c96dca
Number of rows fetched: 32617
CSV file created: accession_number_validation_data/19ad31a7-e6d0-4547-ad14-1201d3c96dca.csv
Time taken for repository_id 19ad31a7-e6d0-4547-ad14-1201d3c96dca: 0 minutes 4 seconds

Processing repository_id: 524e4405-f959-4e3c-ab4e-eecaa8ed23d5
Number of rows fetched: 24411
CSV file created: accession_number_validation_data/524e4405-f959-4e3c-ab4e-eecaa8ed23d5.csv
Time taken for repository_id 524e4405-f959-4e3c-ab4e-eecaa8ed23d5: 0 minutes 1 seconds

Processing repository_id: 1f463165-6957-491b-a1e1-e484540200f0
Number of rows fetched: 22729
CSV file created: accession_number_validation_data/1f463165-6957-491b-a1e1-e484540200f0.csv
Time taken for repository_id 1f463165-6957-491b-a1e1-e484540200f0: 0 minutes 5 seconds

Processing repository_id: 79760077-45df-4626-9675-60ee459aa283
Number of rows fetched: 15817
CSV file created: accession_number_validation_data/79760077-45df-4626-9675-60ee459aa283.csv
Time taken for repository_id 79760077-45df-4626-9675-60ee459aa283: 0 minutes 1 seconds

Processing repository_id: b5966ef4-8bd3-4de8-aafb-396df8e75b0b
Number of rows fetched: 13310
CSV file created: accession_number_validation_data/b5966ef4-8bd3-4de8-aafb-396df8e75b0b.csv
Time taken for repository_id b5966ef4-8bd3-4de8-aafb-396df8e75b0b: 0 minutes 1 seconds

Processing repository_id: 8748538d-965e-4440-85cc-d9d1722e7ca9
Number of rows fetched: 10370
CSV file created: accession_number_validation_data/8748538d-965e-4440-85cc-d9d1722e7ca9.csv
Time taken for repository_id 8748538d-965e-4440-85cc-d9d1722e7ca9: 0 minutes 1 seconds

Processing repository_id: 66807551-597e-4088-9743-32690481f6ff
Number of rows fetched: 10106
CSV file created: accession_number_validation_data/66807551-597e-4088-9743-32690481f6ff.csv
Time taken for repository_id 66807551-597e-4088-9743-32690481f6ff: 0 minutes 1 seconds

Processing repository_id: b4440b59-ca28-4a67-a65f-2dc02fb0aa23
Number of rows fetched: 7993
CSV file created: accession_number_validation_data/b4440b59-ca28-4a67-a65f-2dc02fb0aa23.csv
Time taken for repository_id b4440b59-ca28-4a67-a65f-2dc02fb0aa23: 0 minutes 1 seconds

Processing repository_id: f43825eb-5b72-4f1a-b716-dc7eec6d4206
Number of rows fetched: 6605
CSV file created: accession_number_validation_data/f43825eb-5b72-4f1a-b716-dc7eec6d4206.csv
Time taken for repository_id f43825eb-5b72-4f1a-b716-dc7eec6d4206: 0 minutes 0 seconds

Processing repository_id: c908c286-c01b-44c7-bac9-3bd53148d898
Number of rows fetched: 3430
CSV file created: accession_number_validation_data/c908c286-c01b-44c7-bac9-3bd53148d898.csv
Time taken for repository_id c908c286-c01b-44c7-bac9-3bd53148d898: 0 minutes 0 seconds

Processing repository_id: 345977e0-6fb8-476e-9742-0b8987e2fce8
Number of rows fetched: 3324
CSV file created: accession_number_validation_data/345977e0-6fb8-476e-9742-0b8987e2fce8.csv
Time taken for repository_id 345977e0-6fb8-476e-9742-0b8987e2fce8: 0 minutes 0 seconds

Processing repository_id: 0a60b1a9-041a-444e-bd6a-94caaab7591b
Number of rows fetched: 2838
CSV file created: accession_number_validation_data/0a60b1a9-041a-444e-bd6a-94caaab7591b.csv
Time taken for repository_id 0a60b1a9-041a-444e-bd6a-94caaab7591b: 0 minutes 0 seconds

Total time taken: 9 minutes 7 seconds
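The log above is consistent with a simple per-repository loop. Below is a minimal Python sketch of such a loop that prints output in the same shape, assuming rows are fetched in full before being written. `export_repositories`, `fetch_rows` (a stand-in for the script's actual database query, which the PR does not show) and `format_duration` are hypothetical names, not the script's code.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Iterable, Sequence

OUTPUT_DIR = Path("accession_number_validation_data")


def format_duration(seconds: float) -> str:
    """Format elapsed seconds as 'M minutes S seconds', matching the log style."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} minutes {secs} seconds"


def export_repositories(
    repository_ids: Iterable[str],
    fetch_rows: Callable[[str], Sequence[Sequence[object]]],
    header: Sequence[str],
    output_dir: Path = OUTPUT_DIR,
) -> dict[str, Path]:
    """Fetch rows for each repository, write one CSV per repository, log timings."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: dict[str, Path] = {}
    total_start = time.monotonic()
    for repo_id in repository_ids:
        start = time.monotonic()
        print(f"Processing repository_id: {repo_id}")
        # HYPOTHETICAL: fetch_rows stands in for the script's real query.
        rows = fetch_rows(repo_id)
        print(f"Number of rows fetched: {len(rows)}")
        path = output_dir / f"{repo_id}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(rows)
        print(f"CSV file created: {path}")
        elapsed = format_duration(time.monotonic() - start)
        # Trailing newline leaves a blank line between entries, as in the log.
        print(f"Time taken for repository_id {repo_id}: {elapsed}\n")
        written[repo_id] = path
    print(f"Total time taken: {format_duration(time.monotonic() - total_start)}")
    return written
```

Passing `fetch_rows` in as a function keeps the sketch independent of any particular database driver. For repositories with millions of rows (the largest here has 3,755,354), streaming rows from a server-side cursor straight into the CSV writer would avoid holding everything in memory, at the cost of only knowing the row count after writing.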
