Closed ChariniNana closed 4 years ago
The numbers of missing creator and description metadata after this implementation are as follows:

Sub provider | No Creator | Total Images | Missing Percentage
--- | ---: | ---: | ---:
si_national_museum_of_natural_history | 149955 | 3325259 | 4.509573539985908
si_american_art_museum | 22 | 11570 | 0.19014693171996544
si_anacostia_museum | 322 | 571 | 56.3922942206655
si_libraries | 0 | 55 | 0.0
si_cooper_hewitt_museum | 35433 | 65686 | 53.94300155284231
si_gardens | 669 | 689 | 97.09724238026125
si_postal_museum | 2900 | 2951 | 98.27177228058285
si_portrait_gallery | 65 | 12001 | 0.5416215315390384
si_american_history_museum | 254 | 2290 | 11.091703056768559
si_african_american_history_museum | 680 | 7544 | 9.013785790031813
si_freer_gallery_of_art | 2932 | 3877 | 75.62548362135672
si_air_and_space_museum | 255 | 2501 | 10.195921631347462
si_african_art_museum | 3 | 136 | 2.2058823529411766
si_american_indian_museum | 168 | 248 | 67.74193548387096
si_hirshhorn_museum | 1 | 423 | 0.2364066193853428

Sub provider | No Description | Total Images | Missing Percentage
--- | ---: | ---: | ---:
si_national_museum_of_natural_history | 3224038 | 3325259 | 96.95599651034702
si_american_art_museum | 9767 | 11570 | 84.41659464131374
si_anacostia_museum | 501 | 571 | 87.74080560420315
si_libraries | 0 | 55 | 0.0
si_cooper_hewitt_museum | 4168 | 65686 | 6.345339950674421
si_gardens | 0 | 689 | 0.0
si_postal_museum | 2 | 2951 | 0.06777363605557438
si_portrait_gallery | 11466 | 12001 | 95.54203816348638
si_american_history_museum | 215 | 2290 | 9.388646288209607
si_african_american_history_museum | 0 | 7544 | 0.0
si_freer_gallery_of_art | 3877 | 3877 | 100.0
si_air_and_space_museum | 319 | 2501 | 12.754898040783686
si_african_art_museum | 1 | 136 | 0.7352941176470589
si_american_indian_museum | 248 | 248 | 100.0
si_hirshhorn_museum | 423 | 423 | 100.0
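
The "Missing Percentage" column in both tables is the missing count divided by the total image count, times 100. A minimal sketch of that arithmetic, using rows from the tables above (the function name `missing_percentage` is hypothetical and not taken from the Smithsonian script):

```python
def missing_percentage(missing: int, total: int) -> float:
    """Return the share of images lacking a field, as a percentage."""
    if total <= 0:
        raise ValueError("total must be a positive image count")
    return missing / total * 100


# Rows taken from the "No Creator" table above: (missing, total).
rows = {
    "si_anacostia_museum": (322, 571),
    "si_libraries": (0, 55),
}

for sub_provider, (missing, total) in rows.items():
    print(sub_provider, round(missing_percentage(missing, total), 2))
```

Rounding to two decimal places, as done here, would make the tables easier to scan than the full floating-point values.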
The improvements for creator values are-
The improvements for description values are-
Fixes
Related to #397 by @annatuma
Description
This fix reduces the number of missing creators and descriptions in the Smithsonian data.
Technical details
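
The approach described in this section can be sketched roughly as follows. This is an illustration only, not the Smithsonian script itself: the `label`/`content` keys, the example labels, the integer priorities stored in `CREATOR_TYPES`, and the helper names `get_creator` and `get_description` are all assumptions made for the example.

```python
# Hypothetical priority map: a lower number means a more important creator type.
CREATOR_TYPES = {"artist": 0, "maker": 0, "photographer": 1, "collector": 2}
# Hypothetical set of note labels that are treated as descriptions.
DESCRIPTION_TYPES = {"description", "summary", "caption"}


def _label(entry):
    """Normalise an entry's label; missing labels become an empty string."""
    return str(entry.get("label") or "").lower()


def get_creator(names, creator_types=CREATOR_TYPES):
    """Concatenate all creators that share the best priority present."""
    candidates = []
    for entry in names:
        label, content = _label(entry), entry.get("content")
        if content and label in creator_types:
            candidates.append((creator_types[label], content))
    if not candidates:
        return None
    best = min(priority for priority, _ in candidates)
    return ", ".join(c for priority, c in candidates if priority == best)


def get_description(notes, description_types=DESCRIPTION_TYPES):
    """Join the content of every note whose label is a description type."""
    parts = [
        note["content"]
        for note in notes
        if note.get("content") and _label(note) in description_types
    ]
    return " ".join(parts) or None


# Made-up record shaped like the `freetext` paths discussed in this section.
freetext = {
    "name": [
        {"label": "Artist", "content": "A. Example"},
        {"label": "Maker", "content": "B. Example"},
        {"label": "Collector", "content": "C. Example"},
    ],
    "notes": [
        {"label": "Summary", "content": "A short summary."},
        {"label": "Provenance", "content": "Not a description."},
    ],
}
creator = get_creator(freetext["name"])
description = get_description(freetext["notes"])
```

In this sketch, extending coverage amounts to adding more labels to the two collections; entries with equal priority are joined rather than one being picked arbitrarily.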
The content of the different fields available at the `freetext -> name` path was analysed to decide which of them could be used to retrieve the creator value. The `CREATOR_TYPES` dictionary (contained in the Smithsonian script) was extended with the selected fields, which improves the completeness of the creator field for Smithsonian data. Unlike the previous implementation, all 'creator' values of equal importance (as indicated in the `CREATOR_TYPES` dictionary) are now concatenated to obtain the full creator value to be stored.

Similarly, the content of the different fields available at the `freetext -> notes` path was analysed to decide which of them could be used to retrieve the description value. The `DESCRIPTION_TYPES` set (contained in the Smithsonian script) was extended with the selected fields, which improves the completeness of the descriptions (stored within the metadata field) for Smithsonian data.

Checklist
- [x] My pull request has a descriptive title (not a vague title like `Update index.md`).
- [x] My pull request targets the *default* branch of the repository (`main` or `master`).
- [x] My commit messages follow [best practices][best_practices].
- [x] My code follows the established code style of the repository.
- [ ] ~~I added tests for the changes I made (if applicable).~~
- [ ] ~~I added or updated documentation (if applicable).~~
- [x] I tried running the project locally and verified that there are no visible errors.

[best_practices]: https://gist.github.com/robertpainsi/b632364184e70900af4ab688decf6f53

## Developer Certificate of Origin
```
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.
```