Most of the problems described in #438 happen because we were not double-checking the media types of objects from Wikimedia Commons when retrieving their metadata. This PR changes the request we make so that it also retrieves that information (the `mediatype`), and uses it to decide whether to store metadata about each object.
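For illustration, here is a minimal sketch of this kind of check. It is not the code in this PR. It assumes the metadata comes from the MediaWiki API's `prop=imageinfo` query with `mediatype` among the requested `iiprop` values. It also assumes that only the `BITMAP` and `DRAWING` media types count as images. The function names and the exact set of accepted types are hypothetical.

```python
"""Hypothetical sketch: keep only image-like pages from an imageinfo response."""

from typing import Any, Dict, Iterable, List

# Assumption: only these MediaWiki media types are treated as images.
# AUDIO (e.g. .ogg files), VIDEO, OFFICE, etc. are rejected.
IMAGE_MEDIATYPES = frozenset({"BITMAP", "DRAWING"})


def is_image(page: Dict[str, Any]) -> bool:
    """Return True if the page's first imageinfo entry has an image media type.

    Pages with a missing or empty imageinfo list are rejected, not guessed at.
    """
    image_info = page.get("imageinfo") or []
    if not image_info:
        return False
    return image_info[0].get("mediatype") in IMAGE_MEDIATYPES


def filter_image_pages(pages: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop non-image pages before their metadata is stored."""
    return [page for page in pages if is_image(page)]


if __name__ == "__main__":
    # Page shape loosely modelled on the values of a query's "pages" (assumption).
    sample_pages = [
        {"title": "File:Example.jpg", "imageinfo": [{"mediatype": "BITMAP"}]},
        {"title": "File:Example.svg", "imageinfo": [{"mediatype": "DRAWING"}]},
        {"title": "File:Example.ogg", "imageinfo": [{"mediatype": "AUDIO"}]},
        {"title": "File:Missing.png"},
    ]
    kept = filter_image_pages(sample_pages)
    print([page["title"] for page in kept])
    # prints ['File:Example.jpg', 'File:Example.svg']
```

Rejecting pages that lack `imageinfo` entirely is a conservative choice. It may drop a few legitimate images, but it never lets an unclassified object into the image table.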
There are also two minor cleanup changes included:
- Moved logging initialization so that it is skipped when `wikimedia_commons` is imported rather than run as a script.
- Changed the script to use the new `total_images` property of the `ImageStore` class, which makes it easier to understand.
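The logging change follows the usual Python pattern of configuring logging only under an `if __name__ == "__main__":` guard. With that pattern, importing the module does not reconfigure the root logger of whatever imported it. Below is a minimal sketch of the pattern. The `main()` function and the format string are hypothetical and not taken from `wikimedia_commons`.

```python
import logging

# Module-level logger: safe at import time because it configures nothing.
logger = logging.getLogger(__name__)


def main() -> None:
    """Hypothetical entry point for running the provider script."""
    logger.info("Starting Wikimedia Commons ingestion")


if __name__ == "__main__":
    # Configure handlers and levels only when run as a script, so that
    # importing this module (e.g. from tests) leaves logging untouched.
    logging.basicConfig(
        format="%(asctime)s - %(name)s - %(levelname)s: %(message)s",
        level=logging.INFO,
    )
    main()
```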
We cannot consider #438 solved until we have:
- Cleaned the DB once this PR has been merged and deployed.
- Come up with a more robust, general way to keep non-image objects' metadata out of the `image` table.
## Technical details
## Tests
New tests cover this functionality.
Also, the reviewer may (should they so desire) use the README to set up the development environment, and run
You shouldn't see any non-image objects in the local PostgreSQL database after running that. If you run the same from `master`, however, metadata about a number of `.ogg` (audio) files will end up in the `image` table of the local PostgreSQL DB.
## Checklist
- [X] My pull request has a descriptive title (not a vague title like `Update index.md`).
- [X] My pull request targets the `master` branch of the repository.
- [X] My commit messages follow [best practices][best_practices].
- [X] My code follows the established code style of the repository.
- [X] I added tests for the changes I made (if applicable).
- [ ] ~I added or updated documentation (if applicable).~
- [X] I tried running the project locally and verified that there are no visible errors.
[best_practices]:https://gist.github.com/robertpainsi/b632364184e70900af4ab688decf6f53
## Developer Certificate of Origin
```
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```
## Fixes
Related to #438 by @aldenstpage