Open rheaplex opened 8 years ago
In wp-admin/includes/image.php, wp_read_image_metadata() calls wp_kses_post_deep(), which strips tags.
So we have to use PHP's Exif parsing. I'll look at hooking this in for the Media editor, pulling the values for the fields if they are not otherwise populated.
@mattl does this indicate a more general problem with the Exif tag format we are proposing? I don't believe so, but worth considering. Also maybe we should consider adding source and CC+ if we haven't already.
Making progress with the PHP Exif parsing; just trying to make it efficient in the code and logical for the user.
Code now extracts the license and attribution URL when you view the media. Looking to see if I can hook this into the image upload process, but if not this will be Good Enough, I think.
Metadata now extracted on image upload.
This won't get metadata for existing images if the plugin is installed and we have (e.g.) 20,000 images with Exif already in the system.
@mattl we can run the extraction code when you view the image in the Media editor. Or is this something we might want to let the user run manually from the plugin's settings, with a button such as [Scan Existing Images for License Metadata And Apply It], if that's possible?
Won't existing images have been previously stripped by WordPress?
I don't believe so. The strings are stripped after reading from the file, rather than the file itself being sanitised.
Maybe something like this? We could pull all the existing images from the CC website as a test, but also @ericsteuer has good insight into how this works on a big site like Wired.com, which probably has a few hundred thousand images.
I had in mind more a global "Extract CC License metadata where present but don't overwrite anything" option.
We could also add a button to the media manager to do this for individual images.
So the former would support hundreds of thousands of images, and the latter would cover the case where you only want to process a few.
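The "extract where present but don't overwrite anything" rule might look roughly like the sketch below. It is written in Python purely as a language-neutral illustration and is not the plugin's PHP code. The function name `merge_license_metadata` and the field names `license_url` and `attribution_url` are hypothetical.

```python
from typing import Dict, Optional


def merge_license_metadata(
    existing: Dict[str, Optional[str]],
    extracted: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Return a copy of `existing` with blank fields filled from `extracted`.

    Fields that already hold a non-blank value are never overwritten.
    Both dictionaries use hypothetical field names for illustration.
    """
    merged = dict(existing)  # work on a copy so the caller's data is untouched
    for field, value in extracted.items():
        if not value or not value.strip():
            continue  # the file had nothing useful for this field
        current = merged.get(field)
        if current is None or not current.strip():
            merged[field] = value.strip()  # fill only missing or blank fields
    return merged
```

With this policy, an image whose license field was filled in by hand keeps that value even if the file's Exif data says something different.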
On Fri, Jul 29, 2016 at 1:57 PM, Matt Lee notifications@github.com wrote:
[image: screenshot from 2016-07-29 15-56-15] https://cloud.githubusercontent.com/assets/33296/17263143/0538b4a8-55a5-11e6-9471-e62fa5f2e11b.png
The worry I have there is that we'd wind up adding extra captions to existing images all over the place.
Sure. It's the sort of thing where the user will want the plugin to do the right thing, for a value of "the right thing" that will differ from case to case. And they'll really want an Undo button.
So if this is too difficult to do usefully we shouldn't make something that will just frustrate people. :-)
Why not use the 'regenerate thumbnails' approach, in which you have a plugin run once for all existing images? This could be a separate add-on plugin which can be removed after it has run, since it's likely to be run only once.
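A one-off pass of this kind could be sketched as follows, again in Python as a language-neutral illustration and not as WordPress code. Everything here is an assumption made for the example: the in-memory `store` stands in for the site's attachment metadata, `extract` stands in for whatever reads license fields from a file, and the names `batches`, `scan_existing` and `undo` are hypothetical. The pass fills only blank fields and records what it wrote, so the change can be reversed later.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Metadata = Dict[str, str]
Change = Tuple[int, str, str]  # (image id, field name, value that was written)


def batches(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield the ids in lists of at most `size` items."""
    batch: List[int] = []
    for image_id in ids:
        batch.append(image_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def scan_existing(
    ids: Iterable[int],
    store: Dict[int, Metadata],
    extract: Callable[[int], Metadata],
    batch_size: int = 500,
) -> List[Change]:
    """Fill blank fields in `store` from extracted metadata and log each write."""
    changes: List[Change] = []
    # In a real site each batch would likely be one request or scheduled job.
    for batch in batches(ids, batch_size):
        for image_id in batch:
            stored = store.setdefault(image_id, {})
            for field, value in extract(image_id).items():
                if value and not stored.get(field):  # never overwrite
                    stored[field] = value
                    changes.append((image_id, field, value))
    return changes


def undo(store: Dict[int, Metadata], changes: List[Change]) -> None:
    """Remove each logged value, unless it has been edited since the scan."""
    for image_id, field, value in changes:
        if store.get(image_id, {}).get(field) == value:
            del store[image_id][field]
```

Keeping the change log is what makes an Undo possible. Without it there is no way to tell scanned values apart from ones a user typed in.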
If we have a jpeg with a Copyright field like:
then when we upload the file to WordPress and fetch the Exif metadata using:
then the string we get for Copyright is:
I assume this is due to WordPress taking the sensible precaution of stripping HTML tags from outside input, but it does mean that the format we are using for license URLs falls foul of this.
I've chased this down the call stack a way and I can't find anywhere to change it. I'd rather not have to use PHP's Exif parsing, although I've just tested that and it doesn't have the same problem.
Investigating further, but if anyone knows of a quick fix for this please let me know.