Open danielbachhuber opened 9 years ago
To be fair, the tool and image scraping is already in core and has been for quite some time, but I do share some of your concern. This version definitely highlights the feature a bit more prominently.
Press This in core doesn't automatically sideload images though?
This is what it does:
Crazy. I've never seen that before. Fancy that.
Every single social network out there takes the images provided in the meta tags and displays those images on their website with a summary and the ability to add text with links back to the original article. It's called Open Graph and it exists specifically for the purpose of what Press This provides.
Facebook does it with every link you paste in your stream. It grabs the Open Graph data, displays the image embedded, and allows you to add or write whatever text you want. So copyright is not a concern. If you were to download the image to your site and then display it there with no link back, that would be a problem. (Although that's exactly what Pinterest does, but we won't get into that discussion.)
Press This does not download the images, it only does what the Open Graph protocol provides for.
@chazzzzy : core Press This does and has always sideloaded images upon save, to keep a local copy if the original disappears:
@danielbachhuber: never seen that before: that explains some of your questions. It's what PT was built for though, like the Tumblr bookmarklet at the time, or the Pinterest one more recently.
As chazzzzy mentioned, the practice is long-standing and common online, as well as with the existing PT. I would venture it's fair use, but I'm no legal expert.
We also (now) make a greater effort to use values the sites have clearly defined and specified as being what they want their articles and content to be represented as when shared elsewhere, by detecting Open Graph and Twitter Cards tags, etc. This includes representations for thumbnails, embeds, etc.
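For anyone following along, the tag detection described above can be sketched roughly like this. This is not the Press This code, just a minimal Python illustration using the standard library; the names `ShareTagParser`, `extract_share_tags`, and `pick_image` are made up for the example, and preferring `og:image` over `twitter:image` is an assumption.

```python
from html.parser import HTMLParser


class ShareTagParser(HTMLParser):
    """Collect Open Graph (og:*) and Twitter Card (twitter:*) meta tags."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph normally uses property="", Twitter Cards use name="".
        key = attr.get("property") or attr.get("name") or ""
        content = attr.get("content")
        if content and key.startswith(("og:", "twitter:")):
            # Keep the first value seen for each key.
            self.tags.setdefault(key, content)


def extract_share_tags(html):
    """Return a dict of the sharing-related meta tags found in the markup."""
    parser = ShareTagParser()
    parser.feed(html)
    return parser.tags


def pick_image(tags):
    """Pick the image the site declared for sharing (ordering is an assumption)."""
    return tags.get("og:image") or tags.get("twitter:image")
```

If a page declares neither tag, `pick_image` returns `None`, and a tool would have to fall back to scanning the page's own images.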
I would venture it's fair use, but I'm no legal expert.
Why do you say that?
what they want their articles and content to be represented as when shared elsewhere, by detecting Open Graph and Twitter Cards tags, etc.
Right, but in the case of Open Graph and Twitter Cards, the content is always hosted on the source's domain, and they have control over removing it.
The fact that you are updating Press This is one of the most exciting and relevant updates to WordPress in a long time, because sharing is what the Internet is all about right now. WordPress was always behind in this area (or appeared to be to users, since most never knew what Press This was).
We will be using Press This to mimic all of the social networks: the ability to share articles found on the web and automatically grab a summary with an image and an automatic link back to the source. Our users will be commenting on the articles or referencing them for discussion.
Fair use states that if you are talking about something you are able to show it. Press This pulls information from the source that helps with the process. It doesn't pull the whole article, but rather the summary. It provides a link to the source and it provides a simple way to comment on the article. It provides an image from the source, which is cached in case the source image disappears. That makes sense from a technical, resource, and historical standpoint.
When you "share an article" on the web, this same process is repeated on all the major social networks. A summary of text is pulled as well as a thumbnail. If the source article disappears, the TEXT does not disappear from around the web and that could theoretically also be considered copyrighted. So the fact that the image is also cached for historical purposes, would seem to fall under the same rules as the text.
The actual copyright of the original image is a whole other matter (did the original site have permission, etc.), but that is not WordPress as a platform's problem to worry about.
So please don't change the functionality. It's bringing WordPress up to par with everyone else.
I would not consider this to be a relevant concern to this project or WordPress. It's every author's own responsibility to ensure they aren't infringing copyrights in content they publish. End of story.
@DrewAPicture I wasn't aware that you had a legal background.
@danielbachhuber Feel free to get a legal opinion. I would argue that if it hasn't been a problem up to now, it probably won't be going forward.
Feel free to get a legal opinion.
Hence how I opened the issue:
Maybe Paul S at Automattic can weigh in?
He has. I talked to him. Press This is fine.
Cool. Then the one suggestion I'd have is that we include the photo credit if we can easily find one.
Good suggestion. Can look into it (albeit later). Challenge is that there's no structured data format for those, and the format is left to each publication's own style guide.
Yeah, there's no real standard for authorship of images that can easily be respected. That said, most in-the-wild uses of Open Graph and Twitter Cards cache the image locally, so while, in theory, the original site has the ability to pull it down, it usually doesn't actually have that ability.
If we wanted to try to pull a caption for authorship, or at least make a fair attempt, we could scan the nearby DOM for a caption and attempt to duplicate it around the new side-loaded image, though there's no standard caption methodology in common use either.
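To make the "scan the nearby DOM" idea concrete, here is a rough Python sketch that only handles the one case where the markup is well structured: an `<img>` inside a `<figure>` with a `<figcaption>`. That markup pattern is an assumption about the source page, the names `FigureCaptionParser` and `find_caption` are invented for the example, and nested figures are not handled.

```python
from html.parser import HTMLParser


class FigureCaptionParser(HTMLParser):
    """Find the <figcaption> text of the <figure> that contains a given image."""

    def __init__(self, image_url):
        super().__init__()
        self.image_url = image_url
        self.caption = None
        self._in_figure = False
        self._has_image = False
        self._in_caption = False
        self._text = []
        self._pending = None

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            # Start tracking a new figure (nested figures are ignored).
            self._in_figure = True
            self._has_image = False
            self._pending = None
        elif self._in_figure and tag == "img":
            if dict(attrs).get("src") == self.image_url:
                self._has_image = True
        elif self._in_figure and tag == "figcaption":
            self._in_caption = True
            self._text = []

    def handle_data(self, data):
        if self._in_caption:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption" and self._in_caption:
            self._in_caption = False
            # Collapse whitespace in the collected caption text.
            self._pending = " ".join("".join(self._text).split())
        elif tag == "figure" and self._in_figure:
            if self._has_image and self._pending and self.caption is None:
                self.caption = self._pending
            self._in_figure = False


def find_caption(html, image_url):
    """Return the caption for image_url, or None if no figure caption is found."""
    parser = FigureCaptionParser(image_url)
    parser.feed(html)
    return parser.caption
```

Pages that put credits in a plain `<p>` or `<span>` next to the image would need per-site heuristics, which is exactly the lack of a standard mentioned above.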
We could take and read the EXIF data, as many photographers put that in before releasing professional photos, but then again I'd assume that data would be brought in with the image.
Alternatively, we can auto-caption any added image with the source URL and attempt to include the source author so something like: "This image was included on [title] by [author]." Though of course, there isn't really consistent use of any author mechanism, even the rel=author HTML5 property is not widely used, and when it is, it is often used incorrectly.
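As a sketch of what that auto-caption could look like (Python for illustration only; `build_caption` is a hypothetical helper, and linking the title to the source URL is my assumption about how the source would be included):

```python
from html import escape


def build_caption(title, source_url, author=None):
    """Build an attribution caption, linking the title back to the source."""
    link = '<a href="{}">{}</a>'.format(escape(source_url, quote=True), escape(title))
    if author:
        return "This image was included on {} by {}.".format(link, escape(author))
    # No reliable author found: credit the source page only.
    return "This image was included on {}.".format(link)
```

When no author can be detected, the caption simply drops the "by [author]" part rather than guessing.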
Facebook derives authorship from the not-so-widely used META author tag (<meta name="author" content="Jack Healy">), so perhaps an attempt to read that in as part of a statement of the source of the overall post would be useful.
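Reading that tag is straightforward. A minimal Python sketch, assuming the page uses the plain `name="author"` form shown above (`AuthorParser` and `find_author` are illustrative names, not part of any existing tool):

```python
from html.parser import HTMLParser


class AuthorParser(HTMLParser):
    """Capture the first <meta name="author" content="..."> value."""

    def __init__(self):
        super().__init__()
        self.author = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "author":
            if self.author is None and attr.get("content"):
                self.author = attr["content"].strip()


def find_author(html):
    """Return the declared author, or None when the tag is absent."""
    parser = AuthorParser()
    parser.feed(html)
    return parser.author
```

Since the tag is not widely used, callers should expect `None` most of the time and treat the author as optional.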
I think it is one of those things that is probably technically illegal for the author to do (duplicate an image without credit) but not for the platform (say WordPress.com). I looked into this quite a bit a while back to build out a training unit on using images while writing posts.
Finally, there may be another option we would want to offer those individuals who are conscientious about fair use of a particular image. The common consensus among legal experts on fair use of digital images is that size matters, so offering smaller thumbnail options (or perhaps a site or user option to always prefer the display of smaller thumbnails) would be more likely to qualify as fair use.
"Kelly v. Arriba Soft Corporation (2002) held that low-resolution thumbnails were non-infringing" http://en.wikipedia.org/wiki/Wikipedia:Fair_use/Definition_of_"low_resolution"#cite_note-1
If we wanted to offer that option, further legal consensus seems to state that "125x100 pixels for landscape-sized images and 100x125 pixels for portrait-sized images should be used." ([PDF] - https://cms.bsu.edu/-/media/WWW/DepartmentalContent/Library/Copyright/CopyrightForum/v3i2.pdf ). It might be nice to offer users that option, as even if they are writing new content around a quote or image the use might not be transformative.
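If such an option existed, the sizing logic could look something like the following Python sketch. The 125x100 and 100x125 limits come from the quoted guidance; treating square images as landscape, preserving the aspect ratio, and never upscaling are my assumptions, and `thumbnail_size` is a hypothetical name.

```python
def thumbnail_size(width, height):
    """Scale (width, height) to fit the quoted low-resolution limits."""
    # Landscape (and square, by assumption) fits 125x100; portrait fits 100x125.
    max_w, max_h = (125, 100) if width >= height else (100, 125)
    # Preserve aspect ratio and never enlarge a small image.
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1000x800 source image would come out at 125x100, while an image already smaller than the limits is left at its original size.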
Of course, the honest truth is that most platforms that aggregate such images (say Pinboard, or Twitter, for example) really just don't care, so ... shrug
Interesting read about this very issue (so meta). :)
http://wpandlegalstuff.com/press-this-and-copyright-infringement/
Right, but in the case of Open Graph and Twitter Cards, the content is always hosted on the source's domain, and they have control over removing it.
FYI, Facebook caches images from Open Graph in a big way. It's actually kind of annoying sometimes.
Still, you can delete the image and clear the cache, because Facebook is a centralized network.
I'm not the best person to comment on this, but it seems like building a tool that automatically scrapes copyrighted materials should have an upfront discussion about said legal implications, and whether this is something we should promote.
Maybe Paul S at Automattic can weigh in?