INN / umbrella-sfpublicpress

San Francisco Public Press
https://sfpublicpress.org/
GNU General Public License v2.0

Remove duplicate featured image in post content #63

Closed joshdarby closed 4 years ago

joshdarby commented 4 years ago

Some posts were imported with a featured image and an image in the post content that is a duplicate of the featured image.

Example: http://sfpublicpress.flywheelsites.com/propositions-g-and-h-defining-clean-or-green-energy/

[Screenshot of the example post, captured 2020-03-24]

We need a way to programmatically remove the duplicated image in the post content.

benlk commented 4 years ago

Largo has a featured media deduplicator for Classic Editor content: if the first line in post_content is an image tag with the same src as the featured media, that image tag is removed from the post_content when the post is output:

https://github.com/INN/largo/blob/v0.6.4/inc/post-templates.php#L117-L219

That solution won't work here (and doesn't work in Gutenberg: https://github.com/INN/largo/issues/1834) because:

  1. the image to be hidden is not the image in the post_content, but instead the featured media
  2. the post_content image isn't necessarily the first line in post_content.

What might work is:

largo_hero() calls largo_get_hero(), which provides a handy filter: largo_get_hero

https://github.com/INN/largo/blob/512da701664b329f2f92244bbe54880a6e146431/inc/featured-media.php#L73-L82

Since we have the WP_Post object from get_post( $post ), we can write a filter that compares the post_content from that to the image URL from largo_get_featured_media( $post ): https://github.com/INN/largo/blob/512da701664b329f2f92244bbe54880a6e146431/inc/featured-media.php#L203

If there's a match, the function filtering largo_get_hero returns an empty string.
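
For reference, a minimal sketch of the filter described above. The `sfpp_` function names are hypothetical, the arguments that `largo_get_hero` passes to callbacks are assumed here (markup first, then the post), and the core function `get_the_post_thumbnail_url()` stands in for reading the URL out of `largo_get_featured_media( $post )`, whose return shape isn't shown in this thread.

```php
<?php
/**
 * Return true when $image_url appears anywhere in $post_content.
 * This is a plain substring check, so resized or re-uploaded copies
 * with a different URL will not match.
 */
function sfpp_content_has_image_url( $post_content, $image_url ) {
	if ( '' === (string) $image_url ) {
		return false;
	}
	return false !== strpos( (string) $post_content, (string) $image_url );
}

/**
 * Hypothetical callback for the largo_get_hero filter.
 * $post defaults to null in case the filter only passes the markup.
 */
function sfpp_maybe_hide_hero( $hero_markup, $post = null ) {
	$post = get_post( $post );
	if ( ! $post ) {
		return $hero_markup;
	}

	// Assumption: the featured media is a regular post thumbnail.
	$url = get_the_post_thumbnail_url( $post, 'full' );
	if ( $url && sfpp_content_has_image_url( $post->post_content, $url ) ) {
		return ''; // The same image is already in the content: hide the hero.
	}
	return $hero_markup;
}

// Only register inside WordPress, so the helper can be exercised on its own.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'largo_get_hero', 'sfpp_maybe_hide_hero', 10, 2 );
}
```

The comparison lives in its own function so it can be checked without loading WordPress.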

joshdarby commented 4 years ago

> Since we have the WP_Post object from get_post( $post ), we can write a filter that compares the post_content from that to the image URL from largo_get_featured_media( $post ): https://github.com/INN/largo/blob/512da701664b329f2f92244bbe54880a6e146431/inc/featured-media.php#L203
>
> If there's a match, the function filtering largo_get_hero returns an empty string.

That's along the lines of what I was thinking, except we need to reverse it. They want to keep the featured image and hide the one inside of the content.

My big concern/question is: do we care that the image still appears in the post content inside of the editor? Maybe whatever function we write could also use a filter to remove it from the editor, which would actually remove the duplicate image from the db on save.

benlk commented 4 years ago

I don't think we should remove it from the database, and I'm worried about hiding the image in the post content anyways.

There are way too many edge cases on how the image is presented in the post content, which we'd have to take into account. We can't simply remove the img tag; we need to account for:

If we remove the image from the database, rather than only removing it from the presentation layer, then we'll be removing the information that we would need to use to find the edge cases where the image was hidden but the corresponding image-related markup was not used.

If we want to do as SFPP asks and implement a general "remove the image from the post content if it matches the featured image, no matter where in the post content it is", we're going to have to implement something that solves this Largo issue: https://github.com/INN/largo/issues/1834

The reverse — hiding the hero image if the hero image matches the post content — is simpler to implement and has fewer ways it would go wrong.

Both solutions still fail if the featured image is the same image, visually, as the image in the post content, but has a different URL because of crops or resizing or re-uploads.
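
To make the URL mismatch concrete, here is a small sketch under stated assumptions. WordPress names resized copies with a `-{width}x{height}` suffix (for example `photo-300x200.jpg`), so an exact string comparison against `photo.jpg` fails. The hypothetical helpers below strip that suffix before comparing. This only covers the resize case: a re-upload such as `photo-1.jpg` still won't match, and a file whose original name happens to end in something like `-800x600` would be normalized incorrectly.

```php
<?php
/**
 * Strip any query string or fragment, then strip the "-{width}x{height}"
 * suffix that WordPress appends to resized copies of an upload.
 */
function sfpp_normalize_image_url( $url ) {
	$url = preg_replace( '/[?#].*$/', '', (string) $url );
	return preg_replace( '/-\d+x\d+(\.[A-Za-z0-9]+)$/', '$1', $url );
}

/**
 * Compare two image URLs after normalization.
 */
function sfpp_same_upload( $url_a, $url_b ) {
	return sfpp_normalize_image_url( $url_a ) === sfpp_normalize_image_url( $url_b );
}
```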

joshdarby commented 4 years ago

@MirandaEcho based on @benlk's concerns above, are we ok with doing the reverse of what was requested — hiding the hero image if a match exists in the content, as opposed to hiding the image in the content?

joshdarby commented 4 years ago

> Both solutions still fail if the featured image is the same image, visually, as the image in the post content, but has a different URL because of crops or resizing or re-uploads.

We could grab the ID of the featured image, run get_attached_media to get all media in the post content and find any that have a matching ID.

https://developer.wordpress.org/reference/functions/get_attached_media/
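
Roughly, that ID comparison could look like the sketch below. The `sfpp_` names are hypothetical. `get_post_thumbnail_id()` and `get_attached_media()` are core WordPress functions; the latter returns the attachment posts whose post_parent is the given post, so this check depends on that parent relationship being set and does not by itself prove the image appears in post_content.

```php
<?php
/**
 * Return true when one of the attachments has the given ID.
 * $attachments is a list of objects with an ID property, which is the
 * shape get_attached_media() returns (WP_Post objects).
 */
function sfpp_attachments_include_id( $attachments, $attachment_id ) {
	$attachment_id = (int) $attachment_id;
	if ( $attachment_id <= 0 ) {
		return false; // No featured image set.
	}
	foreach ( $attachments as $attachment ) {
		if ( (int) $attachment->ID === $attachment_id ) {
			return true;
		}
	}
	return false;
}

/**
 * Hypothetical WordPress-side wrapper: is the featured image also
 * one of the images attached to this post?
 */
function sfpp_featured_image_is_attached( $post ) {
	$post = get_post( $post );
	if ( ! $post ) {
		return false;
	}
	$thumbnail_id = get_post_thumbnail_id( $post );
	$attached     = get_attached_media( 'image', $post->ID );
	return sfpp_attachments_include_id( $attached, $thumbnail_id );
}
```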

benlk commented 4 years ago

Can we rely on the image in the post_content having a corresponding attachment post with post_parent metadata that matches the ID of the post where the image appears in the post_content?

joshdarby commented 4 years ago

I would like to think we can assume so, since these images were added by an automated process as opposed to someone manually placing them there.

benlk commented 4 years ago

Alright, then that approach sounds workable.

MirandaEcho commented 4 years ago

It sounds like we found a solution for this, but does knowing that the duplicate photo is ALWAYS at the bottom of the post now change anything for this?

benlk commented 4 years ago

The position within the post doesn't really change anything.


This afternoon, we decided to remove the images from the post.

Parts of this:

benlk commented 4 years ago

Continued in https://github.com/INN/umbrella-sfpublicpress/pull/87#issuecomment-642428792

benlk commented 4 years ago

posts for testing:

benlk commented 4 years ago

Capturing a decision made in Slack at https://innorg.slack.com/archives/GEVTM0XAQ/p1591899518047800:

There are posts like 2931 that contain plain ol' HTML img tags in post_content duplicating the post thumbnail image. These predate the work to move images from the Drupal sidebar area to the post_content, which added the sidebar images to post_content as Gutenberg Image Blocks.

Decided: The scope of this ticket solely regards the images migrated from the Drupal sidebar, which appear in post_content as Image Blocks.
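
With the scope limited to Image Blocks, one way to sketch the removal is to work on the parsed block list rather than on raw HTML. This is an illustration under assumptions, not the code from the linked PR: the `sfpp_` names are hypothetical, the match is made on the Image Block's `id` attribute against the post thumbnail ID, and only top-level blocks are checked (an Image Block nested inside another block would be left alone). `parse_blocks()` and `serialize_blocks()` are core WordPress functions. The thread does not say whether the removal runs at render time or as a one-time update, so the wrapper only returns the modified content and saves nothing.

```php
<?php
/**
 * Drop top-level core/image blocks whose "id" attribute equals
 * $attachment_id. $blocks uses the array shape returned by parse_blocks().
 */
function sfpp_remove_image_blocks_by_id( array $blocks, $attachment_id ) {
	$attachment_id = (int) $attachment_id;
	$kept          = array();
	foreach ( $blocks as $block ) {
		$is_image = isset( $block['blockName'] ) && 'core/image' === $block['blockName'];
		$block_id = isset( $block['attrs']['id'] ) ? (int) $block['attrs']['id'] : 0;
		if ( $is_image && $attachment_id > 0 && $block_id === $attachment_id ) {
			continue; // Duplicate of the featured image: skip it.
		}
		$kept[] = $block;
	}
	return $kept;
}

/**
 * Hypothetical wrapper: return post_content without the Image Block
 * that duplicates the featured image. Nothing is written to the database.
 */
function sfpp_content_without_featured_block( $post ) {
	$post = get_post( $post );
	if ( ! $post ) {
		return '';
	}
	$blocks = parse_blocks( $post->post_content );
	$blocks = sfpp_remove_image_blocks_by_id( $blocks, get_post_thumbnail_id( $post ) );
	return serialize_blocks( $blocks );
}
```

Plain HTML img tags, like the ones in post 2931, have no block wrapper and are left untouched by this approach, which matches the scope decided above.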