Yoast / wordpress-seo

Yoast SEO for WordPress
https://yoast.com/wordpress/plugins/seo/

wpseo_pre_analysis_post_content does not affect XML sitemap #2277

Closed dglingren closed 8 years ago

dglingren commented 9 years ago

I am the author of Media Library Assistant (https://wordpress.org/plugins/media-library-assistant/), a WordPress plugin that includes an alternative to the [gallery] shortcode. I am working on a response to a support question, "MLA galleries and SEO" (https://wordpress.org/support/topic/mla-galleries-and-seo).

I have implemented the "wpseo_pre_analysis_post_content" filter in my plugin and it is working fine in the "Page Analysis" meta box tab on the Edit Media screen. However, the filter does NOT affect the content of the XML sitemaps generated for the site. In fact, I don't see any apply_filter calls in the WPSEO_Sitemaps class implemented in class-sitemaps.php.
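For reference, a filter callback of this kind might look roughly like the sketch below. It is not the actual Media Library Assistant code: the function name is hypothetical, and it assumes the filter passes the post content as its first argument. The guards only let the snippet run outside WordPress.

```php
<?php
/**
 * Hypothetical callback: expands shortcodes in the content handed to the
 * page analysis. Assumes the filter passes the post content as its first
 * argument; the function name is illustrative only.
 */
function mla_example_expand_for_analysis( $content ) {
    // do_shortcode() is the WordPress core function that runs registered
    // shortcodes. Outside WordPress it is not defined, so the content is
    // returned unchanged.
    if ( function_exists( 'do_shortcode' ) ) {
        return do_shortcode( $content );
    }
    return $content;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wpseo_pre_analysis_post_content', 'mla_example_expand_for_analysis' );
}
```

With this in place the analysis sees the generated gallery HTML instead of the raw shortcode text.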

Can you tell me if this is a bug, or is there some other way to get the output of my plugin's shortcode added to the XML sitemap analysis? Thanks for your help.

dglingren commented 9 years ago

I just re-tested this issue on Yoast SEO v2.3.5 and there has been no change. Can you tell me if there is any progress on adding shortcode output to the XML sitemap analysis? Thanks for any help/information you can give me.

Rarst commented 9 years ago

I am currently working on a major refactoring of the sitemap code and plan to work through all the open issues while at it. :) Nothing specific to say about this one yet. See #2787

dglingren commented 9 years ago

Thanks for your reply. I ran across #2787 a few days back and thought it would be a good time to check in on this issue as well. I look forward to your progress. If there is anything I can do to help or test, let me know.

AskKim commented 8 years ago

This filter was removed and replaced by an API in our 3.0 version. See http://kb.yoast.com/article/91-can-i-add-data-to-the-page-analysis

dglingren commented 8 years ago

Thanks for your update. I had seen some references to the new 3.0 approach when I updated my copy of the plugin. The article you referenced says the old filter "allows plugin developers to add their custom fields content". That's fine but it does not address how to add shortcode output to the page analysis.

For example, the [gallery] shortcode generates a list of image tags, and the image files they reference should be added to the sitemap.

If my plugin supports a shortcode that appears on the page being analyzed, what is the appropriate way to add the output the shortcode would generate to the page analysis? Thanks for your help.

AskKim commented 8 years ago

I wish I knew but I'll have to leave that answer to the dev team themselves. However, once we have an answer, I'll do my best to see that it gets into the updated KB article.

Rarst commented 8 years ago

Could you elaborate a bit on what precisely you need to accomplish with the sitemap? Are you trying to add images to a sitemap entry? Well, there isn't much that can be in a sitemap entry at all. :)

dglingren commented 8 years ago

Thanks to both of you for your prompt responses.

@Rarst,

Here is some additional background/context for you. I opened this topic in response to a support request I received for my plugin. Here's a link to that topic:

MLA galleries and SEO (https://wordpress.org/support/topic/mla-galleries-and-seo)

The original request said:

I have been wondering how a page that displays a gallery using MLA, and whose text is all based on IPTC metadata descriptions, titles and captions, works with search engines.

Do search engines see the page as having content when all the text comes from the image metadata in a dynamically displayed gallery and is not part of the hard coded html of the stored page?

My response stated:

The easy thing to do would be to simply replace the [mla_gallery] shortcode text with the output HTML generated by running the shortcode. Would that be sufficient? If not, what part of the following data would be required:

  1. Standard fields (Title, Name/Slug, Caption/Excerpt, ALT Text, Description/Content)
  2. Assigned terms (e.g., categories and tags)
  3. Custom fields
  4. IPTC/EXIF/XMP metadata present in the items but not mapped to WordPress elements

And the reply was:

Regarding the output, the HTML generated by the shortcode is perfect as it would have the image tags and any text in the page that was the result of the shortcodes. That way, any content from fields accessed by the shortcode would be included.

To summarize, my plugin produces image galleries and "term clouds". The galleries have thumbnail images that include links to other pages on the site or to image files. The galleries also have caption and description text that includes keywords and other content that should be indexed just like other text on the page.

What I want to accomplish, then, is to get the shortcode output included in the page analysis. I hope that gives you what you need to give me some additional guidance on how to proceed. Thanks for your help with this.

dglingren commented 8 years ago

I have just received an update from my user, who writes:

I am pretty sure that the pages are getting indexed correctly by Google. The issue is with sitemaps. Here is what I would add:

We are trying to have the images displayed in MLA galleries included in the Yoast XML sitemap. Currently, pages that have MLA galleries are listed in the XML sitemap as having 0 images.

Rarst commented 8 years ago

So essentially there are two separate issues here: text analysis and sitemap.

For text analysis the new and shiny JS implementation was just released, so old hooks are now irrelevant. This is not my area, but I checked in and it should pick up shortcodes. Though if you are doing something complicated you might need to implement that explicitly. See https://github.com/Yoast/YoastSEO.js

For sitemaps, going through content to add images is kind of roundabout. If you can retrieve the images relevant for the post, you can just filter the sitemap entry: hook wpseo_sitemap_entry (or something around there) and splice your images into $url['images'].

This is a rather curt explanation, since anything more extensive would take me going through all the relevant code and writing examples, but it should give you some starting points. :)
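A minimal sketch of the splicing idea, under stated assumptions: the filter is taken to pass the entry array, a type string and the post object, and each image is taken to be an array with a 'src' key. All function names are hypothetical, and the lookup function is a stub that a real plugin would replace.

```php
<?php
/**
 * Hypothetical lookup of the images a plugin wants listed for a post.
 * A real plugin would query its own data here; this stub returns nothing.
 */
function mla_example_images_for_post( $post_id ) {
    return array();
}

/**
 * Appends images to a sitemap entry, skipping sources already listed.
 * Assumes $url['images'] is a list of arrays that each carry a 'src' key.
 */
function mla_example_splice_images( array $url, array $images ) {
    if ( ! isset( $url['images'] ) || ! is_array( $url['images'] ) ) {
        $url['images'] = array();
    }

    // Track sources already present so the same file is not listed twice.
    $seen = array();
    foreach ( $url['images'] as $existing ) {
        if ( isset( $existing['src'] ) ) {
            $seen[ $existing['src'] ] = true;
        }
    }

    foreach ( $images as $image ) {
        if ( empty( $image['src'] ) || isset( $seen[ $image['src'] ] ) ) {
            continue;
        }
        $url['images'][]       = $image;
        $seen[ $image['src'] ] = true;
    }

    return $url;
}

/**
 * Filter callback. The ($url, $type, $post) argument list is an assumption.
 */
function mla_example_sitemap_entry( $url, $type = '', $post = null ) {
    // Leave anything that is not a regular entry for a known post untouched.
    if ( ! is_array( $url ) || ! is_object( $post ) || empty( $post->ID ) ) {
        return $url;
    }
    return mla_example_splice_images( $url, mla_example_images_for_post( $post->ID ) );
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wpseo_sitemap_entry', 'mla_example_sitemap_entry', 10, 3 );
}
```

Keeping the merge in its own function makes it easy to check without a WordPress install, and the duplicate check avoids listing an image twice when it is also embedded directly in the post.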

dglingren commented 8 years ago

@Rarst,

Thank you for your update and the suggestion to use the wpseo_sitemap_entry filter to add image information to the XML sitemap. I was able to find that filter and the related wpseo_sitemap_urlimages filter and use them to accomplish my goal. To "retrieve images relevant for the post" I extract and execute the shortcodes I need and parse the results to find the 'src' and 'alt' entries for the sitemap.
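The approach described above can be sketched roughly as follows. This is not the code shipped in Media Library Assistant: the function names are hypothetical, the regular expressions are deliberately naive, and the ($images, $post_id) argument list for wpseo_sitemap_urlimages is an assumption.

```php
<?php
/**
 * Finds [mla_gallery ...] shortcodes in post content.
 * Naive on purpose: it ignores enclosing shortcodes and "]" inside attributes.
 */
function mla_example_find_shortcodes( $content ) {
    preg_match_all( '/\[mla_gallery\b[^\]]*\]/', $content, $matches );
    return $matches[0];
}

/**
 * Extracts 'src' and 'alt' from the <img> tags in an HTML fragment.
 * Regex-based to stay dependency-free; a DOM parser would cope better
 * with unusual markup.
 */
function mla_example_parse_images( $html ) {
    $images = array();
    if ( ! preg_match_all( '/<img\b[^>]*>/i', $html, $tags ) ) {
        return $images;
    }

    foreach ( $tags[0] as $tag ) {
        // Skip tags without a quoted src attribute.
        if ( ! preg_match( '/\ssrc\s*=\s*(["\'])(.*?)\1/i', $tag, $src ) ) {
            continue;
        }
        $alt = '';
        if ( preg_match( '/\salt\s*=\s*(["\'])(.*?)\1/i', $tag, $match ) ) {
            $alt = $match[2];
        }
        $images[] = array( 'src' => $src[2], 'alt' => $alt );
    }

    return $images;
}

/**
 * Filter callback: runs each gallery shortcode and adds the images it emits.
 * The ($images, $post_id) argument list is an assumption.
 */
function mla_example_sitemap_urlimages( $images, $post_id ) {
    // These WordPress functions are unavailable outside WordPress.
    if ( ! function_exists( 'get_post_field' ) || ! function_exists( 'do_shortcode' ) ) {
        return $images;
    }

    $images  = (array) $images;
    $content = (string) get_post_field( 'post_content', $post_id );
    foreach ( mla_example_find_shortcodes( $content ) as $shortcode ) {
        $images = array_merge( $images, mla_example_parse_images( do_shortcode( $shortcode ) ) );
    }

    return $images;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wpseo_sitemap_urlimages', 'mla_example_sitemap_urlimages', 10, 2 );
}
```

Executing only the extracted shortcodes, rather than the whole post content, avoids re-parsing image tags that are embedded directly in the post.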

As I was investigating and testing I noticed that the default processing done in class-sitemaps.php just above the two filters does not work as I would expect. First, lines 801-804 scan the post content for embedded <img ... tags and add their information to the array. Then, lines 806-811 look for a gallery shortcode. If one is found, the results of the previous analysis are discarded and replaced by the list of images attached to the post/page. There are two issues with this:

My problem is solved, but you should review the default processing in light of my findings. Thanks once again for your help with this issue.

Rarst commented 8 years ago

I opened new issue for that, see #3526 and thanks for feedback. :)