Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0

Google Search Indexing - Video is not the main content of the page #6210

Open DVDGuy99 opened 5 months ago

DVDGuy99 commented 5 months ago

Describe the current behavior

Google Search Console gives the error "Video is not the main content of the page" when indexing videos on our PeerTube site.

This is one of the video pages where Google says the video is not the main content:

https://trailers.ddigest.com/w/sbKiNjKTCs8EkNpKb45ku9

(actually, I think it will say that for all the video pages - diving deeper into the video page indexing data, it says « Video is supplementary content on the page »)

Possibly related, but viewing a screenshot of the page generated within Google Search Console shows it displaying a "HLS.js does not seem to be supported" error where the video should be. Doing an exact term search for this on the videos section of Google shows quite a few PeerTube-hosted videos that have this as the crawled text description for the video.

Steps to reproduce

  1. Log in to Google Search Console account for the instance
  2. Under "Indexing" go to "Video Pages"
  3. A list of pages with the "Video is not the main content of the page" error should be listed here

Describe the expected behavior

As these pages are the main video playback pages, the video should of course be the main content of the page and Google should index these as such. Pages with videos that are indexed as the main content will show the video carousel with the video thumbnail as opposed to just a text link.

Additional information

Chocobozzz commented 5 months ago

Googlebot seems to fail to load the HLS player (which makes no sense). Trying to fall back to a raw HTML element using https://github.com/Chocobozzz/PeerTube/commit/c4a062109d562cbe505c17044dd0b569a92ea121

Hopefully this fixes the issue (I have to wait for the deploy on peertube2.cpy.re and then reschedule a Googlebot indexing).

Chocobozzz commented 5 months ago

It seems to fix the issue :+1:

aflamrip commented 4 months ago

Has the problem been solved or is the same problem still present?

Indexing-pages-with-videos-URL-inspection 1g

Chocobozzz commented 4 months ago

> Has the problem been solved or is the same problem still present?

Should be fixed in the next PeerTube release (6.1.0).

aflamrip commented 4 months ago

I think I found a solution, but I don't know whether it is right or wrong. In the path peertube-latest\client\dist\standalone\videos\ there is a file, embed.html. This part is modified

to

But I don't know whether this method will solve the "Video is not the main content of the page" problem.

Video placement  Video is supplementary content on the page

Whether the page is a playback page for a single video (Video is main content on the page), or hosts additional meaningful content or videos (Video is supplementary content on the page).

DVDGuy99 commented 2 months ago

This issue seems to be still present in 6.1.0.

I've tried enabling/disabling web video, HLS with P2P support, and it doesn't seem to matter too much, as it still gives the "video is not the main content of the page" error:

google_search_console_not_main_content

Below are the JavaScript console error messages shown in Google Search Console for a sample page (https://trailers.ddigest.com/w/1ZcXuBacku4tZeY7KPHwPF); I'm including them in case they help:

google_search_console_errors

DVDGuy99 commented 2 months ago

Here's the video page indexing report for another video with web video enabled:

google_search_console_not_main_content2

Chocobozzz commented 2 months ago

It makes no sense: sometimes Google considers that the video is not the main content of the page, and a few days later it correctly indexes the video. I'll look into it again, but if anyone here has any clue, don't hesitate to share it.

DVDGuy99 commented 2 months ago

This thread might shed some light, and I think there's a really stupid fix for all of this involving adding the word "video" to the URL:

https://support.google.com/webmasters/thread/247936417/how-to-fix-video-is-not-the-main-content-of-the-page?hl=en

Only 4 videos on my site have been indexed, and the indexed URL looks like this:

https://trailers.ddigest.com/videos/watch/218beda6-427d-4ba5-83ad-d815cd13fbc6

Whereas all the ones that are not indexed look like this:

https://trailers.ddigest.com/w/jtdUAPbo65bNgz4Momxmm4

I wonder if a separate Google sitemap can be created that uses the "videos/watch/" URL structure as opposed to the "w/" one. For now, Google doesn't seem to care that the first one redirects to the second one.

DVDGuy99 commented 1 month ago

I've set up a cron job that generates a modified copy of the sitemap as a workaround for this issue. The script basically replaces "https://trailers.ddigest.com/w/" with "https://trailers.ddigest.com/videos/watch/" in the sitemap, and I then replaced the sitemap submitted in Google Search Console with this edited one. This seems to work and videos are now being indexed, even though it shouldn't work (as I'm submitting pages that redirect):

Screenshot 2024-06-01 141625
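
For anyone who wants to try the same workaround, the replacement step described above could look roughly like the following. This is only a minimal sketch and not the actual cron script: the function name `rewrite_sitemap` and the inline sample sitemap are made up for illustration, and fetching the instance's sitemap and saving the edited copy somewhere Google can reach it are left out.

```python
def rewrite_sitemap(xml_text: str, base_url: str) -> str:
    """Return the sitemap text with short /w/ URLs swapped for /videos/watch/ URLs.

    Hypothetical helper: a plain string replacement, as described in the
    comment above. It does not parse or validate the XML.
    """
    base = base_url.rstrip("/")
    short_prefix = base + "/w/"
    long_prefix = base + "/videos/watch/"
    return xml_text.replace(short_prefix, long_prefix)


# Made-up one-entry sitemap, using a video URL quoted earlier in this thread.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://trailers.ddigest.com/w/jtdUAPbo65bNgz4Momxmm4</loc></url>"
    "</urlset>"
)

rewritten = rewrite_sitemap(sample, "https://trailers.ddigest.com")
print(rewritten)
```

Because only the instance-specific prefix is replaced, other URLs in the sitemap (accounts, channels, and so on) are left untouched.
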
Chocobozzz commented 1 month ago

@DVDGuy99 Checking in for news: does Google now index all your videos with the new /videos/watch URLs?

DVDGuy99 commented 1 month ago

@Chocobozzz Yes, pretty much. It doesn't seem to re-add the videos that have already been indexed, even if they've been submitted via the sitemap. I'll try to force reindexing (via the "Request indexing" feature in Google Search Console) on a couple of older ones to see if they are also added/re-added.

DVDGuy99 commented 1 month ago

The older videos I've requested reindexing for have also been indexed as videos (moved out of the "Video is not the main content of the page" category), as have all the new videos that are included in my modified sitemap.

Chocobozzz commented 1 month ago

Unfortunately it doesn't work on my side: indexing https://framatube.org/videos/watch/gW6BUFLNSDWWZwUzZBXLoN instead of https://framatube.org/w/gW6BUFLNSDWWZwUzZBXLoN is refused by Google because of the redirect. I'm surprised it works on your instance :thinking:
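
To compare instances, one way to see what a "/videos/watch/" URL returns at the HTTP level is a single HEAD request that does not follow redirects. The sketch below uses only the Python standard library; the names `summarize` and `head_without_following` are hypothetical. It only detects HTTP-level redirects (a `Location` header with a 3xx status), so a redirect performed client-side in the browser would not show up here, and it says nothing about how Googlebot itself treats the page.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def summarize(status, location):
    """Describe a response as an HTTP redirect or not, from its status and Location header."""
    if status in REDIRECT_STATUSES and location:
        return f"redirect ({status}) -> {location}"
    return f"no HTTP redirect (status {status})"


def head_without_following(url, timeout=10.0):
    """Send one HEAD request and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    # Rebuild the request target from the path and, if present, the query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `summarize(*head_without_following("https://framatube.org/videos/watch/gW6BUFLNSDWWZwUzZBXLoN"))` would report whether that URL answers with an HTTP redirect; running the same check against another instance would show whether the two behave differently.
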

DVDGuy99 commented 1 month ago

It's definitely a weird situation with Google at the moment, and I'm thinking it has to be a bug or something. I have several videos where, as you said, the page won't get indexed because it's a redirect, but the video (with the "/videos/watch/" URL) does get indexed. So it seems that video pages only get indexed as videos if they have "video" in the URL, even if the URL is a redirect. I'm going to submit the "/w/" version of the URL for these pages and see what happens. Maybe the page gets indexed but the video indexing is removed due to the "not the main content" error.

DVDGuy99 commented 3 weeks ago

So I managed to get both versions of the page indexed by Google. Don't ask me how it works, it's not supposed to, but it does for me.

Screenshot 2024-07-06 160014 Screenshot 2024-07-06 155950