WWBN / AVideo

Create Your Own Broadcast Network With AVideo Platform Open-Source. OAVP OVP
https://avideo.tube/AVideo_OpenSource

Preventing robots from crawling specific part of a page #3495

Closed akhilleusuggo closed 4 years ago

akhilleusuggo commented 4 years ago

Hello Daniel,

I wrote to you a while ago that Google is not crawling AVideo correctly, and that is because the HTML structure is open. When Google crawls a page, it crawls all of it, including the videos that you see here (videosListRow):

image

This confuses Google, since it doesn't really know what to show. It may show any thumbnail and any title, with a misleading URL. We can fix that by adding a noindex tag or Disallow: /iframes/ to robots.txt. The links below have more information; I just don't know where exactly this should be added. This is very important because it hurts SEO badly when people click on a video result and get redirected to something else.

I hope you know where it should be added .

Links:
https://webmasters.stackexchange.com/questions/16390/preventing-robots-from-crawling-specific-part-of-a-page
https://stackoverflow.com/questions/15685205/noindex-tag-for-google/15718255

Both are basically the same.

DanielnetoDotCom commented 4 years ago

Hi,

Recently I fixed some problems with crawlers and images that were caused by the images' small size. Now we use the main poster image as the thumbnail in <meta property="og:image" and we force a minimum of 200px, which may be enough for the crawler engines to accept it as the default image.

Before that, our thumbnails were smaller, so the crawlers picked a random image.

Are you still having issues with this?

akhilleusuggo commented 4 years ago

I'm not talking only about that. The crawlers pick up the wrong URLs and titles ...

I just want to add a noindex to the videosListRow (completely).

In the HTML for this URL: https://demo.avideo.com/video/36/stronger-hillsong-chapel?channelName=HillsongUNITED , the section would correspond to this:

image

That video list SHOULD NOT be crawled.

All I'm asking is for you to tell me where to change it in the code. If you believe this is wrong, just let me give it a try by myself. But I've been inspecting pages in Google Search, and on every page I inspect, over 10 titles and 10 URLs get crawled.

Inspecting your page gives 2 results:

image image

From : image

image

akhilleusuggo commented 4 years ago

@DanielnetoDotCom Could you provide some info? What part of the script should I remove, or where should I add the no-index?

DanielnetoDotCom commented 4 years ago

I am not sure what the no-index tag looks like. I did some research and only found a way to no-index a whole page.

But if you have a way to tag just the videos list section, it should be done here: https://github.com/WWBN/AVideo/blob/0dc8afad5b54541f7626fc75eb33cda667e151d8/view/modeYoutube.php#L546

akhilleusuggo commented 4 years ago

I've tried that one, but it is not working. That videoList should be served through an iframe or JavaScript, and its path added to robots.txt so that Google/Yandex/Yahoo ignore it.

DanielnetoDotCom commented 4 years ago

iframe? I do not get it!

There should be some way to tag it and make Google ignore it.

DanielnetoDotCom commented 4 years ago

Or maybe, if the site detects that the visitor is a bot, it does not load the videos list:

https://github.com/WWBN/AVideo/blob/009a4aaac17281b0c0afae6403dd4b9fc6d6d155/objects/functions.php#L2038
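
As a rough illustration of that idea (written in Python rather than AVideo's PHP, and not taken from objects/functions.php), a user-agent check could decide whether the videos list is rendered at all. The names is_bot and render_video_page and the list of crawler markers are hypothetical and only serve this sketch:

```python
import re
from typing import Optional

# Hypothetical, non-exhaustive list of crawler markers; a real deployment
# would need a maintained list.
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|yandex|slurp|duckduckbot|baiduspider",
    re.IGNORECASE,
)


def is_bot(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent header looks like a known crawler."""
    if not user_agent:
        return False
    return bool(BOT_PATTERN.search(user_agent))


def render_video_page(user_agent: Optional[str],
                      main_video_html: str,
                      videos_list_html: str) -> str:
    """Leave the related videos list out of the page when a crawler asks for it."""
    if is_bot(user_agent):
        return main_video_html
    return main_video_html + videos_list_html


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36"

page_for_bot = render_video_page(googlebot_ua, "<main>video</main>", "<div>list</div>")
page_for_user = render_video_page(browser_ua, "<main>video</main>", "<div>list</div>")
```

With these inputs, page_for_bot contains only the main video markup, while page_for_user also contains the list.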

akhilleusuggo commented 4 years ago

"There should be some way to tag it and make Google ignore it."

What I mean is that the videosList should be in an iframe, like the comments, hosted on a separate page from the video's page itself. This way, we can add the path of the iframe to robots.txt.
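
To make the proposal concrete, here is a small sketch of how such a rule would behave, using Python's standard urllib.robotparser. The robots.txt content and both example URLs are assumptions for illustration; only the /iframes/ path comes from this thread:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; only the /iframes/ path is from the discussion.
ROBOTS_TXT = """\
User-agent: *
Disallow: /iframes/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs: the video page itself and a separately hosted list.
video_page = "https://example.com/video/36/some-title"
list_frame = "https://example.com/iframes/videosList"

video_page_allowed = parser.can_fetch("Googlebot", video_page)
list_frame_allowed = parser.can_fetch("Googlebot", list_frame)
```

Under these assumptions, video_page_allowed is True and list_frame_allowed is False: the rule only covers URLs whose path starts with /iframes/, which is why the list would have to live on its own URL for this to work.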

And by the way, we already have an option for this (which I don't know if it's working or not), but it's for comments.

image

If it's working, we could do the same for the videoList.

DanielnetoDotCom commented 4 years ago

The comments use this: https://stackoverflow.com/questions/14314111/avoid-crawling-part-of-a-page-with-googleoff-and-googleon

We can do the same for the videosList.

DanielnetoDotCom commented 4 years ago

Never mind, I just saw that this does not stop Google's web search at all.

DanielnetoDotCom commented 4 years ago

BTW, I disagree about using iframes for the videosList, but we can skip loading it if the visitor is a bot. I am sending an update with the bot detection.

akhilleusuggo commented 4 years ago

googleoff is really old and does not work anymore. It's used ONLY by the commercial Google Search Appliance (something like a private Google search for companies).

The iframe is the only solution I've found.

akhilleusuggo commented 4 years ago

I've tested the new update, but it made no difference:

https://search.google.com/test/rich-results You can test it yourself; you will always get 2 or more videos detected on the same page.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.