emanuele45 / prettyurls

Automatically exported from code.google.com/p/prettyurls

Make better url rewriting #90

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Open a thread and post a reply.
2. Look at the links that point to a single post. For example, in the topic http://www.simplemachines.org/community/index.php?topic=322736.0 the link to one particular post is http://www.simplemachines.org/community/index.php?topic=322736.msg2149205#msg2149205
3. With prettyurls, the link becomes something like http://www.example.com/forum/under_forum/name-of-the-thread/msg72774/#msg72774

Where is the problem? It is the "/msg72774/" segment. It tells Google and other bots that http://www.example.com/forum/under_forum/name-of-the-thread/msg72774/ is one page, and that http://www.example.com/forum/under_forum/name-of-the-thread/msg72775/ could be another page. But they are not separate pages, so these references seem useless.
I think Google understands that, but do other bots? And does Google crawl all these useless links anyway?
srv:~# host 66.249.71.13
13.71.249.66.in-addr.arpa domain name pointer crawl-66-249-71-13.googlebot.com.
srv:~# cat /var/log/apache2/example.com-access.log | grep "/msg" | grep 66.249.71.13
66.249.71.13 - - [05/Jul/2009:17:01:35 +0200] "GET /forum/under_forum/name-of-the-thread/msg14300/ HTTP/1.1" 200 13457
66.249.71.13 - - [05/Jul/2009:17:01:37 +0200] "GET /forum/under_forum/name-of-the-thread/msg14307/ HTTP/1.1" 200 13455
66.249.71.13 - - [05/Jul/2009:17:03:22 +0200] "GET /forum/under_forum/name-of-the-thread/msg14233/ HTTP/1.1" 200 13453

The answer is: yes, it does. So it keeps Googlebot busy for nothing, uses server resources for nothing, and may even lower your PageRank (many pages to crawl for the same content, a lot of wasted time, ...).
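
The grep above can also be done as a small script that counts how many of a crawler's requests go to /msgNNN/ URLs. This is only a rough sketch: it assumes the Apache common log format shown above, and count_msg_hits and the last sample line (from a documentation-range IP) are made up for illustration.

```python
import re
from collections import Counter

# Client IP and request path from a common-log-format line.
REQUEST_RE = re.compile(r'^(\S+) .*?"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')
# A path whose last segment is /msgNNN/ (trailing slash optional).
MSG_RE = re.compile(r"/msg\d+/?$")


def count_msg_hits(lines, client_ip):
    """Count requests from client_ip, split into 'msg' and 'other' paths."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.match(line)
        if not match or match.group(1) != client_ip:
            continue
        kind = "msg" if MSG_RE.search(match.group(2)) else "other"
        counts[kind] += 1
    return counts


sample = [
    '66.249.71.13 - - [05/Jul/2009:17:01:35 +0200] "GET /forum/under_forum/name-of-the-thread/msg14300/ HTTP/1.1" 200 13457',
    '66.249.71.13 - - [05/Jul/2009:17:01:37 +0200] "GET /forum/under_forum/name-of-the-thread/msg14307/ HTTP/1.1" 200 13455',
    '66.249.71.13 - - [05/Jul/2009:17:03:22 +0200] "GET /forum/under_forum/name-of-the-thread/msg14233/ HTTP/1.1" 200 13453',
    # Hypothetical line, not from the real log: another client, thread URL without /msg.
    '192.0.2.1 - - [05/Jul/2009:17:04:00 +0200] "GET /forum/under_forum/name-of-the-thread/ HTTP/1.1" 200 13000',
]

if __name__ == "__main__":
    print(count_msg_hits(sample, "66.249.71.13"))
```

On a real server you would pass the open log file instead of the sample list.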

But stock SMF uses the same scheme. See http://www.simplemachines.org/community/index.php?topic=322736.msg2149205#msg2149205 and http://www.simplemachines.org/community/index.php?topic=322736.msg2150989#msg2150989
Is there a reason for that?
Maybe SMF has an option to put only the post ID in the URL, as vBulletin does? (e.g. http://forum.ovh.net/showthread.php?t=29292, http://forum.ovh.net/showpost.php?p=139737&postcount=1, http://forum.ovh.net/showpost.php?p=139749&postcount=2)

What do you think?

What version of the mod, and what version of SMF are you using?
1.0 on 1.1.9

Original issue reported on code.google.com by aurelga...@gmail.com on 11 Jul 2009 at 1:03

GoogleCodeExporter commented 8 years ago
Workaround using robots.txt:

User-agent: *
Disallow: /forum/*/*/msg*/
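
To check which paths this rule would block, here is a small sketch of wildcard matching as crawlers like Googlebot document it ("*" matches any run of characters, a trailing "$" anchors the end, otherwise the rule is a prefix match). The helper names are made up, and how a given bot really treats "*" depends on that bot. As far as I know, Python's own urllib.robotparser only does plain prefix matching, which is why the pattern is translated by hand here.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path, pattern="/forum/*/*/msg*/"):
    """True if path is matched by the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in (
        "/forum/under_forum/name-of-the-thread/msg14300/",
        "/forum/under_forum/name-of-the-thread/",
    ):
        print(path, is_disallowed(path))
```

The message URL is blocked and the plain thread URL stays crawlable, which is the point of the workaround.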

Original comment by aurelga...@gmail.com on 11 Jul 2009 at 1:20

GoogleCodeExporter commented 8 years ago
SMF 2 has noindex meta tags on msg URLs, and RC2 will have support for canonical URLs too. If this matters to you, upgrade to SMF 2. There's no point duplicating functionality in a mod when it comes by default.
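
For readers unfamiliar with canonical URLs: the idea is that every /msgNNN/ variant of a thread page declares the plain thread URL as the canonical one. A rough sketch follows. It is not SMF's actual implementation; the function names are made up, and it ignores pagination (a message on a later page of the thread would need that page's URL instead).

```python
import html
import re
from urllib.parse import urlsplit, urlunsplit

# Trailing "msgNNN/" path segment as produced by the pretty URLs in this issue.
MSG_SEGMENT_RE = re.compile(r"msg\d+/?$")


def canonical_topic_url(url):
    """Drop the trailing msgNNN segment and the #fragment from a message URL."""
    parts = urlsplit(url)
    path = MSG_SEGMENT_RE.sub("", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the page head."""
    href = html.escape(canonical_topic_url(url), quote=True)
    return '<link rel="canonical" href="' + href + '">'


if __name__ == "__main__":
    print(canonical_link_tag(
        "http://www.example.com/forum/under_forum/name-of-the-thread/msg72774/#msg72774"
    ))
```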

Original comment by curiousdannii on 12 Jul 2009 at 1:42