buzzn / core

The power of community
GNU Affero General Public License v3.0

google bot on charts pages #775

Closed mkristian closed 6 years ago

mkristian commented 7 years ago
httpResponseCode    422
request.headers.accept  */*,application/json
request.headers.host    app.buzzn.net
request.headers.referer https://app.buzzn.net/groups/wagnis4
request.headers.userAgent   Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
request.method  GET
response.headers.contentLength  130
response.headers.contentType    application/json
url /api/v1/aggregates/past

Either use robots.txt to define what bots may crawl and index on app.buzzn.net, or use HTTP headers to tell them which links to follow. I think the header itself can even say noindex etc., but that is too late because the page has already been loaded by then.

Being open to Google & co. just means extra traffic, and they store that data as well!

IMO just use public/robots.txt:

User-agent: *
Disallow: /
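
For the HTTP-header alternative mentioned above, a minimal sketch could be a small Rack middleware that adds an `X-Robots-Tag` header to every response. The class name `NoRobotsHeader` is hypothetical and not part of this codebase. As noted, the header only reaches a crawler after it has already requested the page, so it does not reduce traffic the way robots.txt can.

```ruby
# Minimal Rack middleware sketch: asks crawlers not to index or follow
# anything by adding an X-Robots-Tag header to every response.
# NoRobotsHeader is a hypothetical name used only for illustration.
class NoRobotsHeader
  HEADER_VALUE = 'noindex, nofollow'.freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    # merge returns a new hash, so a frozen headers hash from
    # downstream is never mutated in place.
    [status, headers.merge('X-Robots-Tag' => HEADER_VALUE), body]
  end
end
```

In a Rails app this could presumably be registered with `config.middleware.use NoRobotsHeader` in config/application.rb.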

mkristian commented 7 years ago

There is more: every page that has no route produces a 500 page with:

500 Internal Server Error
If you are the administrator of this website, then please read this web application's log file and/or the web server's log file to find out what went wrong.

That is bad. I will open a separate issue for this.

So there is no robots.txt for the bots. There is a sitemap gem that runs something once a day via a rake task; I need to dig into the gem to find out what.

mkristian commented 7 years ago

This sitemap_generator gem does some default thing. What exactly is unclear, and according to its README the outcome is host-specific. So if we use this gem, what is the expected outcome? How do we test it?

mkristian commented 7 years ago

As I have no idea what is supposed to be right here, I am moving it back to ready.

dottorer commented 7 years ago

Why would we make the app visible to robots / crawlers at all? They should stay out as I see no value.

ffaerber commented 7 years ago

group readable_by world

mkristian commented 7 years ago

IMO robots/bots/crawlers should stay out, at least for the time being: any other approach that involves filtering out those bubbles and charts is not worth the effort.

mkristian commented 7 years ago

If #776 makes the bots/crawlers stay away, we should then look into what exactly this sitemap_generator is doing: either rename this issue or open a new one after #776 is done.
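
One way to check that a robots.txt really tells all crawlers to stay away could be a small helper like the sketch below. The method name `blocks_all_crawlers?` is hypothetical, and the parsing is deliberately simplistic: it ignores `Allow` rules, wildcards, and agent-specific groups. Fetching the file from the server is left out.

```ruby
# Returns true when the given robots.txt text disallows every path ("/")
# for all user agents ("*"). Hypothetical helper for illustration only.
def blocks_all_crawlers?(robots_txt)
  in_wildcard_group = false
  previous_was_agent = false

  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip # drop comments and surrounding whitespace
    next if line.empty?

    field, value = line.split(':', 2).map(&:strip)
    next if value.nil? # not a "field: value" line

    case field.downcase
    when 'user-agent'
      # Consecutive User-agent lines belong to the same group;
      # otherwise a new group starts here.
      in_wildcard_group = false unless previous_was_agent
      in_wildcard_group ||= (value == '*')
      previous_was_agent = true
    when 'disallow'
      return true if in_wildcard_group && value == '/'
      previous_was_agent = false
    else
      previous_was_agent = false
    end
  end

  false
end
```

With the two-line file proposed earlier in this thread (`User-agent: *` followed by `Disallow: /`), the helper returns true; with an empty `Disallow:` line it returns false.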

mkristian commented 7 years ago

Could this be related to this error seen on New Relic:

ActionView::MissingTemplate: Missing template var/www/buzzn/releases/aca2931605fd54fd88d4fbb8f0427a8b6003073c/public/403.html with {:locale=>[:de], :formats=>[:html, :text, :js, :css, :ics, :csv, :vcf, :png, :jpeg, :gif, :bmp, :tiff, :mpeg, :xml, :rss, :atom, :yaml, :multipart_form, :url_encoded_form, :json, :pdf, :zip], :variants=>[], :handlers=>[:erb, :builder, :raw, :ruby, :coffee, :haml, :jbuilder]}. Searched in: "/var/www/buzzn/releases/aca2931605fd54fd88d4fbb8f0427a8b6003073c/app/views" "/var/www/buzzn/shared/vendor_bundle/ruby/2.3.0/gems/cookie_alert-0.0.5/app/views" "/var/www/buzzn/shared/vendor_bundle/ruby/2.3.0/gems/doorkeeper-3.1.0/app/views" "/var/www/buzzn/shared/vendor_bundle/ruby/2.3.0/bundler/gems/grape-swagger-rails-94aecb2aa10c/app/views" "/var/www/buzzn/shared/vendor_bundle/ruby/2.3.0/gems/devise-i18n-views-0.3.7/app/views" "/var/www/buzzn/shared/vendor_bundle/ruby/2.3.0/gems/devise-i18n-1.0.1/app/views" "/var/www/buzzn/shared/vendor_bundle/ruby/2.3.0/gems/devise_invitable-1.6.0/app/views" "/var/www/buzzn/shared/vendor_bundle/ruby/2.3.0/gems/devise-3.5.10/app/views" "/var/www/buzzn/releases/aca2931605fd54fd88d4fbb8f0427a8b6003073c" "/"

httpResponseCode    500
request.headers.accept  /
request.headers.host    app.buzzn.net
request.headers.userAgent   Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)
request.method  GET
response.headers.contentType    text/plain
url /registers/a54c1cce-275b-4b10-834e-28014c7c8327