EndPointCorp / end-point-blog

End Point Dev blog
https://www.endpointdev.com/blog/

Comments for List Google Pages Indexed for SEO: Two Step How To #234

Open phinjensen opened 7 years ago

phinjensen commented 7 years ago

Comments for https://www.endpointdev.com/blog/2009/12/google-pages-indexed-seo/ By Steph Skardal

To enter a comment:

  1. Log in to GitHub
  2. Leave a comment on this issue.

phinjensen commented 7 years ago
original author: Shane M Hansen
date: 2009-12-14T11:55:20-05:00

I'd suggest using curl and bash's {} grouping operators to stream content to sed like this. You could also check out my post on poor man's concurrency with bash to run several of these processes at once.

{ for i in http://www.google.com http://www.backcountry.com; do curl -s "$i"; done; } | sed -e '/stuff/d'
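A minimal sketch of the grouping idea above, runnable without network access. It assumes a hypothetical `fetch_page` helper that stands in for `curl -s "$url"`, and a hypothetical `filter_pages` wrapper; neither name comes from the original post. Everything printed inside the braces flows through a single pipe into one sed process.

```shell
#!/usr/bin/env bash
# fetch_page is a hypothetical stand-in for `curl -s "$1"`, so the
# example prints predictable text instead of downloading a page.
fetch_page() {
  printf 'header for %s\nstuff to drop\nbody of %s\n' "$1" "$1"
}

# The braces group the whole loop, so the combined output of every
# fetch_page call is streamed to one sed, which deletes any line
# containing "stuff".
filter_pages() {
  {
    for url in "$@"; do
      fetch_page "$url"
    done
  } | sed -e '/stuff/d'
}

filter_pages http://www.google.com http://www.backcountry.com
```

To try it against real pages, the body of `fetch_page` could be swapped for `curl -s "$1"`.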

phinjensen commented 7 years ago
original author: Steph Powell
date: 2009-12-14T12:22:26-05:00

Hi Shane,

Thanks for the suggestion. However, running:

{ for i in "http://www.google.com/search?num=100&as_sitesearch=www.endpoint.com"; do wget -O - "$i"; done; } | ...

or

{ for i in "http://www.google.com/search?num=100&as_sitesearch=www.endpoint.com"; do curl "$i"; done; } | ...

triggers a 403 (Forbidden) response, so it would take some hacking to get around Google's blocking of scripted requests. Perhaps I'll play around with the User-Agent settings and find a way to make curl requests successfully in the future.

~Steph

phinjensen commented 7 years ago
original author: Shane M Hansen
date: 2009-12-14T12:47:49-05:00

Good point. Google seems to require a User-Agent string. Something as simple as this seems to work:

curl -A 'mozilla' 'http://www.google.com/search?num=100&as_sitesearch=www.endpoint.com'

The Linkscape API is also a great tool for this sort of thing.
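The User-Agent approach could be wrapped in a small script along these lines. This is only a sketch: `search_url` and `fetch_results` are hypothetical helper names, and whether Google accepts such requests today is an assumption that may not hold. Running the script only prints the URL that would be requested, so no network access is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the search URL for a given site.
search_url() {
  printf 'http://www.google.com/search?num=100&as_sitesearch=%s\n' "$1"
}

# Hypothetical helper: fetch the results page with an explicit
# User-Agent. -s hides curl's progress meter; -A sets the User-Agent.
fetch_results() {
  curl -s -A 'mozilla' "$(search_url "$1")"
}

# Usage (needs network access):
#   fetch_results www.endpoint.com | sed -e '/stuff/d'

# Print the URL that fetch_results would request.
search_url www.endpoint.com
```

Quoting the URL matters here because it contains `?` and `&`, which the shell would otherwise treat specially.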

phinjensen commented 7 years ago
original author: Steph Powell
date: 2009-12-14T13:01:45-05:00

Yes, I've been thinking about working with the Linkscape API. According to the API docs, from the free (limited) API, you can grab:

I would love to integrate the Linkscape API into my SEO workflow.

phinjensen commented 7 years ago
original author: SEO Melbourne
date: 2010-03-31T03:00:26-04:00

This may be a stupid question, but how do I run the command?

Is it done through the browser?

phinjensen commented 7 years ago
original author: Robert
date: 2010-07-26T20:44:41-04:00

Worked great just by typing the command where you would otherwise enter the URL in your browser window.

phinjensen commented 7 years ago
original author: Reiki Vancouver
date: 2011-05-28T16:01:42-04:00

Hi Steph,

I ran the sed command in the terminal but don't know how to view the output of the URLs you showed.

Would you have any advice for a novice?

Thanks, Daniel

phinjensen commented 7 years ago
original author: Steph Skardal
date: 2011-05-31T09:24:24-04:00

Reiki,

The results of the sed command should print directly to the terminal. If you want, you can write them to a file instead by appending "> filename" to the end of the command, then open the file in a text editor (vi, emacs, notepad, gedit, etc.).

~Steph
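As a small illustration of the redirect described above, the sketch below sends sed's output to a file and then reads it back. The sample input lines and the `results.txt` file name are made up for the example; they stand in for the real downloaded search results.

```shell
#!/usr/bin/env bash
# Hypothetical output location; any writable path works.
outfile="${TMPDIR:-/tmp}/results.txt"

# Sample lines stand in for the real search results. With "> file"
# appended, sed's output is written to the file, not the terminal.
printf 'http://www.endpoint.com/\nstuff\nhttp://www.endpoint.com/blog\n' \
  | sed -e '/stuff/d' > "$outfile"

# Read the saved results back (a text editor works just as well).
cat "$outfile"
```

To see the output in the terminal and save it at the same time, the redirect could be replaced with `| tee filename`.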