Letractively / abot

Automatically exported from code.google.com/p/abot
Apache License 2.0

When set to not crawl external links, they are still added to the scheduler to be crawled and later skipped #111

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
We can skip adding external pages to the scheduler when the config says to crawl
only internal pages. This will reduce memory use and speed things up.

Patch attached.

Original issue reported on code.google.com by i...@resultly.com on 7 Jul 2013 at 3:51
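A minimal sketch of the proposed idea, in Python for illustration: filter external links before they are queued, rather than queuing them and skipping them later. It assumes "internal" simply means the link's host matches the root page's host. `CrawlConfig`, `crawl_external_pages`, `Scheduler`, `is_internal` and `schedule_links` are hypothetical names; this is neither abot's actual API nor the attached patch.

```python
from collections import deque
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CrawlConfig:
    # Hypothetical flag standing in for "only crawl internal pages".
    crawl_external_pages: bool = False


@dataclass
class Scheduler:
    # Hypothetical FIFO scheduler of URLs waiting to be crawled.
    queue: deque = field(default_factory=deque)

    def add(self, url: str) -> None:
        self.queue.append(url)


def is_internal(url: str, root_url: str) -> bool:
    """Assumption: a link is internal when its host matches the root page's host."""
    return urlparse(url).netloc.lower() == urlparse(root_url).netloc.lower()


def schedule_links(links, root_url, config, scheduler):
    """Queue links, dropping external ones up front when external crawling is off.

    Without this check, external links would sit in the queue and only be
    skipped when dequeued, costing memory for URLs that are never crawled.
    """
    for link in links:
        if not config.crawl_external_pages and not is_internal(link, root_url):
            continue  # never reaches the scheduler
        scheduler.add(link)


links = ["http://example.com/a", "http://other.org/b", "http://example.com/c"]
s = Scheduler()
schedule_links(links, "http://example.com/", CrawlConfig(), s)
print(list(s.queue))  # ['http://example.com/a', 'http://example.com/c']
```

The design choice is simply where the check happens: doing it at scheduling time keeps the queue limited to pages that will actually be crawled.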

Attachments:

GoogleCodeExporter commented 8 years ago
If a new thread is created for these external links, then make the change; if
they are handled in the main crawler thread, then do not change anything.

Original comment by sjdir...@gmail.com on 8 Jul 2013 at 3:16

GoogleCodeExporter commented 8 years ago
What do you mean by this? Does my code change look sufficient for the issue I
suggested?

Original comment by i...@resultly.com on 8 Jul 2013 at 3:25

GoogleCodeExporter commented 8 years ago
I haven't seen your code for this yet. I thought you were going to add it to
the 1.2.2 branch so I can see it all at once. The previous comment was just a
note to self in case I have to implement it.

Original comment by sjdir...@gmail.com on 8 Jul 2013 at 4:54

GoogleCodeExporter commented 8 years ago
I attached a file with this simple change here on the issue board, but I will
commit it to 1.2.2.


Original comment by i...@resultly.com on 8 Jul 2013 at 4:59

GoogleCodeExporter commented 8 years ago
This was fixed by Ilya and will be merged in from the 1.2.2 branch.

Original comment by sjdir...@gmail.com on 19 Jul 2013 at 7:25

GoogleCodeExporter commented 8 years ago

Original comment by sjdir...@gmail.com on 3 Sep 2013 at 1:50

GoogleCodeExporter commented 8 years ago
Fixed as part of 1.2.3 on GitHub.

Original comment by sjdir...@gmail.com on 3 Sep 2013 at 2:48

GoogleCodeExporter commented 8 years ago

Original comment by sjdir...@gmail.com on 3 Sep 2013 at 2:49