Hi,
Firstly, thanks for the great project!
I've customised your code to do just that, and I want to share my experience with
you and the community.
My approach is to modify the SchedulePageLinks(CrawledPage crawledPage) method in
WebCrawler so that it fires off an async event when a link is added to the
scheduler. The async event handler then keeps track of scheduled links in the
database.
Also, in my WebCrawler constructor, I initialise the scheduler so that it reads
from the db the links that were scheduled previously.
I also added a CrawlDecisionMaker to decide whether a link should be scheduled.
Happy to share my code with you guys if you like.
Cheers,
Andrew
Original comment by andrewyo...@gmail.com
on 17 Aug 2013 at 2:10
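For readers who want a concrete picture, here is a minimal sketch of the approach described above; it is not Andrew's actual code. It assumes Abot 1.x, where SchedulePageLinks(CrawledPage) is a protected virtual method on WebCrawler (the thread quotes that signature); the PersistentWebCrawler, LinkScheduled and SaveLinkAsync names are illustrative, and the ParsedLinks property may be named differently in other Abot versions.

```csharp
using System;
using System.Threading.Tasks;
using Abot.Crawler;
using Abot.Poco;

public class PersistentWebCrawler : WebCrawler
{
    // Hypothetical event, raised for each link handed to the scheduler,
    // so a subscriber can record it without blocking the crawl.
    public event Action<Uri> LinkScheduled;

    protected override void SchedulePageLinks(CrawledPage crawledPage)
    {
        base.SchedulePageLinks(crawledPage);   // let Abot do the actual scheduling

        var handler = LinkScheduled;
        if (handler == null || crawledPage.ParsedLinks == null)
            return;

        // ParsedLinks holds the links found on the page in Abot 1.x;
        // the property name may differ in other versions.
        foreach (Uri link in crawledPage.ParsedLinks)
            handler(link);
    }
}

public static class Demo
{
    public static void Main()
    {
        var crawler = new PersistentWebCrawler();
        crawler.LinkScheduled += link =>
            Task.Run(() => SaveLinkAsync(link));   // fire-and-forget persistence

        crawler.Crawl(new Uri("http://example.com/"));
    }

    static Task SaveLinkAsync(Uri link)
    {
        // Stand-in for the database write Andrew mentions (e.g. an INSERT
        // into a scheduled_links table keyed on the absolute URI).
        Console.WriteLine("scheduled: " + link.AbsoluteUri);
        return Task.FromResult(0);
    }
}
```

Re-seeding the scheduler from the database on startup, as Andrew also describes, would be the same pattern in reverse: read the saved links and hand each one back to the scheduler before the crawl begins.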
Hello
Thanks for your effort.
Can you add a pause and resume option?
These abilities are very important.
Additionally, the history check is a good idea too (don't crawl pages again).
I'm looking forward to your great work...
Thanks a lot.
Original comment by smsghase...@gmail.com
on 18 Aug 2013 at 2:35
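Until something like this lands in the library itself, both requests can be approximated from user code. The sketch below assumes Abot 1.x's ShouldCrawlPage(Func&lt;PageToCrawl, CrawlContext, CrawlDecision&gt;) hook; blocking inside the delegate stalls the crawl threads, which is what makes the "pause" work, and the in-memory set stands in for a persistent history store. It is a sketch of one possible workaround, not a built-in Abot feature.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using Abot.Crawler;
using Abot.Poco;

public static class PausableCrawlDemo
{
    // Open = running, closed = paused; Set() resumes, Reset() pauses.
    static readonly ManualResetEventSlim Gate = new ManualResetEventSlim(true);

    // In-memory history; a persistent version would back this with a database.
    static readonly ConcurrentDictionary<string, bool> Seen =
        new ConcurrentDictionary<string, bool>();

    public static void Main()
    {
        var crawler = new PoliteWebCrawler();

        crawler.ShouldCrawlPage((pageToCrawl, crawlContext) =>
        {
            Gate.Wait();   // each crawl thread stalls here while "paused"

            // History check: allow each absolute URI through only once.
            if (!Seen.TryAdd(pageToCrawl.Uri.AbsoluteUri, true))
                return new CrawlDecision { Allow = false, Reason = "Already seen" };

            return new CrawlDecision { Allow = true };
        });

        // Call Pause()/Resume() from another thread (e.g. a UI button).
        crawler.Crawl(new Uri("http://example.com/"));
    }

    public static void Pause()  { Gate.Reset(); }
    public static void Resume() { Gate.Set(); }
}
```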
Original comment by sjdir...@gmail.com
on 3 Sep 2013 at 2:54
This task is partially complete since issue 92 was completed. Issue 92 gives
the ability to store crawled urls to disk. That's part 1. Part 2 is to store all
the context (counts, depths, etc.) along with the url db.
Original comment by sjdir...@gmail.com
on 20 Jan 2014 at 4:02
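As a rough illustration of what "part 2" might involve, the sketch below snapshots some crawl context (a crawled-page count and per-URI depths) to a JSON file kept next to the URL store, so a crawl could later be resumed from it. The CrawlSnapshot type and file layout are invented for this example; Abot does not provide them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Web.Script.Serialization;   // in the System.Web.Extensions assembly

// Hypothetical snapshot of crawl context to keep next to the persisted URLs.
public class CrawlSnapshot
{
    public int CrawledCount { get; set; }                    // pages fetched so far
    public Dictionary<string, int> UriToDepth { get; set; }  // discovered URI -> depth
    public DateTime SavedAtUtc { get; set; }
}

public static class SnapshotStore
{
    public static void Save(CrawlSnapshot snapshot, string path)
    {
        snapshot.SavedAtUtc = DateTime.UtcNow;
        File.WriteAllText(path, new JavaScriptSerializer().Serialize(snapshot));
    }

    public static CrawlSnapshot Load(string path)
    {
        return new JavaScriptSerializer()
            .Deserialize<CrawlSnapshot>(File.ReadAllText(path));
    }
}
```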
Moving this to release 2.0 since I don't want to hold up this release any longer.
Original comment by sjdir...@gmail.com
on 20 Jan 2014 at 4:03
Thanks for your effort.
I have some ideas that are not hard to implement but are important.
Unfortunately, I'm not a good enough programmer to write this code myself.
Original comment by smsghase...@gmail.com
on 20 Jan 2014 at 2:40
Original comment by sjdir...@gmail.com
on 20 Jan 2014 at 3:50
Original issue reported on code.google.com by
smsghase...@gmail.com
on 29 Jul 2013 at 5:53