code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0
11.37k stars, 4.18k forks

Bug in Spider#setScheduler(Scheduler) #1172

Open. KamiNoYuki opened this issue 2 months ago.

KamiNoYuki commented 2 months ago

The method's code:

public Spider setScheduler(Scheduler updateScheduler) {
    checkIfRunning();
    // Bug: oldScheduler refers to the same SpiderScheduler object as this.scheduler.
    // Once its inner scheduler is set to updateScheduler below, the loop polls
    // updateScheduler itself instead of migrating the requests from the old scheduler.
    SpiderScheduler oldScheduler = this.scheduler;
    // It should be changed to:
    // Scheduler oldScheduler = scheduler.getScheduler();
    scheduler.setScheduler(updateScheduler);
    Request request;
    while ((request = oldScheduler.poll(this)) != null) {
        System.out.println("move oldScheduler task to updateScheduler");
        this.scheduler.push(request, this);
    }
    return this;
}
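To make the difference concrete, here is a minimal, self-contained sketch. It does not use WebMagic's real classes: `SimpleScheduler`, `FifoScheduler` and `Wrapper` are hypothetical stand-ins, requests are plain strings, and `Wrapper` models the assumption that SpiderScheduler forwards `poll`/`push` to whatever inner scheduler it currently holds. `replaceBuggy` copies the wrapper reference as the reported code does; `replaceFixed` captures the inner scheduler before swapping it, as the issue proposes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-ins for WebMagic's Scheduler and SpiderScheduler.
public class SchedulerMigrationDemo {

    /** Hypothetical minimal scheduler interface (not WebMagic's API). */
    interface SimpleScheduler {
        void push(String request);
        String poll(); // returns null when empty
        int size();
    }

    /** Hypothetical FIFO scheduler backed by a deque, with no de-duplication. */
    static class FifoScheduler implements SimpleScheduler {
        private final Deque<String> queue = new ArrayDeque<>();
        public void push(String request) { queue.addLast(request); }
        public String poll() { return queue.pollFirst(); }
        public int size() { return queue.size(); }
    }

    /** Assumed model of SpiderScheduler: every call goes to the current inner scheduler. */
    static class Wrapper {
        private SimpleScheduler inner;
        Wrapper(SimpleScheduler inner) { this.inner = inner; }
        SimpleScheduler getScheduler() { return inner; }
        void setScheduler(SimpleScheduler scheduler) { this.inner = scheduler; }
        String poll() { return inner.poll(); }
        void push(String request) { inner.push(request); }
    }

    /** Mirrors the reported code: "old" is the wrapper itself, not its inner scheduler. */
    static void replaceBuggy(Wrapper wrapper, SimpleScheduler updated) {
        Wrapper old = wrapper;             // same object as wrapper
        wrapper.setScheduler(updated);
        String request;
        // old.poll() now reads from 'updated'; the previous scheduler is never drained.
        // (If 'updated' were non-empty, this no-dedup sketch would cycle forever.)
        while ((request = old.poll()) != null) {
            wrapper.push(request);
        }
    }

    /** The fix proposed in the issue: capture the inner scheduler before the swap. */
    static void replaceFixed(Wrapper wrapper, SimpleScheduler updated) {
        SimpleScheduler old = wrapper.getScheduler();
        wrapper.setScheduler(updated);
        String request;
        while ((request = old.poll()) != null) {
            wrapper.push(request);          // goes to 'updated'
        }
    }

    public static void main(String[] args) {
        SimpleScheduler a = new FifoScheduler();
        a.push("https://example.com/1");
        a.push("https://example.com/2");
        SimpleScheduler b = new FifoScheduler();
        replaceBuggy(new Wrapper(a), b);
        System.out.println("buggy: old=" + a.size() + " new=" + b.size());

        SimpleScheduler c = new FifoScheduler();
        c.push("https://example.com/1");
        c.push("https://example.com/2");
        SimpleScheduler d = new FifoScheduler();
        replaceFixed(new Wrapper(c), d);
        System.out.println("fixed: old=" + c.size() + " new=" + d.size());
    }
}
```

Running `main` prints `buggy: old=2 new=0` and then `fixed: old=0 new=2`: with the reported code the queued requests stay behind in the old scheduler, while the proposed change moves them into the new one in their original order.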