crawler-commons / url-frontier

API definition, resources and reference implementation of URL Frontiers
Apache License 2.0

getURLs will lock the whole queue when no key is assigned #87

Open · saselovejulie opened this issue 1 year ago

saselovejulie commented 1 year ago

For example: send 5 URLs to QueueA, then call GetURLs with the code below. Run it 5 times and you get 5 different URLs from QueueA; the sixth call returns an empty result.

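    // ask for at most one URL from the queue identified by the key "QueueA"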
    Urlfrontier.GetParams request =
            Urlfrontier.GetParams.newBuilder()
                    .setMaxUrlsPerQueue(1)
                    .setMaxQueues(0)
                    .setKey("QueueA")
                    .setDelayRequestable(600)
                    .setCrawlID(crawlId)
                    .build();

But if you do not specify a key, as in this code:

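    // same request but without a key: the frontier decides which queue(s) to serve from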
    Urlfrontier.GetParams request =
            Urlfrontier.GetParams.newBuilder()
                    .setMaxUrlsPerQueue(1)
                    .setMaxQueues(0)
                    .setDelayRequestable(600)
                    .setCrawlID(crawlId)
                    .build();

you can only run it once, because the second call already returns an empty result.

I checked the class AbstractFrontierService.java; it contains this check:

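        // skip this queue if it already has maxURLsPerQueue URLs in process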
        if (currentQueue.getInProcess(now) >= maxURLsPerQueue) {
            continue;
        }

This check refuses the request. Is this correct?
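
To illustrate how I read that check: getInProcess(now) seems to count the URLs that were handed out but not yet acknowledged within the delayRequestable window, so with maxURLsPerQueue set to 1 a queue stays blocked for up to delayRequestable seconds after a single URL has been returned. The snippet below is only a self-contained toy model of that reading, not the actual frontier code; the class and field names are made up.

    // Toy model only: not the real frontier code, just my reading of the check above.
    import java.util.ArrayList;
    import java.util.List;

    public class InProcessToyModel {

        static class ToyQueue {
            // timestamps (ms) of URLs handed out and not yet acknowledged
            final List<Long> handedOutAt = new ArrayList<>();
            final long delayRequestableMillis;

            ToyQueue(long delayRequestableMillis) {
                this.delayRequestableMillis = delayRequestableMillis;
            }

            // count URLs still inside the delayRequestable window
            int getInProcess(long now) {
                return (int) handedOutAt.stream()
                        .filter(t -> now - t < delayRequestableMillis)
                        .count();
            }

            void handOut(long now) {
                handedOutAt.add(now);
            }
        }

        public static void main(String[] args) {
            int maxURLsPerQueue = 1;
            ToyQueue queue = new ToyQueue(600_000); // delayRequestable = 600 s

            long now = System.currentTimeMillis();
            queue.handOut(now); // first GetURLs call hands out one URL

            // a second call a few seconds later: the queue is skipped
            long later = now + 5_000;
            if (queue.getInProcess(later) >= maxURLsPerQueue) {
                System.out.println("queue skipped, "
                        + queue.getInProcess(later) + " URL(s) still in process");
            }
        }
    }

Run as-is, it reports the queue as skipped on the second call, which matches the empty result I get when no key is set.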

jnioche commented 1 year ago

Sorry for the late reply, just back from holidays. Thanks for reporting this. It looks like a bug. Could you please try running it with 'read.thread.num' set to 1? Any chance you could contribute a unit test to reproduce the issue? Thanks
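
A reproduction could start from something like the sketch below. It is a client-side check rather than a proper unit test: it assumes a frontier instance is already running on localhost:7071, that QueueA already holds five URLs for the crawl ID used, and that the generated URLFrontierGrpc blocking stub is on the classpath. The port, crawl ID and the way the URLs were queued are assumptions, not taken from the project's test setup.

    // Rough reproduction sketch, not a ready-made unit test for this repository.
    // Assumptions (not from the project): frontier on localhost:7071, and a queue
    // "QueueA" that already contains 5 URLs for the crawl ID below.
    import java.util.Iterator;

    import crawlercommons.urlfrontier.URLFrontierGrpc;
    import crawlercommons.urlfrontier.Urlfrontier;
    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public class GetURLsNoKeyRepro {

        public static void main(String[] args) {
            ManagedChannel channel =
                    ManagedChannelBuilder.forAddress("localhost", 7071).usePlaintext().build();
            URLFrontierGrpc.URLFrontierBlockingStub stub = URLFrontierGrpc.newBlockingStub(channel);

            String crawlId = "DEFAULT"; // assumption: the crawl ID the URLs were queued under

            // same request as in the report above, without a key
            Urlfrontier.GetParams noKey =
                    Urlfrontier.GetParams.newBuilder()
                            .setMaxUrlsPerQueue(1)
                            .setMaxQueues(0)
                            .setDelayRequestable(600)
                            .setCrawlID(crawlId)
                            .build();

            // expected per the report: the first call returns a URL,
            // the second call comes back empty although QueueA still holds URLs
            for (int call = 1; call <= 2; call++) {
                int count = 0;
                Iterator<Urlfrontier.URLInfo> urls = stub.getURLs(noKey);
                while (urls.hasNext()) {
                    urls.next();
                    count++;
                }
                System.out.println("call " + call + " returned " + count + " URL(s)");
            }

            channel.shutdownNow();
        }
    }

A proper unit test would more likely exercise AbstractFrontierService (or one of its implementations) directly rather than go through gRPC; the sketch only mirrors the client calls from the report above.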