code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0

doCycleRetry not working when using immutable maps in extras #1148

Closed LeonardMeyer closed 6 months ago

LeonardMeyer commented 6 months ago

Affected Version and code

The latest version (0.10.0) and below.

https://github.com/code4craft/webmagic/blob/7ededbea1a3b040c4429293e10a30996ccf9caf0/webmagic-core/src/main/java/us/codecraft/webmagic/Spider.java#L482

Description

I just ran into a tricky bug. Consider this:

Request request = new Request(url);
request.setExtras(Map.of("key", "value"));

When a cycle retry happens on this request, it blows up with UnsupportedOperationException. That is because the putExtra call on the line linked above calls put on the immutable map you passed in. It would be safer if setExtras either accepted only a mutable map, or copied its argument into a mutable one, e.g. new HashMap<>(immutableMap).
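
To make the second suggestion concrete, here is a minimal sketch of the defensive copy. DemoRequest is a hypothetical stand-in I wrote for illustration, not webmagic's actual Request class, and the "retryCount" key is likewise made up; only the copy-on-set idea is the point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a request object; NOT webmagic's actual Request class.
class DemoRequest {
    private Map<String, Object> extras;

    // Copy the caller's map so later writes never hit an immutable instance.
    DemoRequest setExtras(Map<String, Object> extras) {
        this.extras = (extras == null) ? null : new HashMap<>(extras);
        return this;
    }

    // Lazily create the map if needed, then store the entry.
    DemoRequest putExtra(String key, Object value) {
        if (extras == null) {
            extras = new HashMap<>();
        }
        extras.put(key, value);
        return this;
    }

    Object getExtra(String key) {
        return extras == null ? null : extras.get(key);
    }
}

class ExtrasCopyDemo {
    public static void main(String[] args) {
        // Map.of returns an immutable map; calling put on it directly would throw.
        DemoRequest request = new DemoRequest().setExtras(Map.of("key", "value"));
        // Mimics what a retry does: it writes a counter into the extras.
        // "retryCount" is an illustrative key, not the one webmagic uses.
        request.putExtra("retryCount", 1);
        System.out.println(request.getExtra("retryCount"));
    }
}
```

The trade-off is that the request no longer shares the caller's map, so changes the caller makes to the original map after setExtras would not be visible to the request.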