Open brbog opened 2 years ago
Just raising this as a possible improvement for anyone who wants to contribute :-). A good test for this (using WireMock?) is rather important, but writing one requires effort I currently can't commit to :-(.
During tests I observed a couple of times that a fetch failed because the server returned 0 bytes. Since the failure was not deterministic, a simple retry would probably work, but there is currently no way to get that behavior.
The "magic" happens inside the private WebCrawler.processPage() method. When a retry is requested after
fetchResult = pageFetcher.fetchPage(curURL);
has been performed, the rest of the logic should still be executed for the retried result.
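To make the idea concrete, here is a minimal sketch of the retry I have in mind. It deliberately does not touch the crawler's real classes: `RetrySketch`, `fetchWithRetry`, the `isEmpty` check and the `maxAttempts` limit are all hypothetical names made up for illustration, and the fetch is modelled as a plain `Supplier`. The real `pageFetcher.fetchPage(curURL)` call would also need its checked exceptions handled.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch, not part of the crawler's API.
public final class RetrySketch {

    private RetrySketch() {}

    /**
     * Calls fetch up to maxAttempts times and returns the first result that
     * isEmpty does not flag. If every attempt is empty, the last result is
     * returned so the caller can run its usual error handling.
     */
    public static <T> T fetchWithRetry(Supplier<T> fetch, Predicate<T> isEmpty, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        T result = fetch.get();
        // Retry only while the result looks empty and attempts remain.
        for (int attempt = 2; attempt <= maxAttempts && isEmpty.test(result); attempt++) {
            result = fetch.get();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated flaky server: the first two responses are empty, the third has a body.
        Supplier<byte[]> flakyFetch = () -> ++calls[0] < 3 ? new byte[0] : new byte[] {1, 2, 3};
        byte[] body = fetchWithRetry(flakyFetch, b -> b.length == 0, 5);
        System.out.println(body.length + " bytes after " + calls[0] + " attempts");
    }
}
```

Because the helper just returns a result, whatever follows the fetch in processPage() would run unchanged on the retried result, which is the behavior asked for above. Whether the retry belongs in WebCrawler or in the fetcher itself is an open design question.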