cjbd closed this issue 3 years ago
Do you have an example of a big source file that actually has URLs in it? big.xml is mostly just 2s, all on one line, with no URLs. I think you really want to use grep or ripgrep for a task like that. Urlscan isn't designed for scanning large generic files like that.
Maybe use grep to generate a list of the files that contain http(s) links, then use urlscan to pull the URLs out of just those files.
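If grep or ripgrep isn't handy, the same prefilter idea can be sketched in Python. This is only a rough illustration, not part of urlscan: the names `file_mentions_url` and `candidate_files` are made up for this example, and it assumes that a plain byte search for `http://` or `https://` is a good enough signal that a file is worth handing to urlscan.

```python
import os

# Byte patterns that suggest a file contains at least one web URL.
NEEDLES = (b"http://", b"https://")
# Bytes carried over between chunks so a needle split across a
# chunk boundary is still found.
OVERLAP = max(len(n) for n in NEEDLES) - 1


def file_mentions_url(path, chunk_size=1 << 20):
    """Return True if the file contains http:// or https:// anywhere.

    Reads in fixed-size chunks, so a 10 MB single-line file is cheap.
    """
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if any(needle in window for needle in NEEDLES):
                return True
            tail = window[-OVERLAP:]


def candidate_files(root):
    """Yield paths under root that are worth passing to a URL extractor."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                if file_mentions_url(path):
                    yield path
            except OSError:
                # Unreadable file (permissions, broken symlink): skip it.
                continue


if __name__ == "__main__":
    import sys

    for path in candidate_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Run against a source checkout, this prints one path per line. A file like big.xml, which has no URLs at all, never makes it into the list, so urlscan only sees files where there is something to extract.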
@firecat53, I've scanned the entire Chromium source and only this one file has the issue; other big files work fine. I guess I can ignore this file.
Hello, I'm using urlscan to scan for all URLs in the Chromium source code. One of the text files hangs urlscan:
https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/web_tests/http/tests/xmlhttprequest/resources/big.xml;l=1?q=big.xml&sq=&ss=chromium%2Fchromium%2Fsrc
The file, big.xml, is about 10 MB and has a very long element value. It hung urlscan for over 10 hours, so I had to terminate the process.