firecat53 / urlscan

Mutt and terminal url selector (similar to urlview)
GNU General Public License v2.0

urlscan hangs when handling certain big files #104

Closed: cjbd closed this issue 3 years ago

cjbd commented 3 years ago

Hello, I'm using urlscan to scan for URLs across the Chromium source code, and one of the text files makes urlscan hang:

https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/web_tests/http/tests/xmlhttprequest/resources/big.xml;l=1?q=big.xml&sq=&ss=chromium%2Fchromium%2Fsrc

The file, big.xml, is about 10 MB and contains a very long element value. It hung urlscan for over 10 hours, and I had to terminate the process.

firecat53 commented 3 years ago

Do you have an example of a big source file that actually has URLs in it? big.xml is mostly just 2s, all on one line, with no URLs. I think you really want to use grep or ripgrep for a task like that. Urlscan isn't designed for scanning large generic files like that.

Maybe use grep to generate a list of files that contain http(s) URLs first; then you can use urlscan to pull the URLs out of just those files.
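For anyone who would rather do that prefilter step without grep, here is a rough Python sketch of the same idea. It is not part of urlscan: `has_url_scheme` and `files_with_urls` are hypothetical helper names, and the check is only a plain search for `http://` or `https://`, read in fixed-size chunks so a 10 MB single-line file like big.xml is never loaded or matched as one giant line.

```python
import os
import re
from typing import Iterator

# Hypothetical helpers for illustration; none of this is urlscan's own code.
# The pattern is a simple literal-style match, so it runs in linear time.
URL_SCHEME = re.compile(rb"https?://")
OVERLAP = len(b"https://") - 1  # bytes carried over between chunks


def has_url_scheme(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file contains 'http://' or 'https://'."""
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if URL_SCHEME.search(window):
                return True
            # Keep the last few bytes so a scheme that straddles two
            # chunks is still found on the next iteration.
            tail = window[-OVERLAP:]


def files_with_urls(root: str) -> Iterator[str]:
    """Yield paths under root whose contents mention an http(s) scheme."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                if has_url_scheme(path):
                    yield path
            except OSError:
                # Unreadable file (permissions, broken symlink): skip it.
                continue


if __name__ == "__main__":
    import sys

    for found in files_with_urls(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(found)
```

The printed paths could then be handed to urlscan one file at a time, so files with no URLs in them (big.xml included) are never passed to it at all.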

cjbd commented 3 years ago

@firecat53, I've scanned the entire Chromium source and only this one file had an issue; the other big files work fine. I guess I can ignore this file.