cjemorton / skipfish

Automatically exported from code.google.com/p/skipfish

eats way too much memory #14

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
After 1 hour it had eaten up over 2 GB of RAM. That is too much, especially since I used
minimal.wl. To reproduce it, scanning a page I administer is enough. I can
provide its URL, but only privately.
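
For reference, the reproduction command would have looked roughly like the sketch
below, assuming the -W (wordlist) and -o (output directory) flags of skipfish 1.x;
the dictionary path, output directory, and target URL are placeholders, since the
real URL is only shared privately.

  # placeholder target and output directory; minimal.wl from the stock dictionaries/ tree
  skipfish -W dictionaries/minimal.wl -o skipfish-out http://target.example/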

skipfish version 1.06b by <lcamtuf@google.com>

Scan statistics
---------------

       Scan time : 1:04:41.0108
   HTTP requests : 6467537 sent (1666.70/s), 4322432.00 kB in, 2537108.25 kB out (1767.42 kB/s)
     Compression : 2239519.75 kB in, 6157672.50 kB out (46.66% gain)    
 HTTP exceptions : 2 net errors, 0 proto errors, 1 retried, 0 drops
 TCP connections : 64060 total (101.09 req/conn)  
  TCP exceptions : 0 failures, 2 timeouts, 1 purged
  External links : 1643732 skipped
    Reqs pending : 8367         

Database statistics
-------------------

          Pivots : 5486 total, 5279 done (96.23%)    
     In progress : 123 pending, 71 init, 3 attacks, 10 dict    
   Missing nodes : 100 spotted
      Node types : 1 serv, 88 dir, 4899 file, 123 pinfo, 241 unkn, 104 par, 30 val
    Issues found : 120 info, 6 warn, 3840 low, 8149 medium, 0 high impact
       Dict size : 2894 words (1010 new), 46 extensions, 256 candidates

Original issue reported on code.google.com by fen...@gmail.com on 21 Mar 2010 at 5:34

GoogleCodeExporter commented 8 years ago
For performance reasons, crawl data is stored in memory; this is intentional.

Do you indeed have 5500 files / directories on that server?

Original comment by lcam...@gmail.com on 21 Mar 2010 at 5:36

GoogleCodeExporter commented 8 years ago
After 2 hours I had to stop it (Ctrl+C) because it had eaten up almost 5 GB.

Regarding the number of files:

(root@mainsrv)/home/www#find -type f | wc -l
243956
(root@mainsrv)/home/www#find -type d | wc -l
3070
(root@mainsrv)/home/www#
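
For context, a hedged way to watch the scanner's footprint while it runs (assuming a
Linux host with a procps-style ps; the 60-second interval is arbitrary):

  # sample resident and virtual size of the running skipfish process every minute
  watch -n 60 'ps -C skipfish -o pid,rss,vsz,etime'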

Original comment by fen...@gmail.com on 21 Mar 2010 at 6:41

GoogleCodeExporter commented 8 years ago
That sounds like a lot, and it's not practical to scan it fully given the design of
skipfish (at 50,000 requests per directory with minimal.wl). Only a scanner that does
not perform comprehensive brute-forcing will be able to cover this much in a
reasonable timeframe.

Consider limiting the scan to interesting areas on the server (using the -I flag),
excluding tarpit locations using -X, or disabling name.ext bruteforcing with -Y. You
can also use -s to limit the size of file samples collected.
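
A combined invocation along these lines would be one reasonable starting point; the
-I/-X prefixes, the sample-size value, and the target URL below are placeholders, with
the flag meanings taken from the advice above:

  # placeholder prefixes, size cap, and target; adjust to the areas actually worth scanning
  skipfish -o skipfish-out -W dictionaries/minimal.wl \
    -I /app/ -X /photos/ -Y -s 200000 \
    http://target.example/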

Original comment by lcam...@gmail.com on 21 Mar 2010 at 8:28