alexloizou / andiparos

Automatically exported from code.google.com/p/andiparos

Spider: do not download binary/large files #27

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Spider any site that hosts binary files (swf, mov, zip, ...).
2. The Andiparos spider downloads each binary file and searches it for URLs.

What is the expected output? What do you see instead?
IMHO, binary files (or very large files) should not be downloaded completely.
Downloading such large files slows the spider down substantially, and on a
page with many screencasts or videos this behavior does not help spidering,
since those files contain no links worth following.
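
To make the request concrete, here is a minimal sketch of the kind of check a spider could run before fetching a response body. It is written in Python for brevity and is not Andiparos code. The function names, the extension list, the content-type prefixes and the 1 MB limit are all assumptions chosen for illustration. The headers are assumed to come from a HEAD request (or from the response headers read before the body) with lowercased names.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical defaults for illustration; none of these come from Andiparos.
SKIPPED_EXTENSIONS = {".swf", ".mov", ".zip"}
PARSEABLE_TYPE_PREFIXES = ("text/", "application/xhtml+xml", "application/xml")
MAX_BODY_BYTES = 1_000_000


def has_skipped_extension(url):
    """Return True if the URL path ends in a known binary extension."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in SKIPPED_EXTENSIONS


def should_fetch_body(url, headers, max_bytes=MAX_BODY_BYTES):
    """Decide whether the spider should download and parse the body.

    `headers` maps lowercased response header names to their values.
    """
    # Cheapest check first: the file extension needs no network access.
    if has_skipped_extension(url):
        return False

    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type and not content_type.startswith(PARSEABLE_TYPE_PREFIXES):
        return False

    # Skip bodies that announce a size above the limit.
    length = headers.get("content-length")
    if length is not None and length.isdigit() and int(length) > max_bytes:
        return False

    return True
```

With this sketch, a URL ending in .swf is rejected without any request, a response declared as video/quicktime is rejected from its headers alone, and an HTML page is still fetched as long as its declared size stays under the limit. Servers that omit Content-Length would still need a cap on the number of bytes read while streaming, which is not shown here.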

What version of the product are you using? On what operating system?
1.0.6, on Mac OS X 10.6

Original issue reported on code.google.com by cocaman on 26 Oct 2010 at 2:38