Skallwar / suckit

Suck the InTernet
Apache License 2.0
741 stars 39 forks

What type of regex works? #91

Closed. nmjohnson closed this issue 4 years ago

nmjohnson commented 4 years ago

What are the wildcards for include/exclude regex? I tried %, *, and . and I don't think any of them worked. The script eventually runs into some errors and saves nothing. I want to find a specific .aspx filename and only save those because this website probably has at least 10 billion unique pages.

CohenArthur commented 4 years ago

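@nmjohnson the regexes are the ones supported by the Rust regex crate. The syntax is similar to Perl-style regexes, minus a few features such as look-around and backreferences. There are no % or bare * wildcards: . matches any single character, and * repeats whatever comes before it zero or more times.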

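So to download all aspx pages, you should use the following regex: .*\.aspx (the backslash makes the dot before aspx a literal "."; left unescaped, that dot would match any character).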
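To see what the escaped dot changes, here is a minimal sketch. It uses Python's re module as a stand-in, since the constructs involved (., *, \. and $) behave the same way there as in the Rust regex crate for inputs like these. The URLs are made up for illustration, and the sketch assumes the pattern is searched for anywhere in the URL rather than anchored to the whole string, which is an assumption about how suckit applies its filters and not something confirmed in this thread.

```python
import re

# Hypothetical URLs, invented purely for illustration.
urls = [
    "https://example.com/reports/view.aspx",
    "https://example.com/reports/view.aspx?id=7",
    "https://example.com/aspx-guide.html",
    "https://example.com/index.html",
]


def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere (unanchored search)."""
    compiled = re.compile(pattern)
    return [c for c in candidates if compiled.search(c)]


# Unescaped dot: "any character followed by aspx", so "/aspx-guide.html" also matches.
loose = matching(r".*.aspx", urls)

# Escaped dot: requires a literal ".aspx" somewhere in the URL.
strict = matching(r".*\.aspx", urls)

# Anchored with $: requires the URL to end in ".aspx" (no query string allowed).
anchored = matching(r"\.aspx$", urls)

for name, result in [("loose", loose), ("strict", strict), ("anchored", anchored)]:
    print(name, result)
```

With an unanchored search the leading .* is redundant, so \.aspx alone selects the same URLs as .*\.aspx. When passing the pattern on a command line, wrapping it in single quotes keeps the shell from interpreting the backslash or the asterisk.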