adamdehaven / fetchurls

A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.
https://www.adamdehaven.com/blog/easily-crawl-a-website-and-fetch-all-urls-with-a-shell-script/
MIT License

[Feature Request] Add Option for User-Agent #14

Open sommer-gei opened 2 years ago

sommer-gei commented 2 years ago

Hey Adam, thank you for your nice little script! :-)

I ran into a problem: the website I'd like to fetch all URLs from blocks the default wget User-Agent (currently "Wget/1.20.3 (linux-gnu)"). To get on with my task I manually edited your (current) script, added a User-Agent string to the wget command, and it worked very well.
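Roughly, my manual change boils down to passing wget's standard --user-agent flag; the variable names and the surrounding command below are only illustrative, not your script's actual code:

```bash
# Illustrative sketch only: add a custom User-Agent to the wget spider call.
# USER_AGENT and SITE_URL are made-up names, not from the actual fetchurls script.
USER_AGENT="Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
SITE_URL="https://example.com"

wget --spider --recursive --no-verbose --user-agent="$USER_AGENT" "$SITE_URL" 2>&1 \
  | grep -o 'https\?://[^ ]\+' \
  | sort -u
```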

Question: Are you willing to add an option for the User-Agent?

If yes, I would prepare a PR … :-)

KR

adamdehaven commented 2 years ago

Sure, feel free to submit a PR

sommer-gei commented 2 years ago

FYI: In the meantime I have (temporarily) changed the User-Agent via a local ~/.wgetrc file with this content:

user-agent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
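An equivalent one-off workaround, without touching ~/.wgetrc, is wget's -e/--execute option, which accepts the same wgetrc-style setting on the command line (example only):

```bash
# Example: override the User-Agent for a single wget run instead of editing ~/.wgetrc
wget -e 'user_agent = Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0' \
  --spider https://example.com
```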

adamdehaven commented 2 years ago

@sommer-gei did you ever prepare the Pull Request? I'm happy to add it as an option if you want to submit the changes; just please limit the changes to only adding the User-Agent option.

sommer-gei commented 2 years ago

@adamdehaven Nope, ATM no PR. :-/

(Changing the wget UA manually in the script code is no problem. Implementing/adding the UA as a script CLI parameter is more complex than I thought … #sry)
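For whoever picks this up, a rough sketch of what the option parsing could look like; the option name, variables, and case loop below are just a guess and not based on the script's actual argument handling:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: accept a --user-agent option and forward it to wget.
# None of these names are taken from the real fetchurls script.
USER_AGENT=""
SITE_URL=""

while [[ $# -gt 0 ]]; do
  case "$1" in
    -u|--user-agent)
      USER_AGENT="$2"
      shift 2
      ;;
    *)
      # Everything else (e.g. the site URL) would go through the script's existing handling.
      SITE_URL="$1"
      shift
      ;;
  esac
done

# Only pass --user-agent to wget when the caller supplied one,
# so wget's default UA is kept otherwise.
UA_ARGS=()
if [[ -n "$USER_AGENT" ]]; then
  UA_ARGS=(--user-agent="$USER_AGENT")
fi

wget --spider --recursive --no-verbose "${UA_ARGS[@]}" "$SITE_URL" 2>&1 \
  | grep -o 'https\?://[^ ]\+' \
  | sort -u
```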