gleanerio / gleaner

Gleaner: JSON-LD and structured data on the web harvesting
https://gleaner.io
Apache License 2.0

Headless Chrome set User-Agent and HTTP Head calls to same URL #205

Open valentinedwv opened 1 year ago

valentinedwv commented 1 year ago

Calls made through headless Chrome are not identifying themselves as EarthCube (EC) harvesters.

Hey Kenton, not sure if you know why, but an IP address at the SD Supercomputing center seems to be bombing one of our Neotoma IP addresses with HEAD calls. It's not crazy, but it's about 2300 calls since yesterday morning, and a total of 16783 since the middle of April. Does this have something to do with one of the EarthCube projects?

It seems to be all to the same URL, so I'm wondering if something is mis-configured.

It's also a bit weird since the user agent is described as Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.119 Safari/537.36

So can we set the User Agent in Headless Chrome?

And where are all the HEAD calls happening?

valentinedwv commented 1 year ago

The HEAD call is an issue on the source's side. The JavaScript on the page calls a service with a HEAD request.

valentinedwv commented 1 year ago

https://pkg.go.dev/github.com/mafredri/cdp#Emulation.SetUserAgentOverride
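
As far as I can tell, the linked method wraps the DevTools Protocol command `Emulation.setUserAgentOverride`, so with a connected `cdp.Client` the call would look roughly like `c.Emulation.SetUserAgentOverride(ctx, emulation.NewSetUserAgentOverrideArgs(ua))`. Below is a minimal stdlib-only sketch of the underlying command message, just to show what would be sent to Chrome. The agent string, the `cdpCommand` type, and the helper names are hypothetical and not taken from the Gleaner code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdpCommand is a simplified JSON envelope for a DevTools Protocol command.
// Hypothetical type for illustration; mafredri/cdp builds this internally.
type cdpCommand struct {
	ID     int               `json:"id"`
	Method string            `json:"method"`
	Params map[string]string `json:"params"`
}

// newUserAgentOverride builds the Emulation.setUserAgentOverride command
// that replaces the default Chrome User-Agent for the target page.
func newUserAgentOverride(id int, userAgent string) cdpCommand {
	return cdpCommand{
		ID:     id,
		Method: "Emulation.setUserAgentOverride",
		Params: map[string]string{"userAgent": userAgent},
	}
}

// encode serializes the command to the JSON text sent over the websocket.
func (c cdpCommand) encode() (string, error) {
	b, err := json.Marshal(c)
	return string(b), err
}

func main() {
	// Hypothetical agent string; the real value should be whatever
	// identifies the harvester to site operators.
	cmd := newUserAgentOverride(1, "EarthCube-Gleaner-Example/0.0 (+https://gleaner.io)")
	out, err := cmd.encode()
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

The override would need to be applied before navigating to the page, so that the page load and any requests made by the page's own JavaScript (including the HEAD calls) carry the harvester's agent string.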