MontFerret / worker

Containerized Ferret worker
Apache License 2.0

add ip #11

Closed PierreBrisorgueil closed 4 years ago

PierreBrisorgueil commented 4 years ago

https://github.com/MontFerret/worker/issues/8

I had to do this for my own needs, in case you are interested. I also have a version that embeds the IP in the response of the POST request, but I am new to Go and I don't think the code is clean enough.

However, it could be really useful to add a debug mode to the POST request that also returns the IP and the version in the response.

ziflex commented 4 years ago

Hey, thanks for the PR. Could you tell me what the use case is?

P.S. Could you run go fmt to format the code?

PierreBrisorgueil commented 4 years ago

Hello @ziflex,

This makes it possible to keep track of the IP used by the machines launching the scrapes. If you run too many at once, for example, your traffic can be mistaken for a DDoS attack and one or more IPs will be banned. Having this data is useful for debugging and for understanding why some runs failed. Another case is tracing the location of the machine doing the scraping, since some sites refuse certain origins.

In this sense, it would be useful to be able to add this kind of option to the POST request. For example, in my current fork, I added a debug param to requests { text .... debug ....} (hardcoded). If it is true, my scraping returns the data, but also the IP and the Ferret version. For daily scraping, this makes monitoring and debugging easier.

I agree that this is mostly needed when running Ferret in production, with a lot of daily scrapes to monitor, etc. I am proposing these additions; feel free to decline them to keep the API light. In any case, the user is free to use these extras or not.

ziflex commented 4 years ago

Let's give this endpoint a more generic path like /info, so that we can return some extra information later on. The response would be:

{
    "ip": "140.82.112.4"
}

ziflex commented 4 years ago

Closed in favor of #13