Closed HebaruSan closed 2 years ago
Hi, @HebaruSan!
Thanks for coming here to tell me about this. I really appreciate it.
I've added the new agents. Please test it when you can.
Hi @Ezriilc,
Thanks for the quick response! The special _IamCKAN URLs seem to be broken now for all useragents:
$ curl.exe --fail -O --user-agent 'Mozilla/4.0 (compatible; CKAN)' https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
curl: (22) The requested URL returned error: 404
$ curl.exe --fail -O --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; +https://github.com/KSP-CKAN/xKAN-meta_testing)' https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
curl: (22) The requested URL returned error: 403
$ curl.exe --fail -O --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; +https://github.com/KSP-CKAN/NetKAN-Infra)' https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
curl: (22) The requested URL returned error: 403
DRAT! I thought that may have been too easy. Working...
FYI, we are still chatting about the possibility of tweaking the strings I gave a little; @DasSkelett wants to add CKAN somewhere, and I agree. I will post an update once we reach a final decision. Checking for a browser family of Netkanbot would probably be the most flexible way to cover all the newer possibilities.
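To illustrate the family-based check suggested above, here is a minimal Python sketch. It is not the site's actual code, which isn't shown in this thread; `is_ckan_client`, `LEGACY_CKAN_UA`, and the regular expression are assumptions made for the example.

```python
import re

# The legacy string quoted in this thread; matched exactly.
LEGACY_CKAN_UA = "Mozilla/4.0 (compatible; CKAN)"

# Hypothetical pattern: a "Netkanbot/<version>" product token anywhere in
# the User-Agent, regardless of what else surrounds it.
NETKANBOT_FAMILY = re.compile(r"\bNetkanbot/\d+(?:\.\d+)*\b")


def is_ckan_client(user_agent: str) -> bool:
    """Return True for the legacy CKAN string or any Netkanbot-family agent."""
    if user_agent == LEGACY_CKAN_UA:
        return True
    return NETKANBOT_FAMILY.search(user_agent) is not None
```

Because the User-Agent header is supplied by the client, a check like this identifies well-behaved bots rather than authenticating them.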
Sorry for that. I THINK I have it fixed now.
I've modified my code to allow easy changes and additions to both CKAN and NETKAN user agents. Feel free to update me whenever a change is made.
Thanks! The description now has the latest strings (added CKAN; to the middle of the parenthesized part).
The _IamCKAN links seem to be going through an HTTP to HTTPS redirection and then failing when I try it with netkan.exe:
879 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://kerbaltek.com/_IamCKAN_Gimme_graphotron_
1516 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
2156 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (403) Forbidden.
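The hops in a log like the one above can be pulled out mechanically. A small Python sketch, assuming only the `redirected to` line format visible in this log; `redirect_chain`, `HOP`, and `LOG` are hypothetical names, not part of netkan.exe:

```python
import re

# Matches "... - <source URL> redirected to <destination URL>".
HOP = re.compile(r" - (\S+) redirected to (\S+)\s*$")

# The three log lines quoted above.
LOG = [
    "879 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://kerbaltek.com/_IamCKAN_Gimme_graphotron_",
    "1516 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_",
    "2156 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (403) Forbidden.",
]


def redirect_chain(log_lines):
    """Return the URLs visited, in order, based on 'redirected to' lines."""
    chain = []
    for line in log_lines:
        match = HOP.search(line)
        if match is None:
            continue  # Not a redirect line (e.g. the FATAL line).
        source, destination = match.groups()
        if not chain:
            chain.append(source)  # The first hop also supplies the start URL.
        chain.append(destination)
    return chain


for url in redirect_chain(LOG):
    print(url)
```

Read this way, the request starts on http://www, is sent to https on the bare domain, and then goes back to the www host on https before the 403 is returned.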
Sorry, but isn't the 'http(s)' part at your end?
I'll have to take up this fix a bit later today.
As far as I can tell, no, the server is doing that:
$ curl.exe --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/NetKAN-Infra)' http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://kerbaltek.com/_IamCKAN_Gimme_graphotron_">here</a>.</p>
</body></html>
$ curl.exe --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/NetKAN-Infra)' https://kerbaltek.com/_IamCKAN_Gimme_graphotron_
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_">here</a>.</p>
</body></html>
The currently live version of the bot (using the old useragent, working before I submitted this) is also failing with a 404.
I meant that NETKAN/CKAN should be calling https, and not http. I can't control how they call the URL.
However, you should not be getting the 403/404 errors, so I'll get on that later today. Sorry!
It just occurred to me that the http:// URL is probably coming from my .version file or some such. Later...
Oh, the mods' .netkan files start us out on http://, but that's the same as it was yesterday. We can update that eventually, but for now I want to limit the number of variables we're changing until things are working again.
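As an illustration of what updating those URLs could eventually look like, here is a Python sketch that rewrites a kerbaltek.com URL to the https://www form that the redirects logged earlier in this thread end up at. `canonical_url`, `KNOWN_HOSTS`, and `CANONICAL_HOST` are hypothetical names, and nothing in the thread says the .netkan files will be changed this way.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts seen in the redirect logs in this thread.
KNOWN_HOSTS = {"kerbaltek.com", "www.kerbaltek.com"}
# Assumption: the last hop in those logs is the canonical host.
CANONICAL_HOST = "www.kerbaltek.com"


def canonical_url(url: str) -> str:
    """Rewrite a kerbaltek.com URL to https on the www host.

    Path, query, and fragment are preserved; URLs on other hosts are
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname not in KNOWN_HOSTS:
        return url
    return urlunsplit(
        ("https", CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )


print(canonical_url("http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_"))
```

Starting from the final URL would skip both redirects, which is one less thing to go wrong, though it is a separate change from the useragent update being tested here.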
My code is checking the IP address of NETKAN. Is that a problem?
Also, I've been separating things based on the CKAN app vs the NETKAN bot. Is that sensible?
I'm ready for you to do a test again, when you can. Thanks.
Oh, maybe that IP address check is why I still get a 403 response when testing the new strings from my own computer. Which IP addresses are you allowing? The parts of the bot will run from inside the current AWS containers and (less often) some GitHub Action containers, so I would want to make sure neither of those is blocked.
The old string seems to be working again, though, so that's good. 👍
If you'd like to give me a list of confirmed IP addresses to allow, I can do that.
However, I don't understand why the new strings aren't being approved, but the old one is. They're all looked at the same way.
Hmm, apparently that's not recommended:
Since there are so many IP address ranges for GitHub-hosted runners, we do not recommend that you use these as allow-lists for your internal resources.
So I guess the simple answer to "Is that a problem?" was "Yes."
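For context, an IP allow-list check usually amounts to something like the following Python sketch. The ranges are placeholder documentation addresses (RFC 5737), not real AWS or GitHub ranges, and `ALLOWED_NETWORKS` and `ip_is_allowed` are hypothetical names; the site's real check isn't shown in this thread.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks. These are NOT
# real AWS or GitHub Actions addresses; they only make the sketch runnable.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_is_allowed(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any allowed network.

    Raises ValueError if remote_addr is not a valid IP address.
    """
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in ALLOWED_NETWORKS)
```

Every range the bot might run from would have to be listed and kept current, which is the maintenance burden the GitHub guidance quoted above warns about.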
Hmm, I'm not going to be able to point out what's causing it without seeing your code and server setup, but the first useragent listed in the description currently works, and the others return 403 Forbidden.
Drat. Not being able to check IP addresses is a bit of a security problem, but not a big one. I've disabled that check.
Yep, the 403s on the other useragents are on me. I'm sorta thinking out loud to let you know where I am. Working...
I THINK I have the 404 errors fixed. Please test when you can.
Hi, thanks for the response. I'm still seeing the same errors.
$ netkan.exe --net-useragent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/xKAN-meta_testing)' NetKAN/HyperEdit.netkan --verbose
...
838 [1] INFO CKAN.NetKAN.Transformers.HttpTransformer (null) - Executing HTTP transformation with #/ckan/http/http://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_#cachebuster1
1027 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_#cachebuster1 redirected to https://kerbaltek.com/_IamCKAN_Gimme_hyperedit_
1933 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_hyperedit_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
2756 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (404) Not Found.
$ netkan.exe --net-useragent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/xKAN-meta_testing)' NetKAN/Graphotron.netkan --verbose
...
742 [1] INFO CKAN.NetKAN.Transformers.HttpTransformer (null) - Executing HTTP transformation with #/ckan/http/http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
861 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://kerbaltek.com/_IamCKAN_Gimme_graphotron_
2008 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
3519 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (404) Not Found.
The old useragent still works.
I'm very sorry for all the trouble, and I'm grateful for all your feedback and efforts to help me.
I've made some changes, and if you could please try again, it will help me to figure out what's wrong.
THANKS!
It's no problem at all, I am as familiar with the change-test-fix cycle as anyone. 😀
With the latest changes, it works for me with any useragent; after I tried the ones we plan to use, I tested with "Something else" and that worked, too. This would be fine for us, but I'm guessing you'll want to limit it more than that.
Hold on that for a moment, I need to double check whether the latter tests used a cached copy of the download, forgot about that before...
OK, confirmed that the "Something else" useragent is able to retrieve the file without caching.
I was wondering if caching might be the issue. Shall I put it back to see if that's it?
I don't know what you'd be putting back, but I've confirmed that client-side caching isn't the cause of what I'm seeing.
We've switched over to the new strings and everything seems to be working, so we can consider this resolved for now. Thanks for your help!
Hi @Ezriilc!
We'd like to update the User-Agent strings that some CKAN bots and utilities use (see KSP-CKAN/CKAN#3490, KSP-CKAN/xKAN-meta_testing#84, and KSP-SpaceDock/SpaceDock#436), and we are aware this would break HyperEdit and Graphotron.
Could you please update your site to treat all three of these as CKAN? The old one is not being removed, just supplemented:
Mozilla/4.0 (compatible; CKAN)
Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/xKAN-meta_testing)
Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/NetKAN-Infra)
If you have any questions, I'll do my best to answer them. Thanks!