Ezriilc / HyperEdit

A plugin for Kerbal Space Program.
http://www.Kerbaltek.com/hyperedit
GNU General Public License v3.0

Requesting support for new CKAN User-Agent strings on kerbaltek.com #75

Closed HebaruSan closed 2 years ago

HebaruSan commented 2 years ago

Hi @Ezriilc!

We'd like to update the User-Agent strings that some CKAN bots and utilities use (see KSP-CKAN/CKAN#3490, KSP-CKAN/xKAN-meta_testing#84, and KSP-SpaceDock/SpaceDock#436), and we are aware this would break HyperEdit and Graphotron.

Could you please update your site to treat all three of these as CKAN? The old one is not being removed, just supplemented:

If you have any questions, I'll do my best to answer them. Thanks!

Ezriilc commented 2 years ago

Hi, @HebaruSan!

Thanks for coming here to tell me about this. I really appreciate it.

I've added the new agents. Please test it when you can.

HebaruSan commented 2 years ago

Hi @Ezriilc,

Thanks for the quick response! The special _IamCKAN URLs seem to be broken now for all useragents:

$ curl.exe --fail -O --user-agent 'Mozilla/4.0 (compatible; CKAN)' https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
curl: (22) The requested URL returned error: 404

$ curl.exe --fail -O --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; +https://github.com/KSP-CKAN/xKAN-meta_testing)' https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
curl: (22) The requested URL returned error: 403

$ curl.exe --fail -O --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; +https://github.com/KSP-CKAN/NetKAN-Infra)' https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
curl: (22) The requested URL returned error: 403

Ezriilc commented 2 years ago

DRAT! I thought that may have been too easy. Working...

HebaruSan commented 2 years ago

FYI, we are still chatting about the possibility of tweaking the strings I gave a little; @DasSkelett wants to add CKAN somewhere, and I agree. I will post an update once we reach a final decision. Checking for a browser family of Netkanbot would probably be the most flexible way to cover all the newer possibilities.
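A minimal sketch of the "browser family" check suggested above, assuming the server can see the raw User-Agent header. The function name and both regular expressions are hypothetical; the site's actual matching code is not shown in this thread.

```python
import re

# Hypothetical patterns, not the site's real code.
# "Netkanbot/<version>" identifies the bot family regardless of what follows it.
NETKANBOT_RE = re.compile(r"\bNetkanbot/\d+(?:\.\d+)*\b")
# A standalone "CKAN" token, so "KSP-CKAN" inside a URL does not count.
CKAN_TOKEN_RE = re.compile(r"(?:^|[\s(;])CKAN(?=[;)\s]|$)")


def classify_user_agent(user_agent: str) -> str:
    """Return 'netkanbot', 'ckan', or 'other' for a User-Agent header value."""
    if NETKANBOT_RE.search(user_agent):
        return "netkanbot"
    if CKAN_TOKEN_RE.search(user_agent):
        return "ckan"
    return "other"
```

Matching on the family name rather than the full string means later tweaks to the parenthesized part (such as adding `CKAN;` or changing the contact URL) would not need a server-side change.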

Ezriilc commented 2 years ago

> DRAT! I thought that may have been too easy. Working...

Sorry for that. I THINK I have it fixed now.

> FYI, we are still chatting about the possibility of tweaking the strings I gave a little; @DasSkelett wants to add CKAN somewhere, and I agree. I will post an update once we reach a final decision. Checking for a browser family of Netkanbot would probably be the most flexible way to cover all the newer possibilities.

I've modified my code to allow easy changes and additions to both CKAN and NETKAN user agents. Feel free to update me whenever a change is made.

HebaruSan commented 2 years ago

Thanks! The description now has the latest strings (added CKAN; to the middle of the parenthesized part).

The _IamCKAN links seem to be going through an HTTP to HTTPS redirection and then failing when I try it with netkan.exe:

879 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://kerbaltek.com/_IamCKAN_Gimme_graphotron_
1516 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
2156 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (403) Forbidden.

Ezriilc commented 2 years ago

> Thanks! The description now has the latest strings (added CKAN; to the middle of the parenthesized part).
>
> The _IamCKAN links seem to be going through an HTTP to HTTPS redirection and then failing when I try it with netkan.exe:
>
> 879 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://kerbaltek.com/_IamCKAN_Gimme_graphotron_
> 1516 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
> 2156 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (403) Forbidden.

Sorry, but isn't the 'http(s)' part at your end?

I'll have to take up this fix a bit later today.

HebaruSan commented 2 years ago

As far as I can tell, no, the server is doing that:

$ curl.exe --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/NetKAN-Infra)' http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://kerbaltek.com/_IamCKAN_Gimme_graphotron_">here</a>.</p>
</body></html>

$ curl.exe --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/NetKAN-Infra)' https://kerbaltek.com/_IamCKAN_Gimme_graphotron_

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_">here</a>.</p>
</body></html>

HebaruSan commented 2 years ago

The currently live version of the bot (using the old useragent, working before I submitted this) is also failing with a 404.

Ezriilc commented 2 years ago

> As far as I can tell, no, the server is doing that:
>
> $ curl.exe --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/NetKAN-Infra)' http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
>
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>301 Moved Permanently</title>
> </head><body>
> <h1>Moved Permanently</h1>
> <p>The document has moved <a href="https://kerbaltek.com/_IamCKAN_Gimme_graphotron_">here</a>.</p>
> </body></html>
>
> $ curl.exe --user-agent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/NetKAN-Infra)' https://kerbaltek.com/_IamCKAN_Gimme_graphotron_
>
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>302 Found</title>
> </head><body>
> <h1>Found</h1>
> <p>The document has moved <a href="https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_">here</a>.</p>
> </body></html>

I meant that NETKAN/CKAN should be calling https, and not http. I can't control how they call the URL.

However, you should not be getting the 403/404 errors, so I'll get on that later today. Sorry!

Ezriilc commented 2 years ago

> I meant that NETKAN/CKAN should be calling https, and not http. I can't control how they call the URL.

I just thought that that's probably in my .version file or some such. Later...

HebaruSan commented 2 years ago

Oh, the mods' .netkan files start us out on http://, but that's the same as it was yesterday. We can update that eventually, but for now I want to limit the number of variables we're changing until things are working again.
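As a sketch of what the eventual `.netkan` URL update could look like, the helper below rewrites a URL to the form that the two redirects in the logs end up at. It assumes `https://www.kerbaltek.com` is the canonical origin, which is inferred only from those redirects; the function name and default are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str, canonical_host: str = "www.kerbaltek.com") -> str:
    """Rewrite http:// or bare-domain URLs to https://<canonical_host>.

    Assumption: the canonical form is the final target of the redirect
    chain seen in the logs. URLs on other hosts are returned unchanged.
    Any explicit port in the input is dropped.
    """
    parts = urlsplit(url)
    # The bare domain is the canonical host without its "www." prefix.
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    host = parts.hostname or ""
    if host not in (canonical_host, bare_host):
        return url
    return urlunsplit(("https", canonical_host, parts.path, parts.query, parts.fragment))
```

Starting from the canonical URL would skip both redirect hops, but as noted above it is a separate change from the User-Agent update and can wait until the downloads work again.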

Ezriilc commented 2 years ago

My code is checking the IP address of NETKAN. Is that a problem?

Ezriilc commented 2 years ago

Also, I've been separating things based on the CKAN app vs the NETKAN bot. Is that sensible?

I'm ready for you to do a test again, when you can. Thanks.

HebaruSan commented 2 years ago

> My code is checking the IP address of NETKAN. Is that a problem?

Oh, maybe that's why I still get a 403 response when testing the new strings from my own computer. Which IP addresses are you allowing? The parts of the bot will run from inside the current AWS containers and (less often) some GitHub Action containers, so I would want to make sure neither of those is blocked.

The old string seems to be working again, though, so that's good. :+1:

Ezriilc commented 2 years ago

> My code is checking the IP address of NETKAN. Is that a problem?
>
> Oh, maybe that's why I still get a 403 response when testing the new strings from my own computer. Which IP addresses are you allowing? The parts of the bot will run from inside the current AWS containers and (less often) some GitHub Action containers, so I would want to make sure neither of those is blocked.
>
> The old string seems to be working again, though, so that's good. 👍

IF you'd like to give me a list of confirmed IP addresses to allow, I can do that.

However, I don't understand why the new strings aren't being approved, but the old is. They're all looked at the same way.

HebaruSan commented 2 years ago

Hmm, apparently that's not recommended:

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#ip-addresses

> Since there are so many IP address ranges for GitHub-hosted runners, we do not recommend that you use these as allow-lists for your internal resources.

So I guess the simple answer to "Is that a problem?" was "Yes."

> However, I don't understand why the new strings aren't being approved, but the old is. They're all looked at the same way.

Hmm, I'm not going to be able to point out what's causing it without seeing your code and server setup, but the first useragent listed in the description currently works, and the others return 403 Forbidden.
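For illustration, an IP allow-list check of the kind being discussed might look like the sketch below. The network ranges are placeholders from the reserved documentation blocks, not real AWS or GitHub ranges, and the names are hypothetical; the site's actual check is not shown in this thread.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real AWS or GitHub ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_allowed(remote_addr: str, networks=ALLOWED_NETWORKS) -> bool:
    """Return True if remote_addr falls inside any allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed addresses are never allowed.
        return False
    return any(addr in net for net in networks)
```

A check like this returns a denial for any caller outside the listed ranges no matter which User-Agent it sends, which would fit the 403 seen when testing from a personal computer. It also has to be kept in sync with every range the bot's hosts can use, which is the maintenance burden the GitHub documentation warns about.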

Ezriilc commented 2 years ago

> So I guess the simple answer to "Is that a problem?" was "Yes."

Drat. That is a bit of a security problem, but not a big one. I've disabled that check.

> Hmm, I'm not going to be able to point out what's causing it without seeing your code and server setup, but the first useragent listed in the description currently works, and the others return 403 Forbidden.

Yep, that one's on me. I'm sorta thinking out loud to let you know where I am. Working...

Ezriilc commented 2 years ago

> Yep, that one's on me. I'm sorta thinking out loud to let you know where I am. Working...

I THINK I have the 404 errors fixed. Please test when you can.

HebaruSan commented 2 years ago

Hi, thanks for the response. I'm still seeing the same errors.

$ netkan.exe --net-useragent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/xKAN-meta_testing)' NetKAN/HyperEdit.netkan --verbose
...
838 [1] INFO CKAN.NetKAN.Transformers.HttpTransformer (null) - Executing HTTP transformation with #/ckan/http/http://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_#cachebuster1
1027 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_#cachebuster1 redirected to https://kerbaltek.com/_IamCKAN_Gimme_hyperedit_
1933 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_hyperedit_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_hyperedit_
2756 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (404) Not Found.

$ netkan.exe --net-useragent 'Mozilla/5.0 (compatible; Netkanbot/1.0; CKAN; +https://github.com/KSP-CKAN/xKAN-meta_testing)' NetKAN/Graphotron.netkan --verbose
...
742 [1] INFO CKAN.NetKAN.Transformers.HttpTransformer (null) - Executing HTTP transformation with #/ckan/http/http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
861 [1] INFO CKAN.Net (null) - http://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://kerbaltek.com/_IamCKAN_Gimme_graphotron_
2008 [1] INFO CKAN.Net (null) - https://kerbaltek.com/_IamCKAN_Gimme_graphotron_ redirected to https://www.kerbaltek.com/_IamCKAN_Gimme_graphotron_
3519 [1] FATAL CKAN.NetKAN.Program (null) - The remote server returned an error: (404) Not Found.

The old useragent still works.

Ezriilc commented 2 years ago

> Hi, thanks for the response. I'm still seeing the same errors.

I'm very sorry for all the trouble, and I'm grateful for all your feedback and efforts to help me.

I've made some changes, and if you could please try again, it will help me to figure out what's wrong.

THANKS!

HebaruSan commented 2 years ago

It's no problem at all, I am as familiar with the change-test-fix cycle as anyone. :grinning:

With the latest changes, it works for me with any useragent; after I tried the ones we plan to use, I tested with 'Something else' and that worked, too. This would be fine for us, but I'm guessing you'll want to limit it more than that.

HebaruSan commented 2 years ago

Hold on that for a moment, I need to double check whether the latter tests used a cached copy of the download, forgot about that before...

OK, confirmed that the 'Something else' useragent is able to retrieve the file without caching.
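The manual test above (expected strings plus an arbitrary 'Something else' as a negative control) can be sketched as a small table-driven check. Everything here is hypothetical: `fetch_status` stands in for whatever actually performs the request and returns the HTTP status code, and it should bypass any download cache so a cached copy is not mistaken for a successful fetch.

```python
from typing import Callable, Dict, List


def check_user_agents(
    fetch_status: Callable[[str], int],
    expect_ok: List[str],
    expect_blocked: List[str],
) -> Dict[str, str]:
    """Map each user agent to 'pass' or 'fail' against the expected outcome.

    fetch_status is a hypothetical callable: it takes a User-Agent string,
    performs an uncached request, and returns the HTTP status code.
    """
    results: Dict[str, str] = {}
    for ua in expect_ok:
        # Allowed agents should get the file.
        results[ua] = "pass" if fetch_status(ua) == 200 else "fail"
    for ua in expect_blocked:
        # Negative controls should be refused (403 or 404 in this thread).
        results[ua] = "pass" if fetch_status(ua) in (403, 404) else "fail"
    return results
```

With a server that accepts every User-Agent, the allowed strings pass and the negative control fails, which matches the behaviour reported above.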

Ezriilc commented 2 years ago

> Hold on that for a moment, I need to double check whether the latter tests used a cached copy of the download, forgot about that before...

I was wondering if caching might be the issue. Shall I put it back to see if that's it?

HebaruSan commented 2 years ago

I don't know what you'd be putting back, but I've confirmed that client-side caching isn't the cause of what I'm seeing.

HebaruSan commented 2 years ago

We've switched over to the new strings and everything seems to be working, so we can consider this resolved for now. Thanks for your help!