KSP-SpaceDock / SpaceDock

Current Codebase (Python/Flask)
https://spacedock.info

Feature Suggestion: Implement proper cache headers & Edge Cache #256

Closed cryptiklemur closed 4 years ago

cryptiklemur commented 4 years ago

Hey there! Been using spacedock through ckan, and have noticed that spacedock downloads are significantly slower than other sources.

A quick win to address this issue is to implement proper cache headers on downloads and stick the site behind Cloudflare. This can be done for free, and can be done for a little bit more money for even better performance.

The specific cache header I would suggest is:

cache-control: public,max-age=31536000,immutable

This is assuming that all downloads are tagged with a version, which I believe is the case. Tagged versions should never change, so the cache for them should be immutable. public allows the cache to be shared at the edge (Cloudflare's edge network is MASSIVE, and you don't pay for the bandwidth), allowing for lightning-fast downloads for everyone.

If that assumption is incorrect, non-tagged versions could leave out immutable and have a shorter max-age. You can even purge the cache for that specific download when it gets updated.

I'd love to help out with this, but unfortunately my Flask (and Python) knowledge is nil. If I can offer guidance on this though, I would love to help out in that regard!
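For illustration only (this is not code from the thread or from SpaceDock), the two cases above could be sketched in Python roughly as follows. The function name, the `versioned` flag, and the one-hour fallback `max-age` are assumptions made for this example.

```python
# Hypothetical helper: pick a Cache-Control value for a download.
# The names and the 3600-second fallback are illustrative assumptions.

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000, as in the suggested header


def cache_control_for_download(versioned: bool, short_max_age: int = 3600) -> str:
    """Return a Cache-Control header value for a mod download.

    Versioned (tagged) files never change, so they can be cached for a
    year and marked immutable. Anything else gets a shorter lifetime and
    no immutable directive, so an updated file is picked up again soon.
    """
    if versioned:
        return f"public,max-age={ONE_YEAR_SECONDS},immutable"
    return f"public,max-age={short_max_age}"


if __name__ == "__main__":
    print(cache_control_for_download(True))
    print(cache_control_for_download(False))
```

In a Flask view, a value like this would typically be assigned to `response.headers["Cache-Control"]` before the response is returned; how SpaceDock actually serves its downloads is not shown in this thread.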

Love your work, keep it up!

V1TA5 commented 4 years ago

Hi, I thought I had those headers added. I will check that as soon as possible.

The site already utilizes a caching proxy (check the header closely) and has enough bandwidth to fill your Internet connection. The problems with downloading might stem from the small size of each file. TCP (as you might know) ramps up its sending rate over the duration of a connection (slow start grows the congestion window) until packets can no longer be transmitted reliably. This ramp-up takes some time, so a small file may finish before the connection reaches full speed.

Using Cloudflare isn't an option. I highly value the privacy of my visitors' data and wouldn't want to give it away (especially not without them knowing).
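As a rough illustration of the ramp-up argument (not a measurement of SpaceDock), the sketch below counts how many round trips an idealized TCP slow start would need to deliver a file. It assumes a 1460-byte segment size, an initial window of 10 segments, a window that doubles every round trip, and no packet loss; real connections will differ.

```python
# Idealized slow-start model: the window doubles each round trip, no loss.
# mss and initial_cwnd are assumed typical values, not measured ones.

def slow_start_round_trips(file_bytes: int, mss: int = 1460, initial_cwnd: int = 10) -> int:
    """Return the number of round trips needed to send file_bytes."""
    segments_left = -(-file_bytes // mss)  # ceiling division
    cwnd = initial_cwnd
    round_trips = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window per round trip
        cwnd *= 2              # exponential growth during slow start
        round_trips += 1
    return round_trips


if __name__ == "__main__":
    for size in (100_000, 1_000_000, 100_000_000):
        print(size, "bytes ->", slow_start_round_trips(size), "round trips")
```

Under these assumptions a 1 MB file needs 7 round trips and a 100 MB file only 13, so the smaller the file, the larger the share of the transfer that is spent ramping up.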

Python is something you could learn. What form of guidance are you offering?

cryptiklemur commented 4 years ago

It's just downloading single zip files, no? I shouldn't have a hard time ramping that up. With CKAN, I consistently have failures with SpaceDock that have to fall back to archive.org as well.

I can respect the emphasis on valuing the privacy of your visitors, but Cloudflare is a privacy-centric DNS/registrar. They likely care more about privacy than your current DNS provider & registrar, unless you host your own. Their edge network is also likely significantly larger than what you currently have.

Learning Python (and Flask) wouldn't be a small task. I unfortunately do not have the time or bandwidth to learn it.

I was offering guidance in the form of whiteboarding and soundboarding.

Xinayder commented 4 years ago

Cloudflare is a privacy-centric DNS/registrar

Uhm, it's hard to believe that. Aside from the MITM that Cloudflare does, offering their services for free comes at a price, as VITAS mentioned. I don't know the actual metrics for SpaceDock, but just thinking that users might need to input a captcha, breaking any other services that rely upon our API (like CKAN), is bad.

Right now we face a problem, which happens only rarely, where our site gets slow. Their restriction on high traffic would break a lot of services because you'd need to input a captcha to access the site.

cryptiklemur commented 4 years ago

Feel free to look through their privacy policy, and all of their products. They make their money off of pro and enterprise users, not from selling data.

Captchas would only show up if the setting for that stuff (the security settings in Cloudflare) is turned on. They have no restrictions on high traffic.

V1TA5 commented 4 years ago

For transparency reasons: Are you a Cloudflare salesperson or do you have any relationship with them?

You don't have the bandwidth? Please explain.

cryptiklemur commented 4 years ago

No, I'm not. I'm just a pragmatic developer.

I definitely have the bandwidth. Downloads from GitHub are fast; downloads from archive.org are alright, but not as fast.

Edit: I realise you were replying to my comment about bandwidth to learn Python. It was an idiom. I don't have the time to, as it's a large undertaking.

V1TA5 commented 4 years ago

You should consider asking them. I thank you for your suggestions, but I will pass on your offer of guidance. Our ideas of the worth of the privacy and trust of our users, and of my websites' users, are different. My approach has served me well in over 35 years of developing software. I'm sure you have the same opinion about your way of doing things. So it's not about the others doing it wrong, but differently.

The number of outages is something I don't like and would change if I had more time and helping hands.

If you're still willing to help: learn Python or, even better, make new software that can replace SpaceDock's aging code (I've seen you do PHP and JS). I would love a single-page Laravel + Vue.js implementation.

P.S. I relayed the question of SpaceDock being slower than other sources to the CKAN people. If you want to help debug it you can join us either on Matrix: #spacedock:52k.de , IRC: espernet #spacedock or Discord (can't remember the server/room right now). I'm willing to listen to suggestions and help if you're willing to accept my ways of doing them.

cryptiklemur commented 4 years ago

I'm not a salesperson, so why would I ask them?

You have no idea what my ideas of privacy and trust are, but I thank you for your candor. I would still be willing to help, but it's clear that my opinion would not be valued, so I will take my leave.

DasSkelett commented 4 years ago

have noticed that spacedock downloads are significantly slower than other sources.

In my personal experience I can't confirm this. The slowest is (understandably) always archive.org; SpaceDock and GitHub are both pretty much always fast enough to max out my bandwidth (50 Mbit/s).

I did some tests to verify this, and indeed the limiting factor is my internet connection. I also tested using wget and curl from a VPS at a provider with a direct peering to Hetzner, where the SpaceDock servers are located.

Uncached files (or with the --no-cache option) are downloading at 10-20 MB/s = 80-160 Mbit/s. Cached files are downloading at 60-70 MB/s = 480-560 Mbit/s.

These tests were done at a low-traffic time (12:00 CEST, 10:00 UTC). I may repeat them this evening during prime time to see if there's a difference.
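For readers who want to reproduce the unit conversion or time a download themselves, here is a small Python sketch. It is not the wget/curl procedure used above; the function names are made up for this example, and it assumes 1 MB = 1,000,000 bytes and 8 bits per byte.

```python
import io
import time


def mb_per_s_to_mbit_per_s(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


def measure_throughput_mb_per_s(stream, chunk_size: int = 64 * 1024):
    """Read a file-like object to the end; return (total_bytes, MB/s)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    # Guard against a zero elapsed time on very fast in-memory reads.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / 1_000_000 / elapsed


if __name__ == "__main__":
    # An in-memory stream stands in for a real download here.
    size, rate = measure_throughput_mb_per_s(io.BytesIO(b"x" * 1_000_000))
    print(size, "bytes read;", mb_per_s_to_mbit_per_s(rate), "Mbit/s")
```

Any object with a `read()` method works as the stream, for example the response returned by `urllib.request.urlopen`.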

But I can't say these speeds are too low. We had some problems with the server recently, you may have been hit by it. But if you say the speeds are always low, I suspect your ISP having a bad connection to Hetzner.

Can you give us some details?