KSP-CKAN / CKAN

The Comprehensive Kerbal Archive Network
https://forum.kerbalspaceprogram.com/index.php?/topic/197082-*

Proposal: Introduce CKAN-meta-Legacy #1975

Open Dazpoet opened 7 years ago

Dazpoet commented 7 years ago

I saw this brought up by @Ruedii on irc and thought I should chime in.

Problem

CKAN-meta master is getting to be a pretty big download (8+MB) and when unzipped it takes up 40+MB on disk (I'm on Win10 running NTFS with whatever the standard settings are) even though the real size is just some 13MB.

This potentially makes CKAN run slower as the size will only continue to increase over time.

Proposed solution

Trim the size of CKAN-meta master by moving metadata for older versions of KSP to a legacy metadata archive (CKAN-meta-Legacy) based on some breaking point ksp_version = X.

How do we choose the breaking point?

I think at this point the breaking point should be ksp 1.1.3 so that anything with a ksp_version < 1.1 in the metadata should be moved to a legacy archive. I base this on the fact that RO is not yet out for 1.2.x and neither is FAR which probably means that a lot of players are still stuck running 1.1.3 while waiting for key mods. Using RO as a measuring stick is probably pretty smart since it has a rather comprehensive depends and recommends list and is normally pretty stringent when it comes to not releasing before it's ready. To me it seems it is often one of the last mods to come out.

But we are the COMPREHENSIVE Kerbal Archive Network, will we become just the KAN now?!

Obviously not, we are not removing metadata, just moving it. Introducing a Legacy option shouldn't be harder than moving all metadata with ksp_version < 1.1 to a new repository (or is a branch better?) named e.g. CKAN-meta-Legacy and then adding it to the repositories list.

This can probably be done easily by a smart person and regexp, however I am not that smart person or this would've been a PR rather than an issue.
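
A minimal sketch of how such a move could be prepared, assuming each .ckan file is JSON carrying either a `ksp_version` or a `ksp_version_max` field. The cutoff value and the helper names (`is_legacy`, `split_archive`) are illustrative assumptions, not part of CKAN. It only sorts files into two lists; actually moving them is left out.

```python
import json
from pathlib import Path

CUTOFF = (1, 1)  # assumed breaking point: anything below KSP 1.1 is legacy


def parse_version(text):
    """Turn '1.0.5' into (1, 0, 5)."""
    return tuple(int(part) for part in text.split("."))


def is_legacy(metadata, cutoff=CUTOFF):
    """True if the newest KSP version this release supports is below cutoff."""
    newest = metadata.get("ksp_version") or metadata.get("ksp_version_max")
    if newest is None or newest == "any":
        return False  # no upper bound known: keep it in the main archive
    version = parse_version(newest)
    # Compare only the components both sides specify, so "1" is not below 1.1.
    n = min(len(version), len(cutoff))
    return version[:n] < cutoff[:n]


def split_archive(root):
    """Sort every .ckan file under root into (legacy, current) path lists."""
    legacy, current = [], []
    for path in sorted(Path(root).rglob("*.ckan")):
        metadata = json.loads(path.read_text(encoding="utf-8"))
        (legacy if is_legacy(metadata) else current).append(path)
    return legacy, current
```

Releases marked "any", or with only a lower bound, stay in the main archive here, which is the conservative choice for the "any" problem raised below.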

What about the users running ksp version < future breaking point when the Legacy archive is huge?

They'll just have to download the whole shebang. This usergroup will probably be relatively small and well-prepared for the potential problems that their choice creates.

Problems with this proposal

Mods with "ksp_version" : "any" and a huge amount of releases

I have no idea how to resolve this, I don't even know if it's a real problem just yet but it might be in the future. For these keeping Y versions of backlog might be enough. Might require some manual checking every now and then but I hope it won't be a gigantic issue at this point.

Someone will have to do it...

...and I don't know how. Hoping someone finds this worthwhile though and knows a good way of doing it :)

Users still running e.g. 0.90 will suddenly need an additional step to reach their mods

This is probably mostly a communication issue but needs to be handled seriously. I propose that a change like this one occurs near or directly after a CKAN release so that we can at least give users with autoupdate a heads up that it's coming. Hopefully the impact will be low though since I doubt we have a lot of users still running KSP < 1.1, or at least I hope so!

politas commented 7 years ago

Perhaps we have another repository for "ksp_version" : "any" mods? I very much like this idea. I've been getting concerned about the size of the repo, and it's obvious that we need to split it up at some stage.

Perhaps we could use the builds.json in each repository to define the versions of KSP it covers, and we can then add some code to the "Add an install" procedure to inform users if the KSP version in the install they have added is not covered by their current repository selections.
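
A rough sketch of that check, assuming each repository ships a builds.json shaped like `{"builds": {"<build id>": "<major.minor.patch.build>"}}` listing only the KSP versions it covers. That per-repository use of builds.json is the proposal above, not existing behaviour, and the function names are made up for illustration.

```python
import json


def covered_versions(builds_json_text):
    """Return the set of 'major.minor.patch' versions listed in a builds.json."""
    builds = json.loads(builds_json_text)["builds"]
    return {".".join(value.split(".")[:3]) for value in builds.values()}


def uncovered(install_version, repo_builds):
    """True if no selected repository lists the install's KSP version.

    repo_builds is a list of builds.json texts, one per selected repository.
    """
    wanted = ".".join(install_version.split(".")[:3])
    return all(wanted not in covered_versions(text) for text in repo_builds)
```

The "Add an install" step could call `uncovered(...)` with the new install's version and warn the user when it returns True.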

For mod versions that cross repository boundaries, I can't see a fundamental problem with having duplicated ckan files across repos. That shouldn't add hugely to the size.

Maybe we should maintain CKAN-meta as a complete repo, and have some process that duplicates ckans into relevant repos. That doesn't sound too complex.

techman83 commented 7 years ago

I wonder if there is a way to do it without splitting the repo?

politas commented 7 years ago

I suppose we could build a separate .tar.gz file for each KSP version including all the ckans that support that version without having to have separate repos?

techman83 commented 7 years ago

That'd be doable, though we'd have to subscribe to the webhooks and process them ourselves. The question is, which is the saner option?

One Repo:

Split Repo:

politas commented 7 years ago

One Repo votes:

I'll give a thumbs up to One Repo, with the additional note that I know a little Perl and would be happy to learn more in the process of adding this, if you have the time to do most of the Repo-side coding. As I see it, there's an additional pro that we should be able to implement it while the existing system is still working, so we've got clear stages.

politas commented 7 years ago

Split Repo votes:

And I'll give a thumbs down to Split Repo, because it seems like adding technical debt without ultimately solving the problem. On the other hand, Split Repo does mean we're putting a cap on the size of a single repo, though we're two orders of magnitude below hitting GitHub's Repo max size suggestion.

Dazpoet commented 7 years ago

As someone who has already been through the split repo thing once, I didn't particularly like it since it made issue reports very VERY annoying to deal with. If it's possible to do in one repo I'd much prefer that tbh.

Ruedii commented 7 years ago

Since the legacy repo would largely be unused this wouldn't be an issue. This would not affect CKAN itself, just the metadata, and would only be used as a way not to purge the old data, so people could easily find it who are wanting to fetch mods for legacy versions of KSP that were shared through CKAN without having to manually fetch the old database.

They would both be hosted in the same Git repository, and entries being archived would simply be moved by a script whenever the cutoff version is updated, which wouldn't be very often.

politas commented 7 years ago

Hi @Ruedii, thanks for getting involved!

As I said, I fear that the split repo model, with an arbitrary divide and no logic behind it, adds technical debt to the project without providing an ultimate solution.

The Single repo model I am thinking of is that all the .ckan files are in a single repository, but we create a separate .tar.gz file of .ckan files for each KSP version (whether we break it up by major, minor, patch or build level is a question we have to decide), as well as the full .tar.gz of the entire repository. Then we change the default repository rules in the client to point to the relevant .tar.gz file for the KSP version. Older versions of CKAN would not be affected, they would still see the full repo. We can implement the multiple tar.gz files and people can utilise them manually until we implement an elegant solution in the client.
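
A minimal sketch of the per-version archives, assuming every .ckan file is JSON with an optional `ksp_version` field and that grouping happens at minor-version level (`depth=2`). The archive naming and the helper names are assumptions for illustration. Releases marked "any" (or lacking `ksp_version`) are copied into every archive.

```python
import json
import tarfile
from collections import defaultdict
from pathlib import Path


def bucket_for(metadata, depth=2):
    """Group key: the first `depth` components of ksp_version, or 'any'."""
    version = metadata.get("ksp_version", "any")
    if version == "any":
        return "any"
    return ".".join(version.split(".")[:depth])


def build_archives(repo_root, out_dir, depth=2):
    """Write one .tar.gz per KSP version bucket; return {bucket: archive path}."""
    repo_root, out_dir = Path(repo_root), Path(out_dir)
    groups = defaultdict(list)
    for path in sorted(repo_root.rglob("*.ckan")):
        metadata = json.loads(path.read_text(encoding="utf-8"))
        groups[bucket_for(metadata, depth)].append(path)
    any_paths = groups.pop("any", [])  # version-agnostic releases go everywhere
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for bucket, paths in groups.items():
        target = out_dir / f"ksp-{bucket}.tar.gz"
        with tarfile.open(target, "w:gz") as archive:
            for path in paths + any_paths:
                archive.add(path, arcname=path.relative_to(repo_root).as_posix())
        written[bucket] = target
    return written
```

Changing `depth` to 1 or 3 switches the split to major or patch level, which is the open question mentioned above. Releases that only declare `ksp_version_min`/`ksp_version_max` are treated as "any" in this sketch, which a real implementation would need to refine.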

Ruedii commented 7 years ago

I think doing one for each version is a bit much. That is why I favor a cutoff version.

However, they should use a single github repository. It would be stupid not to.

politas commented 7 years ago

If we split by major version, we'll have one for all the 0.x.y and one for all the 1.x.y, and pretty much the only crossover will be the "any"s.

techman83 commented 7 years ago

I'd probably go with a config file, define our split points. Anything not defined or set as 'any' will end up in the "current". Future splits will be a matter of updating the config file in the repository and the code should take care of the rest.

I actually don't think it would be ludicrous amounts of code, a lot of the infrastructure is already in place to achieve it.

politas commented 7 years ago

Oh yeah, there's not a lot of work to do. The client already has multiple repository support. I'm not sure how it handles duplication between repos, but I suspect given issues we've had in the past that anything with the same name will get merged into a single line in the modlist. (It really ought to be by identifier, but I guess that's trickier). I think 1.0 is a good split point. @techman83, are you cool with sorting out the separate .tar.gzs? I like the idea of a config file to define the split. Could we use the existing builds.json file and add a new field per entry to say which .tar.gz that version should go into?
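
For the duplication question, here is a small sketch of merging by identifier rather than by name. It assumes each repository has already been loaded into a list of metadata dicts carrying the `identifier` and `version` fields; `merge_repos` is a hypothetical helper and does not describe how the client currently behaves.

```python
def merge_repos(*repos):
    """Merge lists of metadata dicts; earlier repositories win on duplicates."""
    merged = {}
    for repo in repos:
        for module in repo:
            # The same release appearing in two repos collapses to one entry.
            key = (module["identifier"], module["version"])
            merged.setdefault(key, module)
    return list(merged.values())
```

Keying on the (identifier, version) pair means a .ckan file duplicated across two .tar.gz files shows up once, while different releases of the same mod stay separate.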

Ruedii commented 7 years ago

I think we may want to look at how the repos are combined as well. Do we have any sort of server side auto build script that can combine all the files into a single XML or json file?

This may be something to look at down the road.

politas commented 7 years ago

If we stick with a single Repo and separate tar.gz files, that's not an issue.

Ruedii commented 7 years ago

I agree, that's exactly what I was proposing. As far as NETKAN is concerned there is one repository to upload to, and as far as CKAN is concerned there are two repository downloads (i.e. two tar.gz indexes). Eventually there might be three if a KSP 2.x series is released down the road.

techman83 commented 7 years ago

@politas I have some ideas on how to achieve it. When I get some spare cycles I'll have a crack at it.

@Ruedii I'm not sure what benefit combining them all into one file would give us. It certainly would add more complexity to the process.

techman83 commented 7 years ago

I've opened a PR in KSP-CKAN/NetKAN-bot#58, I need to write a full description of how it works. But the basic design is:

From my offline testing so far:

current!CKAN-meta> find . -name *.ckan|wc -l
4828
current!CKAN-meta> checkout middle
Switched to branch 'middle'
middle!CKAN-meta> find . -name *.ckan|wc -l
4566
middle!CKAN-meta> checkout legacy 
Switched to branch 'legacy'
legacy!CKAN-meta> find . -name *.ckan|wc -l
237
legacy!CKAN-meta> checkout master 
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
master u=!CKAN-meta> find . -name *.ckan|wc -l
9631

4828+4566+237 = 9631

I'll fix up the broken Travis tests (they all pass locally though) and do a more thorough write-up in the PR. But feel free to have a look and make suggestions. For reference, current's lower boundary is set to '1.1.0'.

Example releases.json (order of the array is important as 'any' goes into the first entry):

{
    "releases": [
        {
            "lower": "1.1.0",
            "name": "current"
        },
        {
            "lower": "0.90.0",
            "name": "middle",
            "upper": "1.1.0"
        },
        {
            "name": "legacy",
            "upper": "0.90.0"
        }
    ]
}
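
One possible reading of that file, as a minimal sketch: it assumes `lower` is inclusive and `upper` is exclusive, and that short versions such as "1.1" are padded with zeros. Those boundary rules and the function names are assumptions, not taken from the PR.

```python
import json

RELEASES = json.loads("""
{"releases": [
    {"lower": "1.1.0", "name": "current"},
    {"lower": "0.90.0", "name": "middle", "upper": "1.1.0"},
    {"name": "legacy", "upper": "0.90.0"}
]}
""")["releases"]


def as_tuple(version):
    """'1.1' -> (1, 1, 0); pad so short versions compare sensibly."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def release_for(ksp_version, releases=RELEASES):
    """Name of the release bucket a ksp_version falls into."""
    if ksp_version in (None, "any"):
        return releases[0]["name"]  # 'any' goes into the first entry
    version = as_tuple(ksp_version)
    for entry in releases:
        lower, upper = entry.get("lower"), entry.get("upper")
        if lower is not None and version < as_tuple(lower):
            continue
        if upper is not None and version >= as_tuple(upper):
            continue
        return entry["name"]
    return releases[0]["name"]  # nothing matched: fall back to the first entry


print(release_for("1.2.2"))   # current
print(release_for("1.0.5"))   # middle
print(release_for("0.25.0"))  # legacy
print(release_for("any"))     # current
```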

politas commented 7 years ago

"Legacy" seems a little light. What numbers do you get if you set

  - "current": lower 1.2.0, no upper
  - "middle": lower 1.0.0, upper 1.2.0
  - "legacy": no lower, upper 1.0.0

techman83 commented 7 years ago

Oh that was just an example. It's about right, we didn't have many mods below 0.90.0 as 0.90.0 was where things started to really take off mod wise. We probably don't need to split 3 ways either, it was more of an example of what's possible.

politas commented 7 years ago

I think a lot of < 0.90 mods have just been lost when Kerbalstuff shut down, too. Most ARR mod makers didn't bother to load up their old releases, even if they did move to another hosting platform.

Gryffen1971 commented 6 years ago

Here is a suggestion: how about having CKAN remove all unnecessary entries from the registry.json file? What I mean is that, when updating the repositories it uses, CKAN would only look for entries for the latest version(s) of Kerbal Space Program. As of right now my registry.json file is around 22,529 KB. Removing the entries for earlier versions of Kerbal Space Program would reduce the file size. I will attach my json file for reference, if needed.

politas commented 6 years ago

@Gryffen1971, if we process the whole repo and then purge the non-relevant mod versions then:

Gryffen1971 commented 6 years ago

@politas, thanks for informing me about that. I had forgotten that it would take longer to purge the non-relevant information, and you're right, it would slow down the process. I skipped that one in my thought process. Thanks for reminding me about that.

Ruedii commented 6 years ago

For incompatible mod versions I would recommend adding access to the second repository.

This is also why I recommended only using a split.

The version I would use as the barrier for the split is either 0.8 or 0.9. This is because that is where several major changes were made to the KSP code. It is also where KSP moved from the "early access" state to the "pre-release" state.

Ruedii commented 6 years ago

Oh, as a note, I recommend the following implementation:

  1. Items are added into "CKAN Full" repository when added.
  2. A build script is run automatically to push them to the various versioned repositories.

I would actually set up the following repositories:

  1. CKAN Current: Only the past few versions included. (Currently either 1.4+, 1.3+ or 1.2+)
  2. CKAN Recent: Fairly far back, a good two-point split excluding CKAN Current. (Currently 1.0+)
  3. CKAN Legacy: Anything not in CKAN Recent. Lots of old stuff.