davidjoffe / dave_gnukem

Dave Gnukem is a cross-platform 2D scrolling platform shooter inspired by Duke Nukem 1
GNU General Public License v2.0

Relatively large binaries in repo history - also Makefile thoughts #153

Closed (davidjoffe closed 10 months ago)

davidjoffe commented 1 year ago

EDIT 22 NOV 2022: I HAVE DONE THIS - APOLOGIES FOR ANY INCONVENIENCE. You will probably have to re-clone to set up your working trees again. I have ALSO renamed the main branch. This is a forced rewrite of the history. Your contributions should be intact in the history. But I won't do this again.

[low prio] FYI: if one does a clean clone of the repo it's about 24MB. Relatively tiny by most standards, BUT most of that space is files that shouldn't be in the history, which bothers me slightly (e.g. "datasrc/*.psd", which are now in a separate repo, where they should be, and "data/*.tga" and "data/*.lev" etc., which are also in a separate repo, as they should be).
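For anyone who wants to see where the space in a clone's history actually goes, here is a minimal sketch using standard git plumbing. The helper name `largest_blobs` is made up for illustration and is not part of the repo.

```sh
#!/bin/sh
# largest_blobs REPO [COUNT]   (hypothetical helper, not part of dave_gnukem)
# Prints "size-in-bytes path" for the COUNT largest files stored anywhere
# in the history of REPO, including files deleted from the working tree.
largest_blobs() {
    repo="$1"
    count="${2:-10}"
    # rev-list emits "<sha> <path>" for every object reachable from any ref;
    # cat-file adds type and size and passes the path through as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -rn |
        head -n "$count"
}
```

Running `largest_blobs . 20` from the top of a clone lists the twenty biggest files ever committed, which should show whether old data and datasrc files dominate.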

No Intellectual Property issues, that's not the issue, just the space. (These are all files that are not meant to be present directly if you git clone the source in your working tree - it's just that in the beginning I just had one repo, I hadn't separated the data and datasrc out .. only later decided to split into 3 repos (src, datasrc for things like .psd, data))

If I do a test locally and trim that clutter out of the git repo history and crunch it (filter out the data + reflog expire + prune etc.), the entire source repo of dave_gnukem becomes a nice, attractive, tiny less-than-2MB, much nippier to work with. But if I ever did crunch it I'd have to do a 'force push' to rewrite the history, which I NEVER want to do because existing forks and other contributions may be messed with - so I will leave this repo 'as is'. HOWEVER, this is just a small thing that nags me somehow.

What I'm thinking is that maybe at some point I'll TRY to create a new parallel 'crunched' repo and call it 'dave_gnukem2' or something. It would start out having the exact same history and the same working tree as this one at that point in time, EXCEPT with the "datasrc/" and "data/" stuff removed from the old history. Contributions like Matteo's should remain present and linked correctly, I think - if not, I won't do it, because I think the accreditation of contributors is really important. Perhaps ports and forks thereafter could then start being based off that, while this dave_gnukem repo is left as is so it doesn't mess with anything existing. But I want to make sure everybody who deserves credit is still in the repo history in the correct way, so I will do tests to check before I do anything on this, if I even change it.
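The kind of trim-and-crunch described above could look roughly like the sketch below. The exact commands used for the tests in this thread are not stated, so this is an assumption built on `git filter-branch`; the helper name `strip_paths_and_crunch` is hypothetical. Git's own documentation now recommends the separate `git filter-repo` tool for this job.

```sh
#!/bin/sh
# strip_paths_and_crunch REPO PATH...   (hypothetical helper; sketch only)
# Run this on a throwaway clone, never on your only copy.
# Paths must not contain spaces in this simple version.
strip_paths_and_crunch() {
    repo="$1"
    shift
    (
        cd "$repo" || exit 1
        # Remove the given paths from every commit on every branch and tag,
        # dropping commits that become empty and rewriting tags to match.
        FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
            --index-filter "git rm -r -q --cached --ignore-unmatch -- $*" \
            --prune-empty --tag-name-filter cat -- --all || exit 1
        # filter-branch keeps backups under refs/original/; delete them,
        # then expire the reflog and prune so the old blobs really go away.
        git for-each-ref --format='%(refname)' refs/original/ |
            while read -r ref; do git update-ref -d "$ref"; done
        git reflog expire --expire=now --all
        git gc --quiet --prune=now --aggressive
    )
}
```

For example, `strip_paths_and_crunch /tmp/gnukem_test data datasrc` on a scratch clone. Every commit SHA changes afterwards, which is why publishing the result needs a force push.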

At that point might be a good time to switch to better e.g. autoconf or cmake tools or something to help manage dependencies etc. for more platforms than the ancient Makefile - that way port maintainers who rely on that stable crappy old literally-1998-based Makefile (this Makefile is the same age as my girlfriend) could keep using it if they want but those who want to switch to newer better build system could. That new repo could also be used for more serious refactoring to make the engine more flexible perhaps. OR: I might rename existing repo to eg dave_gnukem_old and leave the new one under same name - I'm just doing tests now - history looks fair/fine

Or is 24MB of clutter in the source repo history nothing to worry about? Maybe. I just worry that 'every repo ever created from this one, ever' would have that.

(ALSO FYI master branch will probably be renamed to main soon)

Thoughts?

I just want people to be aware of this also, so if/when I do try do this, you can be prepared and know why. But will test thoroughly there's nothing negative, especially in terms of user contribution accreditation in the history of eg merged pull requests, I want those to hopefully still be clear in the history - will try make sure
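One way to check contributor credit before and after a rewrite is to compare the set of commit authors in the original clone and in the rewritten one. This is a minimal sketch, assuming both exist as local clones; the helper names `list_authors` and `same_authors` are hypothetical.

```sh
#!/bin/sh
# list_authors REPO   (hypothetical helper)
# Print every distinct "Name <email>" that authored a commit on any ref.
list_authors() {
    git -C "$1" log --all --format='%an <%ae>' | sort -u
}

# same_authors OLD_REPO NEW_REPO   (hypothetical helper)
# Succeeds only if both repos credit exactly the same set of authors.
same_authors() {
    a=$(list_authors "$1")
    b=$(list_authors "$2")
    [ "$a" = "$b" ]
}
```

If `same_authors old new` fails, saving both `list_authors` outputs to files and running `diff` on them shows who went missing. This can happen if an author's only commits touched the removed files and those commits were pruned as empty. The check covers commit authorship only, not how pull requests are displayed on GitHub.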

davidjoffe commented 1 year ago

FYI I'm doing some tests in some clone repos (to be deleted) and so far the history looks 'pretty good', except the pull requests don't show exactly right (but it does, most importantly, show them as being merged from the actual contributing user) .. tags seem to transfer, but not release files .. issues I think are movable .. hmm, if the history looks fine maybe I will indeed try to do a 'force push' with a rewritten history .. at that point all SHAs change though, so it would require anyone with a cloned repo to re-get it from scratch .. anyway, will think about it and decide. Apologies in advance for any inconvenience this might cause for forks etc. if I do that.

davidjoffe commented 1 year ago

(Something else for me to think about is that conceptually parts of the code are meant to be 'generic' and 'engine-y' while the other parts are game-specific - theoretically another 2-repo split, if I ever were to worry seriously about that - but there are so many better, larger game engines that it's unclear what the value proposition would be for yet another one, and I certainly don't have spare time to do a larger, more generic 'game engine' component.)

davidjoffe commented 1 year ago

I have thought about it and decided I am going to do this, sorry for any inconvenience this causes downstream - you'll probably have to re-get/re-clone to re-setup your working tree (MAKE BACKUPS of your working trees before sorting this out! If you have lots of local changes in your work tree could be a pain)
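For downstream clones, a cautious way to pick up the rewritten history without a full re-clone might look like the sketch below. It assumes the remote is named `origin` and the renamed branch is `main`; the helper name `resync_after_rewrite` is hypothetical. It copies the whole working tree first, as advised above.

```sh
#!/bin/sh
# resync_after_rewrite WORKTREE [BRANCH]   (hypothetical helper; sketch only)
resync_after_rewrite() {
    tree="$1"
    branch="${2:-main}"
    backup="${tree%/}.backup.$(date +%Y%m%d%H%M%S)"
    # 1. Full copy first (including .git) so no local work can be lost.
    cp -R "$tree" "$backup" || return 1
    echo "Backup written to $backup"
    # 2. Fetch the rewritten history and drop remote branches that are gone.
    git -C "$tree" fetch --prune origin || return 1
    # 3. Create (or reset) the local branch at the rewritten remote branch.
    #    checkout refuses to overwrite conflicting uncommitted changes.
    git -C "$tree" checkout -q -B "$branch" "origin/$branch" || return 1
}
```

Local commits made on top of the old history are not carried over by this. They stay in the backup copy and would need to be re-applied by hand, for example with `git format-patch` in the backup and `git am` in the updated tree.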

andreaspeters commented 1 year ago

How about including the two data source git repositories as "git submodules"?

davidjoffe commented 1 year ago

Hmm, "datasrc" is maybe a special case as very few people would need that - e.g. artists etc. who literally use Photoshop to edit sprites (it's not intended for general distribution in e.g. a Debian package, say, because most normal users who just want to play the game wouldn't really want or need to edit the files in Photoshop).

data is really necessary to the gameplay .. so maybe a more compelling use case for submodules .. I don't know if there may be negative effects to using submodules.
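For reference, wiring the data repo in as a submodule would look roughly like this sketch. Nothing like it has been done in the repo; the helper name `add_data_submodule` is hypothetical and the directory name `data` is taken from the discussion.

```sh
#!/bin/sh
# add_data_submodule SRC_REPO DATA_URL [DIR]   (hypothetical helper; sketch only)
add_data_submodule() {
    src="$1"
    url="$2"
    dir="${3:-data}"
    # Records the URL in .gitmodules and pins the data repo's current commit
    # as a special "gitlink" entry in the source tree.
    git -C "$src" submodule add "$url" "$dir" || return 1
    git -C "$src" commit -q -m "Add $dir as a submodule" || return 1
}

# After that, a plain "git clone" of the source repo leaves DIR empty.
# To fetch the data as well, people would use either of:
#   git clone --recurse-submodules <source repo URL>
#   git submodule update --init        (inside an existing clone)
```

One trade-off to weigh: the source repo then pins one exact data commit, so every data update also needs a commit in the source repo to move the pin.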

I want to though also have the freedom to potentially in future be able to use different data folders - for example, if we extend the core 'engine' to handle more games (either some other unrelated game, say, or a hypothetical 'Dave Gnukem version 2' .. then I'd still want the core game source to support the 'version 1 Dave Gnukem' game and gameplay and game data and the set of levels etc. that were regarded as 'version 1' but potentially it would then be 'detached' from that specific data folder to have some other data folder entirely ... e.g. if someone wants to use the 'engine' to build some entirely different game. Maybe not likely but I don't know.

I was thinking of adding some small helper scripts to maybe more easily git clone the data subfolder.

In future we may also have some hypothetical scenario where e.g. maybe the main source tree contains some bleeding edge breaking stuff but the main data is the stable data (or vice versa) ...
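Without submodules, a "bleeding-edge source with stable data" setup could be handled by cloning the data repo at a chosen tag or branch. This is a minimal sketch; the helper name `clone_data_at` and the tag name in the usage note are hypothetical.

```sh
#!/bin/sh
# clone_data_at DATA_URL REF [DIR]   (hypothetical helper; sketch only)
# Clone the data repo with REF (a branch or tag name) checked out.
# For a tag, git leaves the clone on a detached HEAD at that tag.
clone_data_at() {
    url="$1"
    ref="$2"
    dir="${3:-data}"
    git clone --quiet --branch "$ref" "$url" "$dir"
}
```

Usage would be something like `clone_data_at https://github.com/davidjoffe/gnukem_data some-stable-tag data`, where `some-stable-tag` stands in for whatever tag marks the stable data.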

Not sure if anyone else has some thoughts? Would submodules affect things like Debian packaging, or not really? I mean, of course it's simple enough to 'ignore' a submodule if it's present .. to be honest I haven't worked that much with submodules, so it's a bit unfamiliar to me, that's also why I haven't

davidjoffe commented 1 year ago

Of course for now the focus is mainly on consolidating and stabilizing the main 'Dave Gnukem version 1' and helping get more ports building reliably etc. (And I don't want to suddenly change sprite data or level data drastically right now because that's essentially the 'official' version 1 we released - though we could add more separate 'missions' and/or some new levels as 'bonus levels' or something in a hypothetical version 1.2 or something.)

davidjoffe commented 1 year ago

Anyone else have thoughts for/against submodules?

andreaspeters commented 1 year ago

Yeah, keep it as easy as possible. If you haven't worked with submodules before, leave it and save your time for more important things. :-) At most, you could even add the "git clone" commands to the Makefile to save some extra steps. :-)

davidjoffe commented 1 year ago

I locally started creating a little separate helper script, getdatafolder.sh (below).

If we integrate it into the Makefile though, we should be careful, as it could cause issues with automated downstream build systems that maybe apply patches etc. to this Makefile, or with packaging systems where it's maybe done a different way, e.g. Debian .. but I was also thinking maybe the build scripts could help make it easier .. but hadn't decided ..

getdatafolder.sh:

```sh
#!/bin/sh
# dj2022 small helper script to get or update data subfolder (you need git installed for this)

djDATADIR="data"
djDATA_URL="https://github.com/davidjoffe/gnukem_data"

if [ -d "$djDATADIR" ]; then
    echo "Updating data folder ${djDATADIR} ..."
    echo "cd ${djDATADIR}"
    # stop if cd fails, otherwise 'git pull' would run in the source repo instead
    cd "${djDATADIR}" || exit 1
    # show current folder
    pwd
    echo git pull
    git pull
    echo cd ..
    cd ..
else
    echo "Cloning data folder ..."
    git clone "${djDATA_URL}" "${djDATADIR}"
fi
```

davidjoffe commented 1 year ago

FYI have also now added this new helper script to the repo:

https://github.com/davidjoffe/dave_gnukem/blob/main/get_datafolders.sh