YoshiRulz opened this issue 1 year ago
Bonus points if you manage not to lose our change history for those projects when moving them to submodules. Splitting folder history is possible in Git, but re-applying changes onto a different tree (even if the code matches) may be harder.
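For what it's worth, here is a minimal sketch of the folder-splitting half, assuming `git subtree` is available (it ships with most Git distributions); the destination fork URL and branch names are hypothetical, and the harder step of re-applying the changes onto upstream is not covered:

```sh
# Carve the history of one checked-in core out into its own branch,
# then push that branch into the repo that will become the submodule.
# (Hypothetical example: waterbox/tic80 and a TASEmulators fork of TIC-80.)
cd BizHawk
git subtree split --prefix=waterbox/tic80 -b tic80-history
git push git@github.com:TASEmulators/tic80.git tic80-history:bizhawk-changes
```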
Many of those cores/projects are checked in because they have been forked and modified, some fairly lightly (e.g. tic80, which mostly just has waterbox interfacing added, minor adjustments made, and unneeded files excised) and some very heavily (e.g. virtualjaguar, much of which is unrecognizable compared to upstream), not to mention the ones that are extremely outdated relative to upstream (e.g. gpgx). Most of them will require forking in any case and can't just have upstream submoduled.
> fork in the TASEmulators org
To be clear, this is also mainly the reason I don't immediately fork projects into the TASEmulators org: [image showing insufficient permissions]

And getting someone to move a repo to the org is annoying. It's way easier to just clone the code into the repo and call it a day, and there are no downsides (besides BizHawk clone size) if upstream is dead.
I didn't know lack of permissions was a roadblock for more cool work. I think that can be arranged.
> To be clear, this is also mainly the reason I don't immediately fork projects into the TASEmulators org: [image showing insufficient permissions]
This would also be a blocker for me. I do see the advantage in terms of organization if more stuff was in submodules, but it does require all maintainers to have access to all the relevant projects.
Cleaning up the root folder of the BizHawk repo is independent from that though, and I'd definitely approve moving stuff to a unified folder. There is `ExternalCoreProjects`; why isn't it used more?
If people tell me what to fork under tasemus, I can do that and invite you guys as admins to those forks. I may also discuss adding more admins to the org itself...
Well, most of the repos I linked should be forked, but pragmatically the larger ones are far more important, and as CPP said some projects have had more extensive changes than others. IMO forks can be made "as needed" i.e. the next time someone wants to update a core.
What's the objective here: making it easier to merge upstream changes? In that case, the difficulty comes chiefly from how much the code has diverged from upstream, not from the mechanism by which it's stored.
It's a tradeoff we've made different ways at different times: heavy modifications to the source let us get off the ground more quickly with something that works for us. Light modifications to the source mean we sometimes have to do more awkward and difficult work to integrate well with the existing code, but we can take upstream changes more easily.
I don't think there's one clear answer. The cores that sync with upstream seem to be the better long-term bet, and we can see that especially in cases where we've done it both ways and can compare the two (e.g., early Mednafen efforts vs the Nyma system.) But we also have to consider that some of these ported cores might not even exist if we hadn't taken the "easy" way out.
I'm against any specific requirements here, because we're all volunteers and the most important thing is being useful for the person who will actually work on the core. I think core porters should consider the value of various approaches, but if they want to get hacking, and they're the only one who will do it, let them.
To be clear, this issue is not arguing against the heavily-modified path. I'm just against adding hundreds of files directly to this repo.
I don't understand why that's worth discussion. Apart from concerns on implementing the cores, and concerns about keeping the cores synced with upstream, why does it matter?
For one, having an extra million LOC in the repo drastically increases the time it takes to clone. That affects both humans and CI.
It also makes `grep -r`, `find`, etc. take longer and give more false positives.
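To make that concrete, here are hedged examples of the kind of workarounds this forces; the flags are standard GNU grep / Git options, the symbol being searched for is made up, and the repo URL assumes the usual TASEmulators/BizHawk location:

```sh
# Searching only first-party code means remembering to exclude the vendored trees.
grep -rn --exclude-dir=waterbox --exclude-dir=libmupen64plus "SomeSymbol" .

# CI (and impatient humans) can dodge part of the clone cost with a shallow,
# blobless clone, but that only mitigates the problem.
git clone --depth=1 --filter=blob:none https://github.com/TASEmulators/BizHawk.git
```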
I guess; aren't most devs going to check out most submodules anyway?
> aren't most devs going to check out most submodules anyway?
I for one only ever check out the submodules I require. That said, I don't think the C/C++ cores in the repo contribute much to the clone size; it's probably going to be the assets folder for the most part.
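For anyone following along, a quick sketch of that selective workflow; these are standard Git commands, and the submodule path is only illustrative:

```sh
# Clone without fetching any submodules, then initialize just the one you need.
git clone https://github.com/TASEmulators/BizHawk.git
cd BizHawk
git submodule update --init submodules/quicknes
```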
> I guess; aren't most devs going to check out most submodules anyway?
Seeing as you don't need the submodules cloned to build the solution, and we don't really advertise their existence, I'd imagine no, most devs aren't cloning the submodules. Like Morilli, I never bother to outside of the rare occasions I need to work with them.
> That said, I don't think the C/C++ cores in the repo contribute much to the clone size; it's probably going to be the assets folder for the most part.
Downloading edf5f1513 as a zip (not the same as cloning, but similar, and easier to measure):
So your guess was broadly correct. (IMO that is also a problem; Citra and MAME together make up more than half of `/Assets`' filesize. See #3505.)

But if you omit `/Assets`, the remaining filesize is split at about a quarter `/waterbox`, a quarter `/src`, a quarter `/libmupen64plus`, and the long tail.

Notably, some of those largest white segments are `.exe`s, > 18 MiB combined? Those sound like easy wins.
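For reproducibility, a rough sketch of how such a breakdown can be measured; the archive URL follows GitHub's standard format for the TASEmulators/BizHawk repo, and the extracted folder name may need adjusting:

```sh
# Download the tree at one commit as a zip, then size up each top-level directory.
curl -LO https://github.com/TASEmulators/BizHawk/archive/edf5f1513.zip
unzip -q edf5f1513.zip
du -sh BizHawk-edf5f1513*/* | sort -h
```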
Well, if we're doing this, here's a probably at least somewhat accurate and representative list of what contributes most to the `.git` size:
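In case anyone wants to reproduce that kind of measurement, here's a minimal sketch using standard Git plumbing to list the largest blobs in the repository's history; run it inside a full (non-shallow) clone:

```sh
# List every object reachable from any ref, annotate with type/size/path,
# keep only blobs, and show the 20 largest.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob"' |
  sort -k3 -n |
  tail -n 20
```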
> we don't really advertise their existence
They're in `.gitmodules`; what other advertising would be needed?
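Concretely, a sketch of what that looks like; zlib is used only because it's one of the upstreams listed in this issue, and the `submodules/` path follows the layout proposed here:

```sh
# Adding a submodule records its path and URL in the tracked .gitmodules file.
git submodule add https://github.com/madler/zlib submodules/zlib
cat .gitmodules
# [submodule "submodules/zlib"]
#     path = submodules/zlib
#     url = https://github.com/madler/zlib
```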
Anyway, that download size is awful, but the problem there is mostly unrelated to submodules and this ticket, as it's about prebuilt binaries. We'll need to get all of our crusty binary build setups into an easily reproducible build system.
> placing submodules in `/submodules`, and supplemental files like Makefiles or shell scripts in one of `/ExternalProjects`, `/ExternalCoreProjects`, or `/waterbox` when those are necessary.
A minor problem with doing this specifically is that CMake is used in places, and CMake does not allow including stuff from parent directories; you can only include the current directory and its subdirectories. So unless you end up having the CMakeLists also be within the `/submodules` folder alongside the submodule, or possibly in the BizHawk root directory itself, you're screwed.
That's dumb. And I assume you need CMake on Windows too, so you can't just use a symlink, bind mount, etc. to work around it.
tl;dr: I'm calling for a moratorium on copying third-party code into the repo, and I'm proposing that we move a few dirs around.
We all hate code duplication. So why is it okay for us to needlessly copy entire codebases?
(snip: Cut out a bunch of back-and-forth here, since I feel none of you need to be convinced this is a good idea, only that it's worth the effort to do it. Edit: apparently natt did need convincing; scroll down for arguments.)

So let's assess the damage:
- `/ExternalCoreProjects/Virtu`, Virtu core, < 11 kLOC, upstream is https://github.com/digital-jellyfish/Virtu/tree/master/Virtu
- `/ExternalProjects/NLua`, Lua host, < 10 kLOC, upstream is https://github.com/NLua/NLua
- `/ExternalProjects/iso-parser`, disc system, < 2 kLOC, upstream was on Google Code
- `/blip_buf`, multiple cores, < 2 kLOC, upstream was on Google Code
- `/libmupen64plus`, Mupen64Plus and plugins, < 340 kLOC, upstreams are under https://github.com/mupen64plus
- `/lynx`, Handy core, < 12 kLOC, upstream is in tarball (or https://github.com/TASEmulators/mednafen/tree/master/src/lynx)
- `/psx`, Octoshock core, < 68 kLOC, upstream is in tarball (or https://github.com/TASEmulators/mednafen/tree/master/src/psx)
- `/quicknes`: replaced with submodule
- `/waterbox/ares64/ares`, Ares64 core, < 55 kLOC, upstream is https://github.com/ares-emulator/ares
- `/waterbox/bsnescore/bsnes`, (new) BSNES core, < 67 kLOC, upstream is https://github.com/bsnes-emu/bsnes
- `/waterbox/gpgx`, Genplus-gx core, < 88 kLOC, upstream is https://github.com/ekeeke/Genesis-Plus-GX
- `/waterbox/libsnes`, (old) BSNES core, < 63 kLOC, upstream is https://github.com/bsnes-emu/bsnes
- `/waterbox/nyma/zlib`, zlib for Nyma cores, < 19 kLOC, upstream is https://github.com/madler/zlib
- `/waterbox/picodrive`, PicoDrive core, < 79 kLOC, upstream is https://github.com/notaz/picodrive
- `/waterbox/tic80`, TIC-80 core, < 17 kLOC, upstream is https://github.com/nesbox/TIC-80
- `/waterbox/uzem`, Uzem core, < 5 kLOC, upstream is https://github.com/Uzebox/uzebox/tree/master/tools/uzem
- `/waterbox/virtualjaguar/src`, Virtual Jaguar core, < 52 kLOC, upstream was on a self-hosted Git server
- `/wonderswan`, Cygne core, < 8 kLOC, upstream is in tarball (or https://github.com/TASEmulators/mednafen/tree/master/src/wswan)

In total, the upper bound is 917 kLOC of source code, severed from its trunks and waiting to bitrot. (For reference, just the BizHawk solution is ~450 kLOC, and Mesen is ~325 kLOC.) Not listed are several binaries in `/Assets/dll` which are rundeps for unmanaged cores; I'll be looking at these as part of my Nix experiments. I also discounted dependencies of these projects whenever I noticed them (e.g. `.../vendor` near the bottom of the filesize list), but they are nonetheless checked in to this repo.

I propose that from today we only use Git submodules for this purpose: placing submodules in `/submodules`, and supplemental files like Makefiles or shell scripts in one of `/ExternalProjects`, `/ExternalCoreProjects`, or `/waterbox` when those are necessary. I also propose we migrate all subprojects, starting with those listed above, to the same scheme. Adding a submodule and removing the checked-in copy can be squished into a single commit.
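A minimal sketch of that single-commit swap, using picodrive and its upstream URL from the list above purely as an example; in practice the submodule would more likely point at a TASEmulators fork:

```sh
# Remove the checked-in copy and add the submodule as one staged change,
# then record both in a single commit.
git rm -r waterbox/picodrive
git submodule add https://github.com/notaz/picodrive submodules/picodrive
git commit -m "Replace checked-in picodrive with a submodule"
```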
Moved here from my personal notes #118; see also #2423, my personal notes #11, and #2312.