AxionDrak / GameCube-Backup-Manager

GameCube Backup Manager - software to convert ISO files to Nintendont format.
MIT License

SHA-1 and MD5 Checks #119

Open RomBrz opened 1 year ago

RomBrz commented 1 year ago

Is there a way to check whether my game backups are intact using their hashes? I'm having some difficulty checking my games because some sites list the SHA-1 for the ISO, others for the CISO, and others may list different versions of the game. In Wii Backup Manager this is worse, as there can be many formats (WBFS, ISO, CISO, etc.).

I noticed that "Display -> Detailed information" shows what seems to be a checksum from an online database.

But I'm missing an option like "right-click on the game -> Check integrity", which, if needed, would convert the game to the correct format in a temporary folder and then calculate the checksum.

Or a broader option, like "Check all games", for a one-time full check-up.
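A "Check all games" pass could look roughly like the sketch below. It assumes the expected SHA-1 for each file is already known (for example, copied from a Redump entry) and keyed by file name; `check_all_games`, `sha1_of` and that lookup table are hypothetical, not part of the app. It also skips the "convert to the correct format first" step, which would need a separate conversion tool.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-1 of a file, read in 1 MiB chunks so a full disc image never sits in memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_all_games(folder: Path, known_sha1: dict) -> dict:
    """Hypothetical "Check all games" pass over every .iso in `folder`.

    `known_sha1` maps a file name to its expected SHA-1 (assumed to come
    from a reference such as Redump). Returns name -> "ok" / "mismatch" / "unknown".
    """
    results = {}
    for path in sorted(folder.glob("*.iso")):
        expected = known_sha1.get(path.name)
        if expected is None:
            results[path.name] = "unknown"  # no reference hash to compare against
        elif sha1_of(path) == expected.strip().lower():
            results[path.name] = "ok"
        else:
            results[path.name] = "mismatch"  # bad dump, damaged copy, or another revision
    return results
```

Returning "unknown" separately from "mismatch" keeps files that simply have no reference hash from being reported as broken.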

sjohnson1021 commented 1 year ago

:)

This is functional right now in my local dev branch (too messy to push as of now, but updates are coming soon). There's a new FileStream-based transfer that supports cancel/pause/resume, calculates the checksum of the source during the transfer, and verifies the destination (currently after the write; I could maybe work on doing this concurrently with the file write, but..?). This is currently used for verifying 1:1 transfers of files (scrubbed, exact; working on implementing WIT. Is keeping GCIT even a good idea? I'd need further opinions before I would consider scrapping it.)
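The project itself is C# (FileStream), so the following is only a Python illustration of the same idea, not the branch's code: hash the source while copying it, then re-read the destination and compare. `copy_and_verify` is a hypothetical name, and cancel/pause/resume are left out.

```python
import hashlib
from pathlib import Path


def copy_and_verify(src: Path, dst: Path, chunk_size: int = 1 << 20) -> bool:
    """Copy src to dst, hashing the source during the transfer, then
    re-read dst and compare. Returns True when both MD5 digests match."""
    src_hash = hashlib.md5()
    with src.open("rb") as fin, dst.open("wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            src_hash.update(chunk)  # source checksum, computed while copying
            fout.write(chunk)

    # Second pass over the destination ("verify after write").
    # Note: the OS may serve this read from its cache, so it does not
    # prove the bytes physically reached the disk.
    dst_hash = hashlib.md5()
    with dst.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            dst_hash.update(chunk)
    return src_hash.hexdigest() == dst_hash.hexdigest()
```

Hashing during the copy means the source is only read once; only the verification pass costs an extra read.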

I've also updated Game.cs to pull ALL information instead of just the title (seeing as we're already searching a large XML document and not just using titles.txt). This provides things like ROM CRCs and hashes if GameTDB has them. I'm also now reading the ROM version from the file, so we can tell whether it's the FR PAL version or the ES PAL version, etc., and download the appropriate title.

I kind of... well.. eh, I'll be real. I rabbit-holed into reworking underlying systems to be more modular/flexible and less monolithic. This was mainly because the current "Queue" system.. works? But it's cardboard and duct tape, if I'm being real (I wrote it, so I don't feel bad putting down my own code, lol). So updates that should've been simple tweaks and slight modifications have been delayed, since what I have now has several systems completely rewritten, plus new global systems for settings, notifications, error logging, etc.
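Reading the ID and revision from a plain GameCube .iso only needs the first 32 bytes of the disc header: game ID at 0x00 (game code + maker code), disc number at 0x06, version byte at 0x07, and the magic word 0xC2339F3D at 0x1C. The 4th character of the game ID is what typically separates regional releases such as French ("F") and Spanish ("S") PAL discs. This is a Python sketch, not the Game.cs code; `read_gc_header` and the region map (which is partial) are illustrative assumptions.

```python
import struct

GC_MAGIC = 0xC2339F3D  # GameCube DVD magic word, stored big-endian at offset 0x1C

# Partial map of the 4th game-ID character to a regional release.
REGIONS = {"E": "USA", "J": "Japan", "P": "Europe", "F": "France",
           "D": "Germany", "S": "Spain", "I": "Italy"}


def read_gc_header(header: bytes) -> dict:
    """Parse ID, disc number and revision from the start of a plain (uncompressed) .iso."""
    if len(header) < 0x20:
        raise ValueError("need at least the first 0x20 bytes of the image")
    (magic,) = struct.unpack_from(">I", header, 0x1C)
    if magic != GC_MAGIC:
        raise ValueError("GameCube magic word not found (compressed image or not a GameCube disc?)")
    game_id = header[0:6].decode("ascii")  # game code (4 chars) + maker code (2 chars), e.g. "GALE01"
    return {
        "game_id": game_id,
        "region": REGIONS.get(game_id[3], "unknown"),
        "disc_number": header[6],  # 0 = first disc, 1 = second disc
        "version": header[7],      # revision byte: 2 is shown as version 1.02
    }
```

Usage: `with open(path, "rb") as f: info = read_gc_header(f.read(0x20))`.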

This is.. not an ideal approach.. I know. Small, incremental improvements.. y'know the whole.. point of GitHub/version control.

What I'm seeing being asked for:

Anything I missed or you want to add?

Would you want this added as part of the "View Detailed Information" form?

RomBrz commented 1 year ago

Wow, thanks for the quick and complete answer. Honestly, the difference between MD5 and SHA-1 checksums is very confusing to me. Sites usually talk about integrity and a 1:1 retail image, but I read on one site that each ISO has its own checksum, so that is what gets checked. I came to your project looking for an app "equivalent" to Wii Backup Manager; there I have all these different options, and it's all confusing to me. I saw that the Redump.org project has hashes to verify against; I'll try looking at GameTDB too.

For people who aren't that involved, I think the "core" of checking is questions like: "Is the ROM I have the retail one? Is it modified? Was the media I used for dumping my ISOs OK?" For example, I used CleanRip to create my GameCube ISOs, and it uses the Redump database to check integrity. I discovered that my Code Veronica disc 2 didn't match the checksum, so I figured something was dirty. I cleaned the disc and tried again, and the new ISO matched the checksum.

In your app I find this information a bit confusing. For example, right-clicking a game and choosing "Information" shows one window with some information, but going to the top menu and choosing "Display -> Detailed information" shows a different kind of information.

The "Detailed information" view shows a checksum, but when I generated a checksum for my ISO it didn't match, so I don't know whether there was an error during the dump or whether it's another version.
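One way to tell "bad dump" apart from "another version" is to compare the computed hash against every revision a database lists for that game, not just one. The sketch below assumes a simplified, GameTDB-like XML layout with one `<rom version="..." md5="..."/>` element per revision; the real schema should be checked before relying on those attribute names, and `match_version` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def match_version(game_xml: str, md5_hex: str) -> Optional[str]:
    """Return the `version` of the <rom> entry whose md5 matches, else None.

    Assumed layout: <game><id>...</id><rom version="..." md5="..."/>...</game>
    """
    wanted = md5_hex.strip().lower()
    root = ET.fromstring(game_xml)
    for rom in root.iter("rom"):
        if rom.get("md5", "").lower() == wanted:
            return rom.get("version")
    # No listed revision matches: bad dump, modified image, or an unlisted revision.
    return None
```

A match names the revision; no match across all listed revisions points more strongly toward a bad or modified dump.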

sjohnson1021 commented 1 year ago

Leaving for my reference, will edit and update this comment shortly:

Track(s) 2 and more dumps from original media [!]
# | Sectors | Size | CRC-32 | MD5 | SHA-1
1 | 712880 | 1459978240 | 4c1d3641 | 72c4860d8555d5e790628e348abc244d | 26798080de5e5c0f154915324c5c7dd6aa36056

http://redump.org/disc/2190/
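The Redump row can be sanity-checked with simple arithmetic: the image uses 2048-byte sectors, so 712880 sectors × 2048 bytes = 1,459,978,240 bytes, matching the Size column. A size comparison like this is a cheap pre-check before spending seconds on a hash; the function names below are hypothetical.

```python
SECTOR_SIZE = 2048  # bytes per sector on a GameCube disc image


def expected_size(sectors: int) -> int:
    return sectors * SECTOR_SIZE


def quick_size_check(actual_size: int, sectors: int) -> bool:
    """Cheap test before hashing: a compressed (e.g. CISO) or truncated
    image fails immediately, without reading the whole file."""
    return actual_size == expected_size(sectors)
```

Passing the size check does not prove integrity (a corrupted full-size image still passes); it only rules out obviously wrong files quickly.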

sjohnson1021 commented 1 year ago

MD5 and SHA-1 are both cryptographic hash functions. A cryptographic hash function is a mathematical algorithm that takes an input of any length and produces an output of a fixed length, called a hash or message digest. Because there are far more possible inputs than outputs, the hash cannot be strictly unique, but it is designed so that it is very difficult to find two inputs that produce the same hash.
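A quick way to see the "fixed-length output" property with Python's standard `hashlib`: whatever the input size, an MD5 hex digest is 32 characters and a SHA-1 hex digest is 40, and changing a single character produces an unrelated-looking digest. `hex_lengths` is just an illustrative helper.

```python
import hashlib


def hex_lengths(data: bytes) -> tuple:
    """Length of the MD5 and SHA-1 hex digests for `data`."""
    return len(hashlib.md5(data).hexdigest()), len(hashlib.sha1(data).hexdigest())


# Output length does not depend on input length: always (32, 40) hex characters.
for size in (0, 1, 10_000_000):
    print(size, hex_lengths(b"\x00" * size))

# A one-character change in the input gives a completely different digest.
print(hashlib.md5(b"GALE01").hexdigest())
print(hashlib.md5(b"GALE02").hexdigest())
```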

MD5 and SHA-1 are both widely used for checksums. A checksum is a value calculated from a file and used to verify that the file has not been corrupted. If the checksum of a file does not match the expected checksum, the file is not identical to the reference: it may be corrupted, or it may be a different revision or a modified image.
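A minimal verification routine, streaming the file so a full-size image is never loaded into memory at once. `hashlib.new` accepts algorithm names such as "md5" and "sha1"; `file_digest` and `verify` are illustrative names, not functions from the app.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    h = hashlib.new(algorithm)  # e.g. "md5", "sha1", "sha256"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "sha1") -> bool:
    """True if the file is byte-for-byte the one the reference hash describes.

    False only means "not that file": it could be corrupted, or simply
    a different revision or a modified image.
    """
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```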

One difference between MD5 and SHA-1 is the length of the hash: MD5 produces a 128-bit hash, while SHA-1 produces a 160-bit hash. A longer hash makes it harder to find two inputs that produce the same value by brute force, which is one reason SHA-1 is considered more secure than MD5. (Practical collision attacks exist for both, but they require someone deliberately crafting files; they do not affect detecting accidental corruption.)
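The bit lengths can be read straight from `hashlib` (`digest_size` is in bytes). For an ideal n-bit hash, the generic "birthday" bound for finding any collision by brute force is about 2^(n/2) evaluations, so roughly 2^64 for MD5 and 2^80 for SHA-1. The helper names are illustrative, and known attacks on both algorithms need far less work than this bound.

```python
import hashlib


def output_bits(name: str) -> int:
    return hashlib.new(name).digest_size * 8


def generic_collision_work(name: str) -> int:
    """Birthday bound: about 2**(n/2) evaluations to find any collision by
    brute force. This is the best an ideal n-bit hash can offer; known
    attacks on MD5 and SHA-1 need much less work."""
    return 2 ** (output_bits(name) // 2)


for name in ("md5", "sha1"):
    print(name, output_bits(name), "bits; generic collision work ~ 2 **", output_bits(name) // 2)
```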

Some sites might prefer SHA-1 over MD5 or CRC because it is more secure (in other words, it is less likely for two different images to have the same hash). However, MD5 is typically faster than SHA-1, so some sites might prefer MD5 for performance reasons.
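Relative speed depends on the CPU and library build, so it is worth measuring on the target machine rather than assuming. This rough, single-run benchmark times CRC-32, MD5 and SHA-1 over an in-memory buffer using only the standard library; on CPUs with SHA hardware extensions, SHA-1 can even come out ahead of MD5. `throughput_mib_s` is a hypothetical helper.

```python
import hashlib
import time
import zlib


def throughput_mib_s(size_mib: int = 64) -> dict:
    """Rough single-run MiB/s for CRC-32, MD5 and SHA-1 on an in-memory buffer."""
    data = bytes(size_mib * 1024 * 1024)  # zero-filled; content does not affect these algorithms' speed
    candidates = {
        "crc32": lambda: zlib.crc32(data),
        "md5": lambda: hashlib.md5(data).digest(),
        "sha1": lambda: hashlib.sha1(data).digest(),
    }
    results = {}
    for name, run in candidates.items():
        start = time.perf_counter()
        run()
        elapsed = time.perf_counter() - start
        results[name] = size_mib / elapsed if elapsed > 0 else float("inf")
    return results


if __name__ == "__main__":
    for name, speed in throughput_mib_s().items():
        print(f"{name}: {speed:,.0f} MiB/s")
```

This measures hashing only; with real files, disk read speed is often the actual bottleneck.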

All of these hash functions can be used to verify the integrity of a file. They are different algorithms that each return a string based on the bytes of data in a file. In practice, different files get different strings, and it is very difficult to find two files that have the same hash.

So, honestly, sites that offer multiple hashes for the same files are kind of doubling down on verification: you can verify with one hash, and if you still don't trust it for some reason (honestly, I don't see this ever realistically happening or being a problem, but.. hypothetically), you could check the file with a second algorithm and verify that both hashes match the known good ones.
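Checking against several algorithms does not have to mean reading the file several times: each chunk can feed CRC-32, MD5 and SHA-1 in the same pass. In this sketch (`all_hashes` and `matches_all` are hypothetical names), a file only counts as verified if every hash the reference lists matches, and an empty reference never counts as a pass.

```python
import hashlib
import zlib
from pathlib import Path


def all_hashes(path: Path, chunk_size: int = 1 << 20) -> dict:
    """CRC-32, MD5 and SHA-1 of a file, computed in a single read pass."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
            md5.update(chunk)
            sha1.update(chunk)
    return {"crc32": f"{crc:08x}", "md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def matches_all(actual: dict, expected: dict) -> bool:
    """True only if the reference lists at least one hash and all of them match."""
    return bool(expected) and all(
        actual.get(name) == value.strip().lower() for name, value in expected.items()
    )
```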

Will look into how best to handle this in the UI. Should we calculate all three at once, in a different form, prefer one algorithm over the other by default and allow for changing preference, with additional information or another checksum form listing all hashes available? Do we enable checksum verification for file transfers by default? Will most people care enough to warrant the extra 2-3 seconds per file on a mid-to-high-range modern PC? Older processors will likely take longer, although with 1.36 GiB being the standard full size, it shouldn't be too much. How do we handle transferred files when they fail the checksum? They don't match the source, so they didn't copy properly and are corrupted in some way. Do we just delete them, or inform/warn the user and suggest they delete them?

Progress update: most individual functions work now; I just need to rework and clean up the new QueueManager. I think I'm going to have to shift responsibilities to different portions of code and merge a few things, so that the moving pieces fit better together and don't have to pass information around as much (it was starting to get messy and overly complex).
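The "extra 2-3 seconds" estimate is consistent with back-of-the-envelope math. Assuming the slower of disk read and hash speed is 500 MB/s, one pass over a full 1,459,978,240-byte image takes about 2.9 s; at an assumed 100 MB/s (a slow HDD or NAS) it is about 14.6 s. Verifying the destination adds a second read pass. Both speeds are assumptions, not measurements.

```python
FULL_GC_IMAGE = 1_459_978_240  # bytes: full-size image size from the Redump listing in this thread


def hash_seconds(size_bytes: int, throughput_mb_s: float) -> float:
    """Time for one read+hash pass, where `throughput_mb_s` (decimal MB/s)
    is the slower of disk read speed and hash speed."""
    return size_bytes / (throughput_mb_s * 1_000_000)


for speed in (500, 100):  # assumed: fast SSD + modern CPU vs. slow HDD/NAS
    print(f"{speed} MB/s: {hash_seconds(FULL_GC_IMAGE, speed):.1f} s")
```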

Vinfall commented 1 year ago

A bit late, but here are my two cents regarding rom validation.

TL;DR: just use CRC for speed or MD5 for security (yes, you read that right), and scroll down for some Q&A for your reference.


CRC/MD5/SHA-1 all work fine, and the insecurity is almost entirely theoretical (in terms of game ROMs). It's possible to fake a file with the same SHA-1, and the cost keeps dropping (something like $100k within 10 days), but that's mostly for small files. GameCube & Wii discs are surely much harder to fake this way, if it's feasible at all.

In practice, for reference, here are a few sites' listings of GC & Wii hashes (they offer size info as well, but size alone can't tell you anything, so I'm ignoring it):
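The "CRC for speed" side of this trade-off: CRC-32 is built to catch accidental damage (it is guaranteed to detect any single-bit error, for example), but it is not a cryptographic hash, so it offers no protection against someone deliberately crafting a different file with the same CRC. The sketch below checks the single-bit property on a small buffer with the standard `zlib.crc32`; the helper names are illustrative.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    return f"{zlib.crc32(data):08x}"  # same 8-hex-digit form the listings use


def every_single_bit_flip_detected(data: bytes) -> bool:
    """Flip each bit of `data` in turn and confirm the CRC-32 changes."""
    original = zlib.crc32(data)
    buf = bytearray(data)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)  # flip one bit
        changed = zlib.crc32(buf) != original
        buf[i // 8] ^= 1 << (i % 8)  # undo the flip
        if not changed:
            return False
    return True
```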

Hash Value
CRC32: 7b595396
MD5: 027171c92d603ae4c90d69956d0bd94a
SHA-1: ba44bbe5780c34d2a6ebd66eeed52eaaab941468
SHA-256: 61482c63289902767598a2f0bd8ebf44992190186fcd4e5e38301e7266f7c688

# | Sectors | Size | CRC-32 | MD5 | SHA-1
1 | 712880 | 1459978240 | 5365c84b | 0e63d4223b01d9aba596259dc155a174 | d4e70c064cc714ba8400a849cf299dbd1aa326fc

version 1.02
size 1459978240
crc 5365c84b
md5 0e63d4223b01d9aba596259dc155a174
sha1 d4e70c064cc714ba8400a849cf299dbd1aa326fc

Languages | Size | CRC32 | Region | Serial
English, Japanese | 1393M | 5365C84B | USA | GALE01
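Three of these listings agree on the same file (CRC-32 5365c84b), just in different layouts. A sketch of checking locally computed hashes against the key-value style listing, assuming its format is exactly one `key value` pair per line; `parse_entry`, `compare` and the key mapping are hypothetical.

```python
def parse_entry(text: str) -> dict:
    """Parse "key value" lines (version/size/crc/md5/sha1) into a dict."""
    entry = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            entry[key.lower()] = value.strip().lower()
    return entry


# Map the listing's key names to the names used for locally computed hashes.
KEY_MAP = {"crc": "crc32", "md5": "md5", "sha1": "sha1"}


def compare(entry: dict, computed: dict) -> dict:
    """Per-hash result (True = match), only for hashes present on both sides."""
    return {
        key: entry[key] == computed[name].lower()
        for key, name in KEY_MAP.items()
        if key in entry and name in computed
    }
```

Case is normalized on both sides, since one listing uses uppercase hex (5365C84B) and the others lowercase.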

The judgement is up to you. I'm just offering the info and suggesting CRC/MD5.


Should we calculate all three at once, in a different form, prefer one algorithm over the other by default and allow for changing preference, with additional information or another checksum form listing all hashes available?

No. Verifying CRC or MD5 by default is a better idea IMO, and users can optionally enable other algorithms in settings.

Do we enable checksum verification for file transfers by default? Will most people care enough to warrant the extra 2-3 seconds per file on a mid-to-high-range modern PC?

Yes, if the default is CRC/MD5. Personally I don't care, but please keep in mind that some people use relatively low-end PCs or a NAS just for storing ROMs.

How do we handle transferred files when they fail the checksum? They don't match the source, so they didn't copy properly and are corrupted in some way. Do we just delete them, or inform/warn the user and suggest they delete them?

No, please don't delete them. Bandwidth and network access are not always available in some places. Probably throw a warning in the logs and show obvious hints in the main UI.
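The suggested default (cheap algorithms on, others opt-in) maps naturally onto a small settings object. This is a hypothetical Python sketch; the app's real settings system is C# and its names may differ.

```python
from dataclasses import dataclass, field


@dataclass
class VerifySettings:
    """Hypothetical verification settings: CRC-32 + MD5 by default, others opt-in."""
    verify_transfers: bool = True
    algorithms: list = field(default_factory=lambda: ["crc32", "md5"])  # fresh list per instance

    SUPPORTED = ("crc32", "md5", "sha1", "sha256")  # plain class attribute, not a dataclass field

    def enable(self, name: str) -> None:
        if name not in self.SUPPORTED:
            raise ValueError(f"unsupported algorithm: {name}")
        if name not in self.algorithms:
            self.algorithms.append(name)
```

Using `default_factory` keeps each settings instance from sharing one mutable list.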
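A sketch of the "warn, don't delete" policy: on a mismatch the file is kept, a warning goes to the log, and a status value is returned that the UI could use to flag the game. `TransferStatus`, `handle_result` and the logger name are hypothetical.

```python
import logging
from enum import Enum
from pathlib import Path

log = logging.getLogger("transfer")  # hypothetical logger name


class TransferStatus(Enum):
    OK = "ok"
    CHECKSUM_MISMATCH = "checksum mismatch"


def handle_result(dst: Path, src_hash: str, dst_hash: str) -> TransferStatus:
    """Never delete on mismatch: re-copying or re-downloading may be costly or impossible."""
    if src_hash.lower() == dst_hash.lower():
        return TransferStatus.OK
    log.warning(
        "Checksum mismatch for %s (source %s, destination %s); file kept, consider re-copying",
        dst, src_hash, dst_hash,
    )
    return TransferStatus.CHECKSUM_MISMATCH  # the UI can show an obvious hint from this
```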