andrebrait / 1g1r-romset-generator

A small utility that uses No-Intro DATs to generate 1G1R ROM sets
GNU General Public License v3.0

Match by hash in addition to filename #8

Closed (ryanfb closed 4 years ago)

ryanfb commented 4 years ago

The first time I tried this utility, I was curious as to why it didn't find any matches, since DAT files include hash information. It seems it only uses filenames to find matches in the DAT file. I looked briefly at making a pull request for this, but I'm not sure whether you'd consider it out of scope. Python has SHA1 support in hashlib, which is part of the standard library, so it shouldn't introduce an external dependency.
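For reference, hashing a file with the standard library could look roughly like the sketch below. The helper name `sha1_of_file` and the chunk size are illustrative choices, not something taken from this utility.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of the file at `path`.

    The file is read in chunks so large ROM images do not have to fit
    in memory. `sha1_of_file` is a hypothetical helper name.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

DAT files may store hashes in upper or lower case, so lowercasing both sides before comparing is a safe habit.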

In the meantime, I've written a very simple companion utility that will use the SHA1 hashes in a DAT to copy files into a new directory with the correct filenames that this utility expects: https://github.com/ryanfb/copydatrom
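The general idea (not the actual copydatrom code) can be sketched as follows. It assumes the DAT is a Logiqx-style XML file whose `<rom>` elements carry `name` and `sha1` attributes; `load_sha1_index` and `copy_matching` are hypothetical names used only for illustration.

```python
import hashlib
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_sha1_index(dat_path):
    """Map lowercase SHA-1 -> ROM file name from a DAT.

    Assumption: <rom name="..." sha1="..."/> elements, as in
    Logiqx-style XML DATs.
    """
    index = {}
    for rom in ET.parse(dat_path).getroot().iter("rom"):
        sha1, name = rom.get("sha1"), rom.get("name")
        if sha1 and name:
            index[sha1.lower()] = name
    return index


def copy_matching(input_dir, output_dir, index):
    """Copy each top-level file whose hash is in `index`, using the DAT name."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue
        name = index.get(sha1_of_file(path))
        if name is not None:
            shutil.copy2(path, output_dir / name)
            copied.append(name)
    return copied
```

This sketch only looks at uncompressed files directly inside `input_dir`; archives and nested folders would need extra handling.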

andrebrait commented 4 years ago

First things first: your utility's name is awesome :sunglasses:

Yes, you're right. It only matches file names and that's all. It's not out of scope per se, but it would indeed be an expansion over what this tool aimed to achieve in the first place. Which is not bad at all (feel free to open a PR if you want :smile:).

Now, yes, I have thought about matching with the hash (and I even attempted something). The only issue is that there's a huge variety of ways people can organize and archive ROMs and I found it to be a bit hard to code something that would allow most people to use it. As far as I could see, it would have to:

  1. Be able to deal with:
    1. ZIP files (ClrMamePro style)
      1. With one ROM file inside
      2. With multiple ROM files inside (for games with multiple files, like PSX ones)
    2. Uncompressed ROMs inside a single directory
      1. Possibility of the user keeping games with multiple files in the same directory
      2. Possibility of the user keeping games with multiple files in folders
    3. One-folder-per-game (ClrMamePro style)
      1. With one uncompressed ROM file inside
      2. With multiple uncompressed ROM files inside (for games with multiple files, like PSX ones)
  2. Still be relatively fast (so no O(n^2) scanning, I think)

While this is all easy to do (except maybe ZIP files), it takes some testing to get it right, and I lack the time right now.
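A minimal sketch of how those cases could be covered in a single pass is shown below. It is an illustration under stated assumptions, not the approach this tool ended up using: `iter_rom_hashes` and `match_by_hash` are hypothetical names, and `sha1_to_name` is assumed to be a dict built from the DAT beforehand. Each file is hashed once and looked up in the dict, which keeps the scan linear in the number of files instead of O(n^2).

```python
import hashlib
import os
import zipfile


def _sha1_stream(stream, chunk_size=1 << 20):
    """Hex SHA-1 of a binary file-like object, read in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def iter_rom_hashes(input_dir):
    """Yield (path, member, sha1) for every ROM under `input_dir`.

    `member` is the name inside a ZIP archive, or None for a loose file.
    os.walk covers both flat directories and one-folder-per-game layouts.
    """
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            path = os.path.join(root, name)
            if zipfile.is_zipfile(path):
                with zipfile.ZipFile(path) as archive:
                    for info in archive.infolist():
                        if info.is_dir():
                            continue
                        # Hash the decompressed member, not the archive itself
                        with archive.open(info) as member:
                            yield path, info.filename, _sha1_stream(member)
            else:
                with open(path, "rb") as handle:
                    yield path, None, _sha1_stream(handle)


def match_by_hash(input_dir, sha1_to_name):
    """Group found files by the DAT name their hash maps to."""
    matches = {}
    for path, member, sha1 in iter_rom_hashes(input_dir):
        name = sha1_to_name.get(sha1)  # constant-time dict lookup
        if name is not None:
            matches.setdefault(name, []).append((path, member))
    return matches
```

Grouping multi-file games (several ROM entries under one game in the DAT) would sit on top of this, by mapping each hash to its game as well as its file name.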

andrebrait commented 4 years ago

Well, I kinda just did it.

It was easier than I thought, tbh. I'll commit the changes in a bit.

andrebrait commented 4 years ago

Well, would you do the honors of testing what I made? :wink: You should use the --use-hashes option.

ryanfb commented 4 years ago

I get the following error:

Traceback (most recent call last):
  File "generate.py", line 1224, in <module>
    main(sys.argv[1:])
  File "generate.py", line 832, in main
    file = file_relative_to_input(file, input_dir)
  File "generate.py", line 939, in file_relative_to_input
    return file.replace(input_dir, '', count=1).lstrip(os.path.sep)
TypeError: replace() takes no keyword arguments

Running under Python 3.8.0.
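The traceback points at `str.replace` being called with `count=1` as a keyword, which Python 3.8 rejects. Passing the count positionally avoids the error. The sketch below reuses the function name from the traceback, but the body is only a guess at one possible fix, not necessarily what was committed.

```python
import os


def file_relative_to_input(file, input_dir):
    """Strip the leading `input_dir` from `file`.

    The replacement count is passed positionally because Python 3.8's
    str.replace() does not accept it as a keyword argument.
    """
    return file.replace(input_dir, "", 1).lstrip(os.path.sep)
```

`os.path.relpath(file, input_dir)` would be another option for computing the relative path.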

andrebrait commented 4 years ago

@ryanfb it should work now

ryanfb commented 4 years ago

Thanks! Seems to work fine now.

andrebrait commented 4 years ago

I have refined the hash processing and sped it up too. I also fixed a couple of issues with the copied files' names.

I released it as 1.6.0