christofmuc / KnobKraft-orm

The KnobKraft Orm - The free modern cross-platform MIDI Sysex Librarian
GNU Affero General Public License v3.0
197 stars 25 forks source link

Filename anomalies when skipping clones #169

Closed Mostelin closed 2 years ago

Mostelin commented 2 years ago

I loaded a large set of programs today, and the ORM found that 87 of them were duplicates, and skipped them. However, when this occurs what is left is a mix of data of the two programs. For example, duplication was detected in these two programs:

PIANO2, position 21 in file CARTALD.syx PNOVIL, position 16 in file esx-2b.syx

flagged by the message: 02:03:21: Skipping patch PIANO2 because it is a duplicate of PNOVIL

What remains after the duplication is detected is:

PNOVIL, position 16 in file CARTALD.syx

In other words, the wrong filename is applied. I've noticed a few other anomalies where the filename is incorrect, and so far all of them appear to have been involved in duplicate skipping.

christofmuc commented 2 years ago

@Mostelin Good find! If you could provide a file that produces that problem, I could have a quick look!

Mostelin commented 2 years ago

@christofmuc Sure, here is a small sample containing the two programs I cited above, and a couple of other programs. Ensoniq-programs-skipping.zip

I tried importing this sample into my current database, knowing all programs were already present. Here is the log of this:

08:16:10: Ignoring rules for category Voice, because that name is not found in the database 08:16:10: Overriding built-in import category rules with file C:\Users\Mark\AppData\Roaming\KnobKraft\mapping_categories.jsonc 08:16:10: Overriding built-in adaptation Ensoniq ESQ-1 (found in user directory C:\Users\Mark\KnobKraft-Adaptations) 08:16:11: Lost communication with Ensoniq ESQ-1 on channel 1 of device Studio 24c MIDI Out - please rerun auto-detect synths! 08:16:11: Turning off USB input Electra Controller 08:16:30: Ignoring category Acoustic Bass of patch CONTRA because it is not part of our standard categories! 08:16:30: Renaming PNOVIL with better name PIANO2 08:16:30: Updated 1 patches in the database with new names 08:16:30: All patches already known to database

A couple of things puzzle me about this:

christofmuc commented 2 years ago

Confirm bug #1: The duplicate is skipped, but the import ID of the duplicate is used in import of the PIF file. This leads to the first instance to be associated with the filename (import) of the duplicate, while the import description in the metadata still lists the original import.

christofmuc commented 2 years ago

Bug 1 fixed, indeed it build up a table of md5 to import id (filename), and did not check for duplicates there. Thus, it would use the first instance of a patch but with the last import id/filename that occured with. Fixed in next version. This is not specific to the PatchIntrchangeFile (PIF) json format, but would occur with duplicates in sysex files as well. Can't be easily fixed in an existing database, although the full correct import information is retained and displayed in the Meta data/Import field.

christofmuc commented 2 years ago

@Mostelin Let me try to sort the rest out here:

A couple of things puzzle me about this:

* Why is the category Voice mentioned in the first line? I do not have this category in either my json file or the list of 'auto-categories' that are accessible from the menu. Is it that the Voice category is present in mapping_categories.jsonc? If so, I think I will add Voice just to stop getting this message again. EDIT: I just added Voice, and that stops the message appearing.

Yes indeed, this is just a warning that the json file contains a category that is not available and thus will not be imported. If you create it before import (#168), it will be applied.

* I notice that the program PNOVIL alias PIANO2 has now changed name again. There seems to be something strange about how the name is chosen. If two patches are identical in the first import, the first name is retained, but if other identical programs are imported later, the name of the last one will be chosen. EDIT: I just tried repeating the import. On the third import the name changes again, so it is back to PNOVIL now!

That is kind of on purpose. E.g. for the Oberheim Matrix 1000, there are two different versions of the factory bank sysex files out there - one with the correct patch names, and one version as dumped from the device where the patch names are just numbers. So if you accidentally import the latter, you can fix the names by reimporting the correct files without having to start a new database or losing all your categories and favs. So more a feature than a bug, though I never tested it with renaming duplicates - but if you can alter the name with every import, this is what I intended. I think I did.

It is related to #19, importing names from files. In a way, you can already do this via this mechanism. The warning printed though gives you the change to notice what is going on.

There is even a fun extension of that logic - you can implement the isDefaultName function to indicate that one name is default and thus inferior and will always be overwritten. That way, you can easily get rid of default names like "patch #012" and such, if that is the default name generated by the synth or some other tool.

christofmuc commented 2 years ago

@Mostelin P.S. I noted in your file the program numbers are large, e.g. 5016 instead of 16. Is that on purpose?

Mostelin commented 2 years ago

@christofmuc

You asked:

@Mostelin P.S. I noted in your file the program numbers are large, e.g. 5016 instead of 16. Is that on purpose?

It is coincidental that program 5016 also happens to be the 16th program from its original bank.

I liked the fact that your software accepted two numbers, Place and SourceInfo:program, because my database records the position of a program in a bank, which I mapped to Place, and also uniquely indexes all programs with a separate key. In this case the program just happened to be the 5016th record created in my database. It is really nice to know that I can always link back from your database to mine, regardless of all the duplication of names and bank locations that exist - or any other changes that might be introduced down the track.

christofmuc commented 2 years ago

I liked the fact that your software accepted two numbers, Place and SourceInfo:program, because my database records the position of a program in a bank, which I mapped to Place, and also uniquely indexes all programs with a separate key. In this case the program just happened to be the 5016th record created in my database. It is really nice to know that I can always link back from your database to mine, regardless of all the duplication of names and bank locations that exist - or any other changes that might be introduced down the track.

Oh, we must make sure this doesn't break in the future - the original logic is that the program number in the database is the bank multiplied by bank number + patch number within bank. This way, I can store bank+program in one number.

This will get relevant with the 2.0 release which contains bank management - we need to look at it again and see how we can keep your identifier then.

Closing this ticket as the bug is fixed with release 1.16.0.