dipzza / ultrastar-song2txt

Tools that automate parts of making a song in the ultrastar txt format
GNU Affero General Public License v3.0
1 stars 0 forks source link

Design decisions for metadata #46

Closed dipzza closed 1 year ago

dipzza commented 2 years ago

Related to #37 we need a data structure to save song metadata, useful for pitch estimation tasks related to #7 and for automatically obtained metadata for #8.

What do we need the structure for?

Which Python structure should it use then?

How should we define the structure?

Should we implement some related behavior?

dipzza commented 2 years ago

Which Python structure should it use then?

A dictionary is simple for a TAG key, value correspondence, but it doesn't allow for input validation and it doesn't provide the most intuitive access as you have to know the TAGs by memory or look for them on documentation.

A classic class with one attribute for each tag and it's value could satisfy all requirements giving it some thought.

Whether the objects should be inmutable or mutable, immutability seems a slightly better option. We don't need to modify the song metadata frequently, so it's not going to affect performance. Even if we are not gonna reuse copies of inmutable objects we have the advantage of a less complicated object, and a less error-prone code as we know we can use and pass the object without being afraid of it being modified.

How should we define the structure?

To improve readability and greatly reduce the code required (and therefore, the number of places where you can make a mistake) defining it with code generation is desired.

Specifically, for this type of structure in Python, we can use dataclasses. With them, we can add python special methods for construction, representation of the object values, equality comparison, hashing and ensure the immutability of the object automatically, from just a list of type-hinted class attributes with an optional default value.

Should we implement some related behavior?

To validate input data, dataclasses provide an special method ¨__post_init__" which executes right after the automatically generated constructor. Inside this method we can easily check all attributes are correctly initialized and raise exceptions if not.

For generating the equivalent string representation for the UltraStar txt format implementing the special method "str" is appropriate, as we already have the automatically generated 'repr" method for a more direct string representation.

Finally, how do we solve the TAG correspondence?

We could just check with one condition for each TAG and then utilize the proper attribute or vice versa. This however adds a massive amount of very similar code breaking the DRY principle.

Another option is to define a dictionary with attributes names and TAGs, this reduces the amount of code but requires the dictionary to be manually updated to contain the correct attributes names and TAGs, which is not ideal.

The approach which i think is best is naming the attributes the same as the TAGs but in lowercase. This way we don't need to maintain anything extra to know the correspondence. The only disadvantage is that we can't choose more descriptive names for attributes, but tag names are descriptive enough.