DCMLab / standards

Repository containing standards developed at the DCML. https://dcmlab.github.io/standards

How is the regex meant to be used? #46

Closed by malcolmsailor 2 years ago

malcolmsailor commented 2 years ago

Hello!

I'm wondering how the regular expression in harmony.py is meant to be used, for instance to parse the files in https://github.com/DCMLab/ABC/tree/v2/harmonies.

The obvious thing to do with it seems to be to match lines of the .tsv files using re.match. But clearly that won't match anything because of the whitespace in the regex, the quote characters in the tsv files, and the fact that each row has many leading columns not included in the regex:

"chord" "altchord"  "measure"   "beat"  "totbeat"   "timesig"   "op"    "no"    "mov"   "length"

I could write some code to alter the existing regex to match lines of the table. But I don't want to reinvent the wheel unless necessary. So I am wondering if anyone can point me to an example of how the regex is meant to be used in conjunction with the tsv files. Thanks!

johentsch commented 2 years ago

Hi @malcolmsailor, the columns globalkey to phraseend are already the result of applying the regex to the column label. So the purpose of the regex is to split the labels into their components. At the same time, if a label doesn't match, it is considered syntactically invalid. The expanded harmony tables are created using the command ms3 extract -X, which is available after pip-installing ms3. The expansion is performed by the function expand_labels(), and the actual regex matching happens in split_labels().
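
To illustrate the idea of matching only the label column rather than whole lines, here is a minimal sketch. It is not the ms3 implementation: the toy pattern, its group names numeral and figbass, the helper split_label_column, and the sample data are all assumptions made for this example, and the real regex in harmony.py covers far more syntax. The column name "chord" is taken from the header row quoted in the question.

```python
import csv
import io
import re

# Toy pattern, NOT the DCML regex: an optional "key." prefix, a Roman numeral,
# and optional figured-bass digits. Group names are chosen for illustration.
TOY_REGEX = re.compile(
    r"^(?:(?P<globalkey>[A-Ga-g][#b]*)\.)?"
    r"(?P<numeral>[#b]*(?:[IV]+|[iv]+))"
    r"(?P<figbass>7|65|43|2|64|6)?$"
)

# Made-up sample in the quoted, tab-separated style of the harmony files.
SAMPLE_TSV = (
    '"chord"\t"measure"\t"beat"\n'
    '"F.I"\t"1"\t"1"\n'
    '"V65"\t"1"\t"3"\n'
    '"not a label"\t"2"\t"1"\n'
)


def split_label_column(tsv_text, column="chord"):
    """Apply the toy regex to one column and return (label, features) pairs.

    features is a dict of named groups, or None if the label does not match.
    """
    rows = []
    # csv strips the quote characters and splits on tabs, so the regex
    # only ever sees the bare label, never the whole line.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        match = TOY_REGEX.match(row[column])
        rows.append((row[column], match.groupdict() if match else None))
    return rows


results = split_label_column(SAMPLE_TSV)
for label, features in results:
    print(label, "->", features if features is not None else "does not match")
```

Each named group becomes one feature of the label, and a label that fails to match is flagged instead of being split.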

Please note the comment above the regex: it needs to be compiled using re.compile(regex, re.VERBOSE), which makes Python ignore the whitespace in the pattern. The whitespace is only there because the regex would otherwise be unreadable. Hope this helps.
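
A small sketch of what the flag changes, using a toy pattern (an assumption for illustration, not the DCML regex). With re.VERBOSE, whitespace outside character classes and anything after an unescaped # are ignored; without the flag, the same string is compiled with its line breaks, spaces, and comment text as literal characters, so it fails to match.

```python
import re

# Toy pattern in the laid-out style; NOT the DCML regex.
TOY_PATTERN = r"""
    ^
    (?P<numeral> [IV]+ | [iv]+ )          # Roman numeral, upper or lower case
    (?P<figbass> 65 | 43 | 7 | 6 | 2 )?   # optional inversion figures
    $
"""

# With re.VERBOSE the layout and comments are stripped before matching.
verbose = re.compile(TOY_PATTERN, re.VERBOSE)
# Without it, the newlines, spaces and comment text are literal characters.
plain = re.compile(TOY_PATTERN)

verbose_match = verbose.match("V65")
plain_match = plain.match("V65")

print(verbose_match.groupdict())
print(plain_match)
```

The first print shows the named groups extracted from "V65"; the second shows None, which is the symptom described in the question.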

malcolmsailor commented 2 years ago

Thanks! That all makes sense. I was not familiar with the re.VERBOSE flag.