craigsapp / humlib

Humdrum data parsing library in C++
http://humlib.humdrum.org
BSD 2-Clause "Simplified" License

unknown cases for tandeminfo #98

Closed: bel28kent closed this issue 2 months ago

bel28kent commented 2 months ago

Here is a Perl script to find unknown interpretations in the humdrum-data repository. I will not post the output here because it is quite verbose.

#!/usr/bin/perl

my @files = `find . -name '*krn' -print`;
chomp (@files);

foreach my $file (@files) {
  if ($file =~ m/(liber|source)/) {
    next;
  }
  my @interps = `tandeminfo -m $file | sort | uniq | awk ' /unknown/ '`;
  if ($#interps == 0) {
    next;
  } else {
    foreach my $interp (@interps) {
      printf "%-50s %s", $file, $interp;
    }
  }
}

I skip chant files (paths containing liber) and source files. The source files cause duplication, and the chant files seem to share the same unknown interpretations, so including them produces redundant output. Here are the unknown chant interpretations:

**kern  *MX unknown
**silbe *LLatin unknown
craigsapp commented 2 months ago

*MX means unmeasured music, which I will add to tandeminfo. I will probably also add the hypothetical *MX/4, which would mean that the music is unmeasured but the beat is a quarter note.

*LLatin means that the language of the text is Latin, but I don't know if this is documented in the User Guide/Reference books. In any case, I will add these, and perhaps add a deprecation comment recommending a switch to the form I use, which is easier to manage and search for:

*lang:LA or *lang:LAT

These are ISO language codes:

ISO 639-1 uses two-letter codes; LA is the code for Latin.
ISO 639-2 and ISO 639-3 use three-letter codes; LAT is the ISO 639-2/T and ISO 639-3 code for Latin.
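As an illustration of the suggested switch, a script could rewrite old-style language interpretations into the *lang: form. The following is a hypothetical sketch, not part of tandeminfo or humlib: the helper name modernize_line is made up, it only changes tab-separated fields on interpretation lines (lines starting with *), and its mapping table contains only Latin (using LA; LAT would work equally well):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical mapping from older *L<Language> interpretations to ISO 639
# codes in the *lang: form. Only Latin comes from this discussion.
my %iso_for = ( 'Latin' => 'LA' );

# Rewrite one Humdrum line: on interpretation lines (starting with "*"),
# replace any tab-separated field of the form *L<Name> with *lang:<code>.
# Data lines, comments, and unmapped languages are returned unchanged.
sub modernize_line {
    my ($line) = @_;
    return $line unless $line =~ /^\*/;
    my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    for my $field (@fields) {              # $field aliases each array element
        if ($field =~ /^\*L([A-Z][a-z]+)\z/ && exists $iso_for{$1}) {
            $field = "*lang:$iso_for{$1}";
        }
    }
    return join "\t", @fields;
}

# Usage: perl modernize.pl file.krn > new.krn   (hypothetical script name)
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        print modernize_line($line), "\n";
    }
    close $fh;
}
```

Restricting the change to whole fields on interpretation lines avoids touching lyrics or comments that happen to contain the text *LLatin.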

craigsapp commented 2 months ago

To avoid descending into directories whose names start with a dot (such as .git), the find command can be updated:

find . -path './.*' -prune -o -name '*krn' -print
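If you would rather not shell out to find from Perl, a similar traversal can be sketched with the core File::Find module by setting $File::Find::prune on directories whose names start with a dot. This is a minimal sketch; the helper name find_krn_files is hypothetical, and unlike the find command above it returns only regular files:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Hypothetical helper: return all regular files under $root whose names end
# in "krn", skipping any directory whose name starts with "." (like -prune).
sub find_krn_files {
    my ($root) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # $_ is then the full path, same as $File::Find::name
            wanted   => sub {
                my $path = $_;
                my $base = basename($path);
                # Prune dot directories, but never the starting directory itself.
                if (-d $path && $path ne $root && $base =~ /^\./) {
                    $File::Find::prune = 1;
                    return;
                }
                push @found, $path if -f $path && $base =~ /krn\z/;
            },
        },
        $root
    );
    return sort @found;
}

print "$_\n" for find_krn_files('.');
```

The list returned here could replace the backtick find call in the scripts in this thread.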
craigsapp commented 2 months ago

These two interpretations should now have an identifiable meaning.

I adjusted the script:

#!/usr/bin/env perl

chomp(my @files = `find . -path './.*' -prune -o -name '*krn' -print`);

foreach my $file (@files) {
   my @interps = `tandeminfo -mu $file | sort | uniq`;
   foreach my $interp (@interps) {
      printf "%-50s %s", $file, $interp;
   }
}

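Note that $#interps is the index of the last element (the size of the array minus one, so -1 for an empty array), which means $#interps == 0 is true when there is exactly one unknown interpretation. Using @interps, which gives the size of the array in scalar context, is better. But no precheck is needed at all, since a foreach loop over an empty array simply does nothing.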
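A quick way to see the difference is to print both values for an empty and a one-element array. In this small sketch the array names and contents are illustrative only:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @empty = ();
my @one   = ('*MX');

# $#array is the index of the last element; scalar(@array) is the element count.
printf "empty: last index %d, size %d\n", $#empty, scalar(@empty);  # empty: last index -1, size 0
printf "one: last index %d, size %d\n",   $#one,   scalar(@one);    # one: last index 0, size 1

# A foreach over an empty array runs zero times, so no precheck is needed.
my $iterations = 0;
$iterations++ foreach @empty;
print "iterations over empty array: $iterations\n";                 # iterations over empty array: 0
```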

In particular, I added a -u option that outputs only interpretations with unknown meanings.

I fixed some tandem-interpretation problems in repositories included in the humdrum-data repository, so updating it with make clean && make will download the updated scores.

I also added a -l option that gives the row/column of each tandem interpretation (useful in combination with -u to locate problematic interpretations in a file).

craigsapp commented 2 months ago

I reversed the meaning/description option: by default the output now includes the meaning/description, and -D (or -M) excludes it.

So the updated Perl script for humdrum-data is:

#!/usr/bin/env perl

chomp(my @files = `find . -path './.*' -prune -o -name '*krn' -print`);

foreach my $file (@files) {
   my @interps = `tandeminfo -u $file | sort | uniq`;
   foreach my $interp (@interps) {
      printf "%-50s %s", $file, $interp;
   }
}

Other optional data must be requested explicitly:

The VHV interface to tandeminfo is working well now. Example:

https://verovio.humdrum.org/?file=poly/R409_Web-w3p7m46-49.krn&filter=tandeminfo%20-cN

[Screenshot, 2024-08-19 12:45:58 PM]

The -c option is analogous to uniq on the command line (duplicate interpretations are merged and counted), and -N is analogous to sort: -N sorts in reverse numeric order of the tandem counts (high to low), and -n sorts from low to high counts.
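The counting and ordering that -c combined with -N or -n performs can be sketched in Perl as follows. This only illustrates the idea (count duplicates, then sort by count); it is not how tandeminfo is implemented, and the helper count_and_sort, its tie-breaking rule, and the sample interpretations are made up for the example:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count identical lines (like -c / uniq) and order them
# by count, descending when $descending is true (like -N) or ascending
# otherwise (like -n). Ties are broken alphabetically.
sub count_and_sort {
    my ($descending, @lines) = @_;
    my %count;
    $count{$_}++ for @lines;
    my @keys = sort {
        my $cmp = $descending ? $count{$b} <=> $count{$a}
                              : $count{$a} <=> $count{$b};
        $cmp || $a cmp $b;
    } keys %count;
    return map { [ $_, $count{$_} ] } @keys;   # list of [interpretation, count]
}

my @sample = ('*M4/4', '*clefG2', '*M4/4', '*M3/4', '*M4/4', '*clefG2');
for my $pair (count_and_sort(1, @sample)) {
    printf "%3d %s\n", $pair->[1], $pair->[0];
}
# prints "  3 *M4/4", then "  2 *clefG2", then "  1 *M3/4"
```

Passing 0 instead of 1 as the first argument gives the low-to-high order of -n.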

Note that clicking on a row in the tandem interpretation list moves the cursor in the text editor to the location of that row's tandem interpretation. When -c is given, clicking on a row takes you to one of the matching interpretations (not all of them).