bel28kent / Mysterium

An encoding of Alexander Scriabin's solo piano music in kern

Add `!!!title: @{OTL}, op. @{OPS}, no. @{ONM}` for English titles in VHV #18

Closed: craigsapp closed this issue 12 months ago

craigsapp commented 1 year ago

It would be useful to adjust the reference records for these files:

scriabin-op19_no01.krn

!!!COM: Scriabin, Alexander
!!!OPR: Sonata No. 2
!!!OTL: Andante
!!!OPS: 19
!!!ONM: 1
!!!ODT: 1897
!!!AGN: Sonata movement
!!!OMD: Andante.

scriabin-op19_no02.krn

!!!COM: Scriabin, Alexander
!!!OPR: Sonata No. 2
!!!OTL: Presto
!!!OPS: 19
!!!ONM: 2
!!!ODT: 1897
!!!AGN: Sonata movement
!!!OMD: Presto.

scriabin-op02_no01.krn

!!!COM: Scriabin, Alexander
!!!OPR: Trois Morceaux pour piano
!!!OTL: Etude in C# minor
!!!OPS: 2
!!!ONM: 1
!!!ODT: 1886
!!!AGN: Etude
!!!OMD: Andante.

Note: it might be nice to include the key in the title of generically titled pieces (etude, prelude, mazurka), such as Etude in C# minor in this case.

scriabin-op30_no01.krn

!!!COM: Scriabin, Alexander
!!!OPR: Sonata No. 4
!!!OTL: Andante
!!!OPS: 30
!!!ONM: 1
!!!ODT: 1903
!!!AGN: Sonata movement
!!!OMD: Andante.

scriabin-op30_no02.krn

!!!COM: Scriabin, Alexander
!!!OPR: Sonata No. 4
!!!OTL: Prestissimo volando
!!!OPS: 30
!!!ONM: 2
!!!ODT: 1903
!!!AGN: Sonata movement
!!!OMD: Prestissimo volando.

The files scriabin-op5_no01.krn and scriabin-op5_no02.krn should be renamed to add a 0 in op5.

bel28kent commented 1 year ago

I fixed one inconsistency I noticed in my OTLs, but have left the others for now. This is an issue that I will make a decision on before I revise the corpus. I agree with you about including keys in the titles, though I think keeping information discrete and on separate lines is also useful as it gives a user the choice of what information they want. It also seems to me that having OTL above OPR can be more useful as it is visually easier to immediately see the title of the current file without having to scan multiple lines. Let me know your thoughts.

craigsapp commented 1 year ago

> It also seems to me that having OTL above OPR can be more useful as it is visually easier to immediately see the title of the current file without having to scan multiple lines.

Reference records do not have a prescribed order, so there is no problem in doing that. Traditionally the larger group names come before the smaller information, but there is no special reason for this other than it matches the order it would be seen in print. For computer processing it will make no difference at all.

Here are the Humdrum files encoded at OSU by David Huron and his students:

https://kern.humdrum.org/cgi-bin/browse?l=/osu

which I use as prototypes for organizing reference records in a file. An example would be the Well-tempered Clavier, such as:

https://kern.humdrum.org/cgi-bin/ksdata?l=osu/classical/bach/wtc-1&file=wtc1f01.krn&f=kern

!!!COM: Bach, Johann Sebastian
!!!OTA: Das wohltemperirte Clavier
!!!OPR: Das wohltemperierte Klavier
!!!OTL: Fuga 1, Vol. 1
!!!XEN: The Well-Tempered Clavier, Volume 1, Fugue 1.
!!!ONB: C major, 4-part
!!!SCT: BWV 846b
!!!YEC: Copyright 1994, David Huron
!!!YEM: Rights to all derivative electronic formats reserved.

craigsapp commented 1 year ago

It is important to rename files such as scriabin-op5_no02.krn to scriabin-op05_no02.krn so that the files are in alphabetic listing order by opus when all of the files are copied into a single directory. In other words scriabin-opX_noYY.krn should be scriabin-op0X_noYY.krn.
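
The renaming rule can be sketched in Python. This is only an illustration of the zero-padding idea, not a script from the repository; the function name `pad_opus` and the assumption that every file follows the `composer-opX_noYY.krn` pattern are mine.

```python
import re

def pad_opus(filename: str, width: int = 2) -> str:
    """Zero-pad the opus number in names like scriabin-op5_no02.krn."""
    def _pad(match: re.Match) -> str:
        # Re-emit "op" followed by the opus number padded to `width` digits.
        return f"op{int(match.group(1)):0{width}d}"
    # Only the digits between "-op" and "_" are touched; the rest is unchanged.
    return re.sub(r"(?<=-)op(\d+)(?=_)", _pad, filename, count=1)

names = ["scriabin-op5_no01.krn", "scriabin-op11_no01.krn"]
print(sorted(names))                      # op11 sorts before op5
print(sorted(pad_opus(n) for n in names))  # op05 sorts before op11
```

With unpadded names a plain alphabetic sort puts op11 ahead of op5, which is the ordering problem described above; after padding, the alphabetic order matches the opus order.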

craigsapp commented 1 year ago

Regarding titles such as "Etude", "Prelude", "Mazurka" which are generic genre titles, it is particularly useful to include the key after the generic title.

Here is the IMSLP list for the Op. 8 etudes, for example:

[Image: screenshot of the IMSLP list of the Op. 8 etudes]

Otherwise it is difficult to locate the desired Etude purely by its number in the opus.

Computer reading of the title to identify genre should not be done, since the title can be spelled in several ways, such as "Study" in English, or "Étude" in French (adding the accent). Instead, the genre is encoded for computational use in the !!!AGN: record.

Here is the wikipedia entry for one of the etudes:

https://en.wikipedia.org/wiki/%C3%89tude_in_D-sharp_minor,_Op._8,_No._12_(Scriabin)

Notice that it is qualified with the key of D-sharp minor, because "Etude" by itself is ambiguous: all of the works in Op. 8 have the same name.

The titles are for use by humans. For computer processing, the key designations are found in tandem interpretations within the data, such as *d#: in this case, so there is no standard reference record for key designations.
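
As a rough sketch of how a program could turn such a key designation into a readable key name, the following Python function decodes a single token. It relies on the Humdrum convention that an upper-case letter means major, a lower-case letter means minor, `#` means sharp and `-` means flat. The function name `key_name` is hypothetical, and tokens with modal qualifiers are deliberately ignored here.

```python
import re
from typing import Optional

# A plain key designation: "*", a pitch letter, an optional accidental, ":".
KEY_TOKEN = re.compile(r"^\*([A-Ga-g])([#-]?):$")

def key_name(token: str) -> Optional[str]:
    """Return e.g. 'D-sharp minor' for '*d#:', or None if not a key token."""
    match = KEY_TOKEN.match(token)
    if match is None:
        return None
    letter, accidental = match.groups()
    mode = "major" if letter.isupper() else "minor"
    suffix = {"#": "-sharp", "-": "-flat", "": ""}[accidental]
    return f"{letter.upper()}{suffix} {mode}"

print(key_name("*d#:"))    # D-sharp minor
print(key_name("*C#:"))    # C-sharp major
print(key_name("**kern"))  # None
```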


Related to this, there is an informal reference record called !!!title: which is used to display the title of works in VHV. Without it, VHV will display the OTL information only (plus the COM, although that is hardwired to the title display in VHV). You can construct a title other than or in addition to OTL with the !!!title: record.

Here is a current view of the titles in VHV:

[Image: screenshot of the current title display in VHV]

To enhance the title, you can add this reference record:

!!!title: @{OTL}, op. @{OPS}, no. @{ONM}

[Image: screenshot of the enhanced title in VHV]

The structure @{XXX} is a template which will find the !!!XXX: reference record and insert it into the title string.
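
To make the substitution concrete, here is a minimal Python sketch of the same idea. It is not VHV's implementation: the helper names are hypothetical, and replacing a missing record with an empty string is my assumption about a reasonable fallback.

```python
import re

def read_reference_records(lines):
    """Collect '!!!KEY: value' reference records into a dict (last one wins)."""
    records = {}
    for line in lines:
        match = re.match(r"^!!!([^:\s]+):\s*(.*)$", line)
        if match:
            records[match.group(1)] = match.group(2).strip()
    return records

def expand_title(records):
    """Fill each @{KEY} in the title template from the matching record."""
    template = records.get("title")
    if template is None:
        # No !!!title: record, so fall back to the plain OTL text.
        return records.get("OTL", "")
    return re.sub(r"@\{([^}]+)\}",
                  lambda m: records.get(m.group(1), ""),
                  template)

header = [
    "!!!COM: Scriabin, Alexander",
    "!!!OTL: Etude in C# minor",
    "!!!OPS: 2",
    "!!!ONM: 1",
    "!!!title: @{OTL}, op. @{OPS}, no. @{ONM}",
]
print(expand_title(read_reference_records(header)))
# Etude in C# minor, op. 2, no. 1
```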

I would not bother with splitting out the key information for the title, but if you want to do that:

!!!COM: Scriabin, Alexander
!!!OTL: Etude
!!!key: D-sharp minor
!!!OPR: 12 Etudes
!!!OPS: 8
!!!ONM: 12
!!!ODT: 1894
!!!AGN: Etude
!!!title: @{OTL} no. @{ONM} in @{key}, op. @{OPS}

Which produces the most informative title in VHV:

[Image: screenshot of the resulting title in VHV]

I would put the !!!title: reference record at the bottom of the file since it is not an otherwise important reference record describing the work.

craigsapp commented 1 year ago

Related: Op. 59, no. 1 does not have a title. It could be:

!!!OTL: Poème
!!!AGN: Poem

AGN records should be standardized and not use accented characters, since they are useful for finding genres across repertories. To a certain extent it is up to the encoder, but Poem is a more useful genre name.

Similarly for Op. 59, no. 2:

!!!OTL: Prelude
!!!AGN: Prelude

The OTL is for humans to read, and the AGN is for computer processing (both should be present).

craigsapp commented 1 year ago

Also I note that Op. 8 Etudes, nos. 1–6 do not have key designations (such as *C#: for no. 1). They should be (from the IMSLP list):

  1. Etude in C-sharp major
  2. Etude in F-sharp minor
  3. Etude in B minor
  4. Etude in B major
  5. Etude in E major
  6. Etude in A major

Also preludes 4 and 14 from Op. 11 are missing key designations.

https://imslp.org/wiki/24_Preludes,_Op.11_(Scriabin,_Aleksandr)

  1. Lento in E minor
  2. Lento in D-flat major

bel28kent commented 1 year ago

File names: This issue is fixed per push on May 18.

bel28kent commented 1 year ago

Key designations: Key designations will be added to genre titles during second round of revision.

bel28kent commented 1 year ago

!!!title record: Will not add.

bel28kent commented 1 year ago

Languages in OTL and AGN: OTL titles will be updated to use Scriabin's preferred French across all files. AGN genre designations will be updated to use only English across all files. Updates will be made during second round of revision.

craigsapp commented 1 year ago

It is preferable to have AGN in English (unless the common term used in English is actually French, such as Etude instead of Study, but preferably not Étude). The reason is that this field is for computer processing of the file, and in particular with a combined set of files from multiple sources. For example, to count the notes in all waltzes from multiple repertories containing different primary languages:

census -k $(grep -lri '^!!!AGN:.*waltz' .)

If you have AGN in various languages, you would miss waltzes that are not described in English. The best you could do would be to check for the various terms for waltz to identify them, and/or guess at the names used in the AGN field:

census -k $(grep -Elri '^!!!AGN:.*(waltz|walc|valse|walzer|valzer|vals|keringő)' .)
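
The same file selection can be sketched in Python for cases where a script is more convenient than grep. This is only an illustration: the function name `files_with_genre`, the term list, and the restriction to `*.krn` files are assumptions of the sketch, and it only lists matching files rather than running census.

```python
import re
from pathlib import Path

# Guessed spellings of "waltz" in several languages (same list as above).
WALTZ_TERMS = ("waltz", "walc", "valse", "walzer", "valzer", "vals", "keringő")

def files_with_genre(root, terms=WALTZ_TERMS):
    """Return .krn files under root whose !!!AGN: record mentions any term."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"^!!!AGN:.*(?:" + alternatives + ")", re.IGNORECASE)
    matches = []
    for path in sorted(Path(root).rglob("*.krn")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            if any(pattern.match(line) for line in handle):
                matches.append(path)
    return matches
```

Calling `files_with_genre(".", ("waltz",))` corresponds to the English-only search, which shows why a single agreed English term in AGN keeps the query short.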

!!!OTL: is good for using French since this is primarily for human reading. If you want to be more explicit, then !!!OTL@@FR: means that the title is in French, and that is the primary/original language. Then !!!OTL@EN: would be the English translation of the title, etc.
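
A small Python sketch of how a program might read these language-qualified titles follows. The function name `titles_by_language` and the returned structure are hypothetical; the only convention taken from the explanation above is that `@@` marks the original language and `@` marks a translation.

```python
import re

OTL_RECORD = re.compile(r"^!!!OTL(?:(@{1,2})([A-Za-z]+))?:\s*(.*)$")

def titles_by_language(lines):
    """Return ({language code: title}, original language code or None).

    An untagged !!!OTL: record is stored under the empty string.
    """
    titles = {}
    original = None
    for line in lines:
        match = OTL_RECORD.match(line)
        if not match:
            continue
        marker, language, text = match.groups()
        code = (language or "").upper()
        titles[code] = text.strip()
        if marker == "@@":
            original = code  # "@@" flags the primary/original language
    return titles, original

print(titles_by_language(["!!!OTL@@FR: Poème", "!!!OTL@EN: Poem"]))
# ({'FR': 'Poème', 'EN': 'Poem'}, 'FR')
```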

bel28kent commented 1 year ago

@craigsapp

I have updated AGN (on the phase2 branch) to only use English; "Poème" is now "Poem" and "Danse" is now "Dance."

For !!!OTL@@FR, what in your opinion is the best way to encode accented characters (è, é, á, and so on)? I don't think Vim even lets you type them into a file the way that the Mac UI lets you type them in a browser or document. I've also worked with files that have accented characters that don't convert properly when you convert the file (e.g., a txt file of references from PsycInfo).

craigsapp commented 1 year ago

Characters such as è, é, á, and so on should be encoded in UTF-8 (Unicode). This is the default character encoding for Macs, so you don't need to do anything special. There are both single-character forms of the letter with its accent, as well as compositing cases which are composed of two characters, such as e and `. Preferably the former should be used. It is hard to tell the difference unless you look at the bytes for the letters, and in any case you are unlikely to create the latter while typing the letters on a Mac (but copy-and-paste from various sources on the web may insert the compositing accents).
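
The difference between the two forms is easy to inspect in Python with the standard `unicodedata` module. The sketch below only demonstrates the byte-level difference and how to convert between forms; the helper name `normalize_title` is hypothetical, and which form a corpus should standardize on is left to the encoder.

```python
import unicodedata

precomposed = "\u00e8"  # è as a single code point
combining = "e\u0300"   # e followed by COMBINING GRAVE ACCENT

# The two strings look identical on screen but are different byte sequences.
print(precomposed == combining)     # False
print(precomposed.encode("utf-8"))  # b'\xc3\xa8'
print(combining.encode("utf-8"))    # b'e\xcc\x80'

def normalize_title(text, form="NFC"):
    """Convert text to one consistent form: NFC (single-character) or NFD."""
    return unicodedata.normalize(form, text)

print(normalize_title(combining) == precomposed)           # True
print(normalize_title(precomposed, "NFD") == combining)    # True
```

Running a check like this over text pasted from the web is one way to catch mixed forms that would otherwise make identical-looking titles compare as different.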

I don't have problems typing them in vim (on Mac at least, and I don't remember having problems on Linux either).

To enter them in vim on a Mac, hold down the option key, press the accent character, then release and type the letter.

For example é is created by typing option+e and then releasing and typing e.

For è you type option+backquote and then e.

For ï you type option+u and then i.

For capital letters it is exactly the same: É is option+e and then E.

You need to be using the US keyboard setting (probably you are already doing that). An alternate way is to use the French keyboard layout (but that layout is quite different from the US layout). To allow the French layout, you go to system settings and then the keyboard layouts, and then add a new language on the left in this window (I have US, German, and Polish keyboard layouts set up):

[Image: screenshot of the keyboard layout settings window]

When you have multiple layouts, you will see a box in the top rightish side of the screen:

[Image: screenshot of the keyboard layout box at the top right of the screen]

You can set the keyboard layout in the drop-down list that appears when you click on the language code (US in this case).

> I've also worked with files that have accented characters that don't convert properly when you convert the file (e.g., a txt file of references from PsycInfo).

There can be many causes, so it is difficult to say what the problem is (unless there is a website with example text for me to copy and test what is happening). In general the problem would be that the characters are not encoded in UTF-8: the webpage has a line in the HTML saying what the character encoding is, but when copy-and-pasting, the literal bytes are copied instead of being converted to UTF-8.

An alternative that I used to use is HTML entity encoding, which I would still find acceptable:

     &egrave; = è
     &eacute; = é
     &iuml;   = ï
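
For converting between entity-encoded text and UTF-8, Python's standard `html` module can do both directions. This is a sketch only; the helper name `to_entities` is hypothetical, and it leaves characters without an HTML 4 named entity unchanged.

```python
import html
from html.entities import codepoint2name

def to_entities(text):
    """Replace non-ASCII characters with named HTML entities where one exists."""
    pieces = []
    for char in text:
        code = ord(char)
        if code > 127 and code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")
        else:
            pieces.append(char)
    return "".join(pieces)

print(html.unescape("Po&egrave;me"))  # Poème
print(to_entities("Poème"))           # Po&egrave;me
```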

bel28kent commented 12 months ago

Most issues fixed in some previous commits. New issue for !!!title.