cuthbertLab / music21

music21 is a Toolkit for Computational Musicology
https://www.music21.org/

Instrument name lookup lists #1064

Closed MarkGotham closed 2 years ago

MarkGotham commented 3 years ago

Rather than adding ever more instruments to the instrumentLookup.py lists along with each possible small variant (e.g. singular vs plural), might a better approach be to add a multiLingualLookUps list of basic regexes that deal with most of the common instruments and variants at once?

Here's an example of how such a list might look for voices ...

('alt(o(s)?)?', 'alto'),  # ENG, FR, DE: alt, alto, altos
('bariton(e(s)?)?', 'baritone'),  # ENG, FR, DE (sing): bariton, baritone, baritones
('baritöne', 'baritone'),  # DE (pl)
('bass(e(s)?)?', 'bass'),  # ENG, FR: bass, basse, basses
('sopran(o(s)?)?', 'soprano'),  # ENG, FR, DE
('t(e|é)nors?', 'tenor'),  # ENG, FR (sing / pl); DE (sing)
('tenöre', 'tenor'),  # DE (pl)
('canto', 'voice'),  # IT
('chant(eu(r|se)?)?', 'voice'),  # FR
('gesang', 'voice'),  # DE
('(sing)?stimmen?', 'voice'),  # DE: stimme, stimmen, singstimme, singstimmen
('voc(a|e)', 'voice'),  # IT
('voi(x|ce)', 'voice'),  # ENG, FR

... paired with a simple function like ...

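import re

def multiLingualToBestName(query_string: str):
    # Lower-case once; the patterns in multiLingualLookUps are written in lower case.
    query = query_string.lower()
    # The first pattern found anywhere in the query wins, so list order matters.
    for pattern, value in multiLingualLookUps:
        if re.search(pattern, query):
            return value
    return None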

That would go in instrumentLookup.py, and need a couple of lines in instrument.py (before the others) ...

bestName = instrumentLookup.multiLingualToBestName(substring)
if not bestName:
    bestName = instrumentLookup.allToBestName[substring.lower()]
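
One point worth checking in a demo: re.search matches anywhere in the string, so with the voice patterns listed earlier a query such as "alto saxophone" would resolve to 'alto'. Below is a minimal sketch of a stricter variant, assuming whole-string matching is what is wanted (that is an assumption, not part of the proposal). The table is a shortened copy of the voices list, and `_compiled` is a name made up for illustration.

```python
import re
from typing import Optional

# Shortened copy of the proposed voices table (illustration only).
multiLingualLookUps = [
    (r'alt(o(s)?)?', 'alto'),
    (r'bariton(e(s)?)?', 'baritone'),
    (r'bass(e(s)?)?', 'bass'),
    (r'sopran(o(s)?)?', 'soprano'),
    (r't(e|é)nors?', 'tenor'),
    (r'(sing)?stimmen?', 'voice'),
    (r'voi(x|ce)', 'voice'),
]

# Hypothetical helper: compile every pattern once rather than on each call.
_compiled = [(re.compile(pattern), value) for pattern, value in multiLingualLookUps]


def multiLingualToBestName(query_string: str) -> Optional[str]:
    """Return the standard name only if a pattern matches the whole query."""
    query = query_string.strip().lower()
    for pattern, value in _compiled:
        if pattern.fullmatch(query):
            return value
    return None


print(multiLingualToBestName('Altos'))
print(multiLingualToBestName('alto saxophone'))
```

With whole-string matching, compound names fall through to the existing dictionary lookup instead of being captured by a shorter voice pattern.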

In the first instance, I'd suggest a simple form of this, replacing some (but perhaps not all) of the corresponding names from their respective language lists. I.e. we can keep both initially to check things still work, and (assuming so) gradually transition over. As an additional check, the current, long lists could form the basis of more extensive test cases.

Then assuming that all works as expected, in a future PR, we can try to be more comprehensive:

Ready to commit this for an initial sample of cases, if you approve the approach. If not, I'll just pile on some more instruments and variants!

MarkGotham commented 3 years ago

Also, do we really need the intermediary step of a 'bestName' or could we just go straight to the InstrumentClass, cutting out the intervening bestNameToInstrumentClass step? Apart from the added simplicity, it would also remove the problematically titled 'bestName' (where 'best' really means one standard English-language spelling).

MarkGotham commented 3 years ago

i.e. Currently ...

englishToBestName = {
    'accordion': 'accordion',
...

... then ...

bestNameToInstrumentClass = {
    'accordion': 'Accordion',
...

Proposed:

englishToInstrumentClass = {
    'accordion': 'Accordion',
...

mscuthbert commented 3 years ago

Searching on regular expressions can get extremely slow and has recently been shown to have potential for serious denial-of-service errors (see the JS problems with .endswith()). We also can't use regexes in "in" expressions to find things like "clarinet(te)?[ns]? in A". Saving some bytes would be good, but a lot of added complexity might come from this, which could be a maintenance burden. I'm not negative on it, but it's not a straightforward gain.
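
To make the "in" limitation concrete, here is a small sketch. The label string is made up for illustration; the pattern is the one quoted above, lower-cased.

```python
import re

# Hypothetical instrument label as it might appear in a score.
label = 'Clarinettes in A'.lower()

# A substring test can only find one literal spelling at a time.
literal_hit = 'clarinet in a' in label

# "in" treats the pattern text as a literal string, not as a regex.
pattern_as_text_hit = 'clarinet(te)?[ns]? in a' in label

# The regex has to go through the re module (compiled once, reused).
CLARINET_IN_A = re.compile(r'clarinet(te)?[ns]? in a')
regex_hit = CLARINET_IN_A.search(label) is not None

print(literal_hit, pattern_as_text_hit, regex_hit)
```

Only the last check finds the label, which is why a regex table could not simply be dropped into the existing substring checks.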

jacobtylerwalls commented 3 years ago

> Ready to commit this for an initial sample of cases, if you approve the approach. If not, I'll just pile on some more instruments and variants!

Thanks for contributing more instrument names! 👍🏻

> Also, do we really need the intermediary step of a 'bestName' or could we just go straight to the InstrumentClass, cutting out the intervening bestNameToInstrumentClass step?

If you wanted to demo this (for voices only, at first, as you suggest) to see if it is worth continuing with, I would be +1, because mapping straight to the Python classes instead of strings could lead to (someday!) removing this:

https://github.com/cuthbertLab/music21/blob/902ea05d559a3443e5857dc973dd9580fe47ce85/music21/instrument.py#L2404-L2411
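
As a rough sketch of what mapping straight to classes could look like: the classes below are empty stand-ins rather than music21's real ones, and `instrumentFromName` is a hypothetical helper name.

```python
from typing import Dict, Optional, Type


class Instrument:
    """Stand-in for a base instrument class (not music21's own)."""


class Accordion(Instrument):
    pass


class Alto(Instrument):
    pass


# Names map to class objects, so no string-to-class step is needed later.
englishToInstrumentClass: Dict[str, Type[Instrument]] = {
    'accordion': Accordion,
    'alto': Alto,
}


def instrumentFromName(name: str) -> Optional[Instrument]:
    """Instantiate the class registered for this name, or return None."""
    cls = englishToInstrumentClass.get(name.strip().lower())
    return cls() if cls is not None else None


print(type(instrumentFromName('Accordion')).__name__)
```

Storing class objects rather than class-name strings means the table would have to live where the classes are importable, which may or may not suit the current module layout.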

MarkGotham commented 3 years ago

Thanks for the thoughts, Jacob and Myke.

I understand (and am not all that surprised by) the disinclination to regex. I also see that this is not a high priority issue. All the same, we seem to have a couple of solid reasons for considering a change here (excluding the question of a few bytes!). I'll commit a PR with a few names for now as promised, and we can continue to discuss an alternative layout here.

First, mapping all the covered possibilities from an alphabetical list to the single, rationalised English language version means it's pretty dispersed. For instance, say we want to map to the class Banjo not only banjo but also large banjo and huge banjo: they'll appear in three places. And that's language dependent, based on adjectival position, e.g. bass clarinet in English, but clarinette basse in French.

Wouldn't it be easier to have this flipped around so users can easily see all the options that lead to a single instrument at once? E.g. in the format: (className, (tuple, of, variants)). It's currently cumbersome to make that check.

Those variants could be rigorously and consistently structured, e.g. (English singular, English plural, French singular, French plural, ...) in the same way that the pitch class sets are: i.e. a de facto table stored in tuples instead of a tsv (for instance).
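
A sketch of that flipped layout, assuming the lookup dictionary is derived from it once at import time. The names `instrumentVariants` and `buildReverseLookup` are invented for illustration.

```python
from typing import Dict, Iterable, Tuple

# One row per class: (className, (tuple, of, variants)).
instrumentVariants = (
    ('Banjo', ('banjo', 'large banjo', 'huge banjo')),
    ('BassClarinet', ('bass clarinet', 'clarinette basse', 'bassklarinette')),
)


def buildReverseLookup(
    table: Iterable[Tuple[str, Tuple[str, ...]]]
) -> Dict[str, str]:
    """Invert the per-class rows into a flat variant -> className dict."""
    lookup: Dict[str, str] = {}
    for className, variants in table:
        for variant in variants:
            key = variant.lower()
            # Refuse to let one spelling silently point at two classes.
            if key in lookup and lookup[key] != className:
                raise ValueError(f'{key!r} maps to both {lookup[key]} and {className}')
            lookup[key] = className
    return lookup


variantToClassName = buildReverseLookup(instrumentVariants)
print(variantToClassName['clarinette basse'])
```

The rows stay readable per instrument, while lookups remain plain dictionary access with no regex involved.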

Better still, any flip around of that kind would make it easier to address the risky lines that @jacobtylerwalls mentions. Most simply, we could store those options in the class, perhaps with thisInstrument.multiLingualNameOptions = [list, of, variants]. Then the relevant part of instrument.fromString could loop over listOfInstruments (default = all, but can be set to consider only a smaller subset, e.g. of voices or string instruments) and, if name in thisInstrument.multiLingualNameOptions, return the class.
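
Roughly, and only as a sketch: the classes below are stand-ins, the name lists are abbreviated, and this `fromString` is a simplified illustration rather than music21's existing function.

```python
from typing import List, Optional, Sequence, Type


class Instrument:
    # Every class carries its own list of accepted names.
    multiLingualNameOptions: List[str] = []


class Vocalist(Instrument):
    multiLingualNameOptions = ['voice', 'voix', 'stimme', 'voce']


class Soprano(Vocalist):
    multiLingualNameOptions = ['soprano', 'sopranos', 'sopran']


class Banjo(Instrument):
    multiLingualNameOptions = ['banjo']


ALL_INSTRUMENTS: List[Type[Instrument]] = [Vocalist, Soprano, Banjo]


def fromString(
    name: str,
    listOfInstruments: Optional[Sequence[Type[Instrument]]] = None,
) -> Optional[Instrument]:
    """Return an instance of the first class that lists this name."""
    if listOfInstruments is None:
        listOfInstruments = ALL_INSTRUMENTS
    query = name.strip().lower()
    for thisInstrument in listOfInstruments:
        if query in thisInstrument.multiLingualNameOptions:
            return thisInstrument()
    return None


print(type(fromString('Voix')).__name__)
print(fromString('voix', listOfInstruments=[Banjo]))
```

Restricting listOfInstruments is what allows a caller to search voices only, at the cost of a linear scan over the classes.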

Of course, there are many possible slight alternatives to this, e.g. having separate attributes such as thisInstrument.namesInFrench = list, or even thisInstrument.pluralFrench = str.

In any case, this would probably ultimately mean removing the instrumentLookup.py file. Given that it's currently _DOC_IGNORE_MODULE_OR_PACKAGE = True, I'm guessing that's desirable?

Thanks for considering this. As always, I'm open to ideas and apologise if I've missed some reason why it's better as is.

MarkGotham commented 2 years ago

Hi @jacobtylerwalls, @mscuthbert, all.

Returning to this, here's a proposal for a simple form of what we seem to have broadly settled on above:

How about a first PR that simply re-structures the current data in that way: i.e. all and only the currently included class names and string mappings.

The list below sets out one such implementation for re-organising all and only the current data (various alternatives are discussed in the comments above, e.g. a simple list of lists). One of the main benefits is that this makes the omissions obvious ... but we'd hold off doing anything about filling in those gaps for a separate PR.
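
For what it's worth, the re-organisation can be generated rather than typed by hand. Here is a sketch, assuming per-language dictionaries shaped like englishToBestName plus a bestNameToInstrumentClass mapping. The tiny tables below are excerpts for illustration, and `restructure` is a hypothetical helper.

```python
from typing import Dict, List

# Tiny excerpts standing in for the current lookup tables.
englishToBestName = {'accordion': 'accordion', 'alto': 'alto', 'agogo': 'agogo'}
frenchToBestName = {'accordéon': 'accordion', 'contralto': 'alto'}
bestNameToInstrumentClass = {
    'accordion': 'Accordion',
    'agogo': 'Agogo',
    'alto': 'Alto',
}


def restructure(
    languageTables: Dict[str, Dict[str, str]],
    bestNameToClass: Dict[str, str],
) -> List[dict]:
    """Group every known name under its class, one list per language."""
    rows = {
        className: {'m21ClassName': className,
                    **{language: [] for language in languageTables}}
        for className in sorted(set(bestNameToClass.values()))
    }
    for language, table in languageTables.items():
        for name, bestName in table.items():
            rows[bestNameToClass[bestName]][language].append(name)
    for row in rows.values():
        for language in languageTables:
            row[language].sort()
    return list(rows.values())


result = restructure(
    {'English': englishToBestName, 'French': frenchToBestName},
    bestNameToInstrumentClass,
)
for row in result:
    print(row)
```

Classes with no names in a language keep an empty list, which is what makes the omissions visible.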

Two final questions/comments:

Grateful as ever for your thoughts. Thanks.

[

{'m21ClassName': 'Accordion', 'English': ['accordion'], 'French': ['accordéon'], 'German': ['akkordeon', 'handharmonika', 'ziehharmonika'], 'Italian': ['fisarmonica'], 'Russian': [], 'Spanish': ['acordeón'], 'Abbreviation': ['acc', 'accdn']} 

{'m21ClassName': 'AcousticBass', 'English': ['acoustic bass'], 'French': ['basse acoustique'], 'German': ['akustik-bass'], 'Italian': [], 'Russian': [], 'Spanish': ['bajo acústico'], 'Abbreviation': ['ac b']} 

{'m21ClassName': 'AcousticGuitar', 'English': ['acoustic guitar', 'guitar'], 'French': ['guitare', 'guitarre', 'guitare acoustique'], 'German': ['akustikgitarre', 'gitarre', 'guitarre'], 'Italian': ['chitarra', 'chitarra acustica'], 'Russian': [], 'Spanish': ['guitarra', 'guitarra acústica'], 'Abbreviation': ['ac gtr']} 

{'m21ClassName': 'Agogo', 'English': ['agogo'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Alto', 'English': ['alto'], 'French': ['contralto'], 'German': ['alt'], 'Italian': ['contralto'], 'Russian': ["al't", "kontral'to"], 'Spanish': ['contralto'], 'Abbreviation': []} 

{'m21ClassName': 'AltoSaxophone', 'English': ['alto saxophone'], 'French': ['saxophon alto', 'saxophone alto'], 'German': ['alt-saxophon', 'altsaxophon'], 'Italian': ['sassofono alto', 'sassofono contralto', 'saxofono alto', 'saxofono contralto'], 'Russian': [], 'Spanish': ['saxofón alto', 'saxofóno alto'], 'Abbreviation': ['a sax', 'sax a']} 

{'m21ClassName': 'Bagpipes', 'English': ['bagpipe', 'bagpipes'], 'French': ['cornemuse'], 'German': ['dudelsack'], 'Italian': ['cornamuse'], 'Russian': [], 'Spanish': ['gaita'], 'Abbreviation': ['bag']} 

{'m21ClassName': 'Banjo', 'English': ['banjo'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['bj', 'bjo']} 

{'m21ClassName': 'Baritone', 'English': ['baritone'], 'French': ['bariton', 'baryton'], 'German': ['bariton'], 'Italian': ['baritono'], 'Russian': ['bariton'], 'Spanish': ['barítono'], 'Abbreviation': ['bar']} 

{'m21ClassName': 'BaritoneSaxophone', 'English': ['baritone saxophone'], 'French': ['saxophone baryton'], 'German': ['baritonsaxophon'], 'Italian': ['sassofono baritono', 'saxofono baritono'], 'Russian': [], 'Spanish': ['saxofón del barítono', 'saxofóno barítono'], 'Abbreviation': ['bar sax']} 

{'m21ClassName': 'Bass', 'English': ['bass'], 'French': ['basse'], 'German': [], 'Italian': ['basso'], 'Russian': ['bas'], 'Spanish': ['bajo'], 'Abbreviation': []} 

{'m21ClassName': 'BassClarinet', 'English': ['bass clarinet'], 'French': ['clarinette basse'], 'German': ['bass-klarinette', 'bassklarinette'], 'Italian': ['clarinetto basso'], 'Russian': ['bass-klarnet'], 'Spanish': ['clarinete bajo'], 'Abbreviation': ['bcl', 'b cl', 'bkl', 'bs cl']} 

{'m21ClassName': 'BassDrum', 'English': ['bass drum', 'turkish drum'], 'French': ['grosse caisse', 'tambour bata'], 'German': ['bass-drum', 'grosse trommel'], 'Italian': ['cassa', 'gran cassa', 'grancassa', 'tamborone', 'tamburo grande', 'tamburo grosso'], 'Russian': ["bol'shoi baraban"], 'Spanish': ['bombo', 'gran caja'], 'Abbreviation': ['b dr', 'cr tr', 'g c', 'gr cassa']} 

{'m21ClassName': 'BassTrombone', 'English': ['bass trombone'], 'French': ['trombone basse'], 'German': ['bass-posaune', 'bassposaune'], 'Italian': ['trombone basso'], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Bassoon', 'English': ['bassoon'], 'French': ['basson'], 'German': ['fagott', 'fagotten'], 'Italian': ['fagotto'], 'Russian': ['fagot'], 'Spanish': ['fagot'], 'Abbreviation': ['bn', 'bs', 'bsn', 'bssn', 'fag', 'fg']} 

{'m21ClassName': 'BongoDrums', 'English': ['bongo drums'], 'French': ['bongos', 'tambours bongo'], 'German': ['bongo-trommeln', 'bongos'], 'Italian': ['bonghi', 'bongos', 'tamburi bongo'], 'Russian': [], 'Spanish': ['bongo tambores', 'bongos'], 'Abbreviation': ['bgo dr']} 

{'m21ClassName': 'BrassInstrument', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Castanets', 'English': ['castanets'], 'French': ['castagnettes'], 'German': ['kastagnetten'], 'Italian': ['castagnette', 'nacchere'], 'Russian': [], 'Spanish': ['castañuelas'], 'Abbreviation': ['cas', 'casts', 'kas']} 

{'m21ClassName': 'Celesta', 'English': ['celesta', 'celeste'], 'French': ['célesta'], 'German': [], 'Italian': ['celeste'], 'Russian': ['chelesta'], 'Spanish': [], 'Abbreviation': ['cel', 'clst']} 

{'m21ClassName': 'Choir', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['ch']} 

{'m21ClassName': 'ChurchBells', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Clarinet', 'English': ['clarinet', 'clarinets'], 'French': ['clarinette'], 'German': ['klarinette', 'klarinetten'], 'Italian': ['clarinetti', 'clarinetti bassi', 'clarinetto'], 'Russian': ['klarnet'], 'Spanish': ['clarinete'], 'Abbreviation': ['cl', 'kl']} 

{'m21ClassName': 'Clavichord', 'English': ['clavichord'], 'French': ['clavicorde'], 'German': ['klavichord'], 'Italian': ['clavicordo'], 'Russian': ['klavikord'], 'Spanish': ['clavicordio'], 'Abbreviation': ['clv', 'clvd']} 

{'m21ClassName': 'Conductor', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'CongaDrum', 'English': ['conga drum', 'tumbadora'], 'French': ['conga', 'tambour congo', 'tumbadora'], 'German': ['conga', 'conga-trommel', 'tumba'], 'Italian': ['conga', 'tumba'], 'Russian': [], 'Spanish': ['conga', 'congas', 'tumbadora'], 'Abbreviation': ['cga dr']} 

{'m21ClassName': 'Contrabass', 'English': ['contrabass'], 'French': ['contrebasse'], 'German': ['kontrabass'], 'Italian': ['contrabbasso'], 'Russian': [], 'Spanish': ['contrabajo'], 'Abbreviation': ['cb']} 

{'m21ClassName': 'Contrabassoon', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Cowbell', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'CrashCymbals', 'English': ['crash cymbals'], 'French': ['crash', 'cymbales'], 'German': ['becken gewönlich', 'becken-paar', 'crash-becken', 'crashbecken'], 'Italian': ['cinelli', 'piatti', 'piatti di crash'], 'Russian': [], 'Spanish': ['platillos crash', 'platillos de choque'], 'Abbreviation': ['cym']} 

{'m21ClassName': 'Cymbals', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Dulcimer', 'English': ['dulcimer'], 'French': ['tympanon'], 'German': ['hackbrett'], 'Italian': ['salterio'], 'Russian': ['tsimbaly'], 'Spanish': ['dulcema'], 'Abbreviation': []} 

{'m21ClassName': 'ElectricBass', 'English': ['electric bass'], 'French': ['basse électrique'], 'German': ['e-bass'], 'Italian': ['basso elettrico'], 'Russian': [], 'Spanish': ['bajo eléctrico'], 'Abbreviation': ['elec b']} 

{'m21ClassName': 'ElectricGuitar', 'English': ['electric guitar'], 'French': ['guitare électrique', 'guitarre électrique'], 'German': ['e-gitarre', 'elektrische gitarre'], 'Italian': ['chitarra elettrica'], 'Russian': [], 'Spanish': ['guitarra eléctrica'], 'Abbreviation': ['e gtr', 'elec gtr']} 

{'m21ClassName': 'ElectricOrgan', 'English': ['electric organ'], 'French': ['orgue électrique'], 'German': ['elektrische orgel'], 'Italian': ['organo elettrico'], 'Russian': [], 'Spanish': ['órgano eléctrico'], 'Abbreviation': ['elec org']} 

{'m21ClassName': 'ElectricPiano', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'EnglishHorn', 'English': ['cor anglais', 'english horn', 'english horns'], 'French': ['cor anglais'], 'German': ['englisch-horn', 'englischhorn'], 'Italian': ['corno inglese'], 'Russian': ['angliiskii rozhok'], 'Spanish': ['corneta inglesa', 'corno', 'corno inglés', 'cuerno inglés'], 'Abbreviation': ['cor ang', 'e hn', 'e h', 'eng hn']} 

{'m21ClassName': 'FingerCymbals', 'English': ['finger cymbals', 'zills', 'zils'], 'French': ['cymbales digitales', 'sagates', 'sagattes', 'zill'], 'German': ['fingerzimbeln'], 'Italian': ['cimbalini', 'dita piatti'], 'Russian': [], 'Spanish': ['chinchines', 'crótalos'], 'Abbreviation': ['fing cym']} 

{'m21ClassName': 'Flute', 'English': ['flute', 'flutes', 'transverse flute'], 'French': ['flûte', 'flûte traversière', 'grande flûte'], 'German': ['flöte', 'querflöte'], 'Italian': ['flauto', 'flauto traverso'], 'Russian': ['fleita'], 'Spanish': ['flauta', 'flauta de boehm', 'flauta de concierto', 'flauta traversa', 'flauta travesera'], 'Abbreviation': ['fl']} 

{'m21ClassName': 'FretlessBass', 'English': ['fretless bass'], 'French': ['basse fretless'], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': ['fretless'], 'Abbreviation': []} 

{'m21ClassName': 'Glockenspiel', 'English': ['bell lira', 'bell lyre', 'chimes', 'glockenspiel', 'orchestra bells'], 'French': ['jeu de timbres'], 'German': ['lyra'], 'Italian': ['campanelli', 'metallofono'], 'Russian': ["kolokol'chiki"], 'Spanish': ['campanólogo', 'de timbres', 'juego', 'juego de timbres', 'liro', 'órgano de', 'órgano de campanas'], 'Abbreviation': ['glck', 'glock', 'glsp', 'gsp']} 

{'m21ClassName': 'Gong', 'English': ['gong'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['gng', 'tamtam']} 

{'m21ClassName': 'Guitar', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Handbells', 'English': ['handbells'], 'French': ['clochettes', 'clochettes à main'], 'German': ['handglocken'], 'Italian': ['campanelli a mano'], 'Russian': [], 'Spanish': ['campanas de mano'], 'Abbreviation': []}

{'m21ClassName': 'Harmonica', 'English': ['harmonica', 'mouth organ'], 'French': [], 'German': ['mundharmonika'], 'Italian': ['armonica', 'armonica a bocca'], 'Russian': [], 'Spanish': ['armónica de boca', 'harmónica'], 'Abbreviation': ['hmca']} 

{'m21ClassName': 'Harp', 'English': ['harp'], 'French': ['harpe'], 'German': ['harfe'], 'Italian': ['arpa', 'arpe'], 'Russian': ['arfa'], 'Spanish': ['arpa'], 'Abbreviation': ['arp', 'hp', 'hpe', 'hrp']} 

{'m21ClassName': 'Harpsichord', 'English': ['harpsichord'], 'French': ['clavecin', 'clavessin', 'claveçin'], 'German': ['arpicordo', 'cembalo', 'clavicembalo', 'clavicimbel', 'kielflügel'], 'Italian': ['arpicordo', 'cembalo', 'cimbalo', 'clavicembalo'], 'Russian': ['chembalo', 'klavesin'], 'Spanish': ['clave', 'clavecémbalo', 'clavecín', 'clavicémbalo', 'clavicímbalo', 'cémbalo', 'gravicémbalo'], 'Abbreviation': ['hpd', 'hpschd']} 

{'m21ClassName': 'HiHatCymbal', 'English': ['hi-hat cymbal'], 'French': ['salut-chapeau cymbale'], 'German': ['hallo-hat-becken'], 'Italian': ['hi-hat piatto'], 'Russian': [], 'Spanish': ['platillo hi-hat'], 'Abbreviation': []} 

{'m21ClassName': 'Horn', 'English': ['horn'], 'French': ['cor', 'corne'], 'German': ['ventilhorn'], 'Italian': ['corno'], 'Russian': ['gorn', 'rog', 'rozhok'], 'Spanish': ['corno francés', 'cuerno', 'trompa'], 'Abbreviation': ['cor', 'hn']} 

{'m21ClassName': 'Kalimba', 'English': ['kalimba'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['kal']} 

{'m21ClassName': 'KeyboardInstrument', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Koto', 'English': ['koto'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Lute', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Mandolin', 'English': ['mandolin'], 'French': ['mandoline'], 'German': ['mandoline'], 'Italian': ['mandolino'], 'Russian': ['mandolina'], 'Spanish': ['mandolina'], 'Abbreviation': ['mand', 'mdln']} 

{'m21ClassName': 'Maracas', 'English': ['maracas'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Marimba', 'English': ['marimba'], 'French': [], 'German': ['marimbaphon'], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['mar']} 

{'m21ClassName': 'MezzoSoprano', 'English': ['mezzo-soprano'], 'French': [], 'German': ['mezzosopran'], 'Italian': ['mezzosoprano'], 'Russian': [], 'Spanish': ['mezzosoprano'], 'Abbreviation': ['mez', 'mezz', 'mz']} 

{'m21ClassName': 'Oboe', 'English': ['oboe', 'oboes'], 'French': ['hautbois'], 'German': ['hoboe', 'oboen'], 'Italian': [], 'Russian': ['goboi'], 'Spanish': [], 'Abbreviation': ['hb', 'ob']} 

{'m21ClassName': 'Ocarina', 'English': ['ocarina'], 'French': [], 'German': ['okarina'], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['oc']} 

{'m21ClassName': 'Organ', 'English': [], 'French': [], 'German': [], 'Italian': ['organi'], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'PanFlute', 'English': ['pan flute', 'pan pipe', 'panflute', 'panpipes'], 'French': ['flûte de pan', 'syrinx'], 'German': ['hirtenflöte', 'panflöte', 'papagenopfeife', 'syrinx'], 'Italian': ['flauto di pan', 'siringa'], 'Russian': [], 'Spanish': ['flauta de pan', 'flautas de pan', 'siringa', 'zampoñas'], 'Abbreviation': ['p fl']} 

{'m21ClassName': 'Percussion', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Piano', 'English': ['piano', 'pianoforte'], 'French': ['pianoforte'], 'German': ['klavier', 'pianoforte'], 'Italian': ['pianoforte'], 'Russian': ["fortep'iano"], 'Spanish': [], 'Abbreviation': ['pf', 'pfte', 'pno']} 

{'m21ClassName': 'Piccolo', 'English': ['octave flute', 'piccolo'], 'French': ['flûte piccolo', 'petite flûte'], 'German': ['kleine flöte', 'octavflöte', 'pickelflöte', 'pikkolo', 'pikkoloflöte'], 'Italian': ['flauto piccolo', 'ottavino'], 'Russian': ['fleita pikkolo', 'malaia fleita', 'pikkolo'], 'Spanish': ['flauta piccolo', 'flautín', 'octavillo', 'ottavino'], 'Abbreviation': ['pic', 'picc']} 

{'m21ClassName': 'PipeOrgan', 'English': ['pipe organ'], 'French': ['orgue à tuyaux'], 'German': ['pfeifenorgel'], 'Italian': ['organo', 'organo a canne'], 'Russian': [], 'Spanish': ['órgano de tubos'], 'Abbreviation': ['p org']} 

{'m21ClassName': 'PitchedPercussion', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Ratchet', 'English': ['ratchet', 'rattle'], 'French': ['crécelle', 'rochet'], 'German': ['knarre', 'ratsche', 'schnarre'], 'Italian': ['cricchetto', 'raganella'], 'Russian': [], 'Spanish': ['carraca', 'matraca', 'trinquete'], 'Abbreviation': []} 

{'m21ClassName': 'Recorder', 'English': ['recorder'], 'French': ['droite', 'enregistreur', 'flûte douce', 'flûte droite', 'flûte à bec'], 'German': ['beckflöte', 'blockflöte', 'schnabelflöte'], 'Italian': ['a becco', 'dritto', 'flauto a becco', 'flauto diritto', 'flauto dolce', 'flauto dritto', 'registratore'], 'Russian': ['blokfleita'], 'Spanish': ['de pico', 'dulce', 'flauta de pico', 'flauta dulce', 'flauta recta', 'grabadora'], 'Abbreviation': ['rec']} 

{'m21ClassName': 'ReedOrgan', 'English': ['reed organ'], 'French': ['roseau organe'], 'German': ['harmonium'], 'Italian': ["canna d'organo", 'canna d’organo'], 'Russian': [], 'Spanish': ['caña de órganos'], 'Abbreviation': []}

{'m21ClassName': 'RideCymbals', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Sampler', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'SandpaperBlocks', 'English': ['sandpaper blocks'], 'French': ['blocs de papier de verre', 'papier de verre'], 'German': ['sandblöcke', 'sandpapier', 'sandpapier blöcke'], 'Italian': ['blocchi di carta vetrata', 'carta vetrata', 'ceppi di carta vetro'], 'Russian': [], 'Spanish': ['bloques de papel de lija', 'papel de lija'], 'Abbreviation': ['sand bl']} 

{'m21ClassName': 'Saxophone', 'English': ['saxophone'], 'French': [], 'German': ['saxophon'], 'Italian': ['sassofono', 'sax', 'saxofono'], 'Russian': ['saksofon'], 'Spanish': ['saxofón', 'saxofóno', 'saxófono'], 'Abbreviation': ['sax']} 

{'m21ClassName': 'Shakuhachi', 'English': ['shakuhachi'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['shk fl']} 

{'m21ClassName': 'Shamisen', 'English': ['shamisen'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Shehnai', 'English': ['shehnai'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['shn']} 

{'m21ClassName': 'Siren', 'English': ['siren'], 'French': ['sirène'], 'German': ['sirene'], 'Italian': ['sirena', 'sirena a mano'], 'Russian': [], 'Spanish': ['sirena'], 'Abbreviation': []} 

{'m21ClassName': 'Sitar', 'English': ['sitar'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['sit']} 

{'m21ClassName': 'SizzleCymbal', 'English': ['sizzle cymbal'], 'French': ['cymbale sur tiges', 'grésillement cymbale'], 'German': ['nietenbecken'], 'Italian': ['piatto chiodati', 'sfrigolio piatto'], 'Russian': [], 'Spanish': ['chisporroteo de platillos', 'platillo sizzle'], 'Abbreviation': []} 

{'m21ClassName': 'SleighBells', 'English': ['jingle bells', 'sleigh bells'], 'French': ['grelots'], 'German': ['pferdeschlittenglocken', 'rollschellen', 'schellen'], 'Italian': ['sonagli', 'sonagliera'], 'Russian': [], 'Spanish': ['cascabels'], 'Abbreviation': []} 

{'m21ClassName': 'SnareDrum', 'English': ['snare drum'], 'French': ['caisse claire', 'tambour'], 'German': ['kleine trommel', 'leinentrommel', 'marschtrommel', 'schnarrtrommel', 'snare-drum'], 'Italian': ['cassa chiara', 'rullante', 'tamburo militare'], 'Russian': ['frantsuzskii baraban'], 'Spanish': ['caja clara', 'con tensores', 'redoblante', 'tambor afinable', 'tambor militar pequeño'], 'Abbreviation': ['c c', 'sn dr']} 

{'m21ClassName': 'Soprano', 'English': ['soprano', 'sopranos'], 'French': ['bas-dessus'], 'German': ['sopran'], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['s']} 

{'m21ClassName': 'SopranoSaxophone', 'English': ['soprano saxophone'], 'French': ['saxophone soprano'], 'German': ['sopran-saxophon', 'sopransaxophon'], 'Italian': ['sassofono soprano', 'saxofono soprano'], 'Russian': [], 'Spanish': ['saxo soprano', 'saxofóno soprano'], 'Abbreviation': ['s sax']} 

{'m21ClassName': 'SplashCymbals', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'SteelDrum', 'English': ['pan', 'steel drums', 'steel drum', 'steel pan'], 'French': ["tambour d'acier", 'tambour en acier'], 'German': ['stahltrommel', 'steeldrum'], 'Italian': ['cestello in acciaio', "tamburo d'acciaio"], 'Russian': [], 'Spanish': ['tambor de acero', 'tambor metálico de trinidad y tobago'], 'Abbreviation': ['st dr']} 

{'m21ClassName': 'StringInstrument', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'SuspendedCymbal', 'English': ['suspended cymbal'], 'French': ['cymbale suspendue'], 'German': ['becken freihängend', 'hängebecken', 'hängendes becken', 'türkisches hängebecken'], 'Italian': ['piatto sospeso'], 'Russian': [], 'Spanish': ['platillo suspendido', 'platillos suspendidos'], 'Abbreviation': []} 

{'m21ClassName': 'Taiko', 'English': ['taiko'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'TamTam', 'English': ['bullseye gong', 'chau gong', 'tam-tam'], 'French': [], 'German': ['tamtam'], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Tambourine', 'English': ['marine', 'tambourine'], 'French': ['tambour de basque'], 'German': ['schellentrommel', 'tambourin', 'tamburin'], 'Italian': ['tamburello', 'tamburino', 'tamburo basco'], 'Russian': [], 'Spanish': ['pandereta', 'tambor de mano'], 'Abbreviation': ['tamb', 'tmbn']} 

{'m21ClassName': 'TempleBlock', 'English': ['temple block'], 'French': ['temple bloc'], 'German': ['tempel-block'], 'Italian': ['tempio di blocco'], 'Russian': [], 'Spanish': ['templo de bloque'], 'Abbreviation': ['temp bl']} 

{'m21ClassName': 'Tenor', 'English': ['tenor'], 'French': ['taille', 'ténor'], 'German': [], 'Italian': ['tenore'], 'Russian': [], 'Spanish': [], 'Abbreviation': ['t']} 

{'m21ClassName': 'TenorDrum', 'English': ['tenor drum'], 'French': ['caisse roulante', 'tambourin', 'ténor tambour'], 'German': ['rührtrommel', 'tenortrommel', 'wirbeltrommel'], 'Italian': ['cassa rullante', 'tamburo rullante'], 'Russian': ['tsilindricheskii baraban'], 'Spanish': ['caja redoblante', 'caja rodante', 'el tenor del tambor', 'tambor mayor'], 'Abbreviation': ['ten dr']} 

{'m21ClassName': 'TenorSaxophone', 'English': ['tenor saxophone'], 'French': ['saxophone ténor'], 'German': ['tenor-saxophon', 'tenorsaxophon'], 'Italian': ['sassofono tenore', 'saxofono tenore'], 'Russian': [], 'Spanish': ['saxo tenor', 'saxofóno tenor'], 'Abbreviation': ['t sax']} 

{'m21ClassName': 'Timbales', 'English': ['pailas criollas', 'timbales'], 'French': ['timbales créoles', 'timbales cubaines', 'timbales latines'], 'German': ['kuba-pauken'], 'Italian': ['timbales latinoamericani', 'timpanetti'], 'Russian': [], 'Spanish': ['pailas criollas'], 'Abbreviation': ['tim']} 

{'m21ClassName': 'Timpani', 'English': ['kettle drums', 'timpani'], 'French': ['timbale', 'timbales'], 'German': ['kesselpauke', 'kesseltrommel', 'pauke', 'pauken'], 'Italian': ['timballi', 'timballo', 'timpano', 'tympani'], 'Russian': ['litavra'], 'Spanish': ['atabal', 'timbal', 'timbales', 'timbals', 'tímpanos'], 'Abbreviation': ['k dr', 'pk', 'timp']} 

{'m21ClassName': 'TomTom', 'English': ['tom-tom'], 'French': ['tom'], 'German': ['tom', 'tom tom'], 'Italian': ['tamtam'], 'Russian': [], 'Spanish': ['tomtom'], 'Abbreviation': []} 

{'m21ClassName': 'Triangle', 'English': ['triangle'], 'French': [], 'German': ['dreieck', 'triangel'], 'Italian': ['triangolo'], 'Russian': [], 'Spanish': ['triángulo'], 'Abbreviation': ['trgl', 'tri']} 

{'m21ClassName': 'Trombone', 'English': ['trombone'], 'French': ['trombone'], 'German': ['posaune'], 'Italian': ['trombone'], 'Russian': ['trombon'], 'Spanish': ['trombón'], 'Abbreviation': ['tbni', 'trb']} 

{'m21ClassName': 'Trumpet', 'English': ['trumpet'], 'French': ['trompette'], 'German': ['trompete'], 'Italian': ['clarino', 'tromba'], 'Russian': ['truba'], 'Spanish': ['trompeta'], 'Abbreviation': ['tbe', 'tpt', 'tr']} 

{'m21ClassName': 'Tuba', 'English': ['tuba'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['tb', 'tba']} 

{'m21ClassName': 'TubularBells', 'English': ['tubular bells'], 'French': ['cloches', 'cloches tubolaires', 'cloches tubulaires'], 'German': ['glocken', 'rohrenglocke', 'röhrenglocken'], 'Italian': ['campane', 'campane tubolari', 'campane tubulari'], 'Russian': [], 'Spanish': ['campanas', 'campanas tubulares'], 'Abbreviation': []} 

{'m21ClassName': 'Ukulele', 'English': ['ukelele', 'ukulele'], 'French': ['ukulélé'], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': ['ukelele'], 'Abbreviation': ['uke']} 

{'m21ClassName': 'UnpitchedPercussion', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Vibraphone', 'English': ['vibraphone'], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['vib', 'vibr', 'vibes']} 

{'m21ClassName': 'Vibraslap', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Viola', 'English': ['viola'], 'French': ['alto'], 'German': ['altgeige', 'bratsche', 'viole'], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': ['br', 'va', 'vla']} 

{'m21ClassName': 'Violin', 'English': ['violin'], 'French': ['violon'], 'German': ['geige', 'violine'], 'Italian': ['violino'], 'Russian': ['skripka'], 'Spanish': ['violín'], 'Abbreviation': ['vio', 'vl', 'vln', 'vlon', 'vn', 'vni']} 

{'m21ClassName': 'Violoncello', 'English': ['cello', 'violoncello'], 'French': ['violoncelle'], 'German': ['cello', 'violoncell'], 'Italian': ['cello'], 'Russian': ["violonchel'"], 'Spanish': ['cello', 'chelo', 'violoncelo', 'violonchelo'], 'Abbreviation': ['vc', 'vcelle', 'vcl', 'vlc']} 

{'m21ClassName': 'Vocalist', 'English': ['voice'], 'French': ['chant', 'chanteur', 'chanteuse', 'voix'], 'German': ['singstimme', 'stimme'], 'Italian': ['canto', 'voca', 'voce'], 'Russian': ['golos'], 'Spanish': ['voz'], 'Abbreviation': ['v', 'voc']} 

{'m21ClassName': 'Whip', 'English': ['slapstick', 'whip'], 'French': ['fouet'], 'German': ['holzklapper', 'peitsche'], 'Italian': ['frusta'], 'Russian': [], 'Spanish': ['látigo'], 'Abbreviation': []} 

{'m21ClassName': 'Whistle', 'English': ['whistle'], 'French': ['siffler'], 'German': ['pfeifen'], 'Italian': ['fischio'], 'Russian': [], 'Spanish': ['silbar'], 'Abbreviation': ['whs']} 

{'m21ClassName': 'WindMachine', 'English': ['wind machine'], 'French': ['eoliphone', 'machine à vent', 'éoliphone'], 'German': ['aeolophon', 'windmaschine'], 'Italian': ['eolifono', 'macchina del vento', 'machina a venti'], 'Russian': [], 'Spanish': ['el viento de la máquina', 'máquina de viento'], 'Abbreviation': ['windmachine']} 

{'m21ClassName': 'Woodblock', 'English': ['woodblock'], 'French': ['bloc de bois', 'wood-bloc'], 'German': ['holzblock', 'holzschnitt'], 'Italian': ['blocco di legno', 'blocco di legno cinese', 'cassetina', 'xilografia'], 'Russian': [], 'Spanish': ['bloques de madera', 'caja china'], 'Abbreviation': ['wd bl']} 

{'m21ClassName': 'WoodwindInstrument', 'English': [], 'French': [], 'German': [], 'Italian': [], 'Russian': [], 'Spanish': [], 'Abbreviation': []} 

{'m21ClassName': 'Xylophone', 'English': ['xylophone'], 'French': ['claquebois', 'harmonica de bois', 'Èchelettes'], 'German': ['strohfiedel', 'xylophon'], 'Italian': ['gigelira', 'silofono', 'xilifono', 'xilofono'], 'Russian': ['ksilofon'], 'Spanish': ['xilofón', 'xilofóno', 'xilófono'], 'Abbreviation': ['xil', 'xyl']} 

]
mscuthbert commented 2 years ago

This seems better, but should the class name be the key name for the dict and the value be the dict of lists?
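
One way to read this suggestion, as a minimal sketch: key the table by class name so that each value is a dict of per-language lists. The two entries are copied from the list above (trimmed to three languages); `namesByClass` is a hypothetical name, not an existing music21 attribute.

```python
# Hypothetical restructuring: class name -> {language: [names]}.
namesByClass = {
    'Tuba': {
        'English': ['tuba'],
        'French': [],
        'Abbreviation': ['tb', 'tba'],
    },
    'Viola': {
        'English': ['viola'],
        'French': ['alto'],
        'Abbreviation': ['br', 'va', 'vla'],
    },
}

# Lookup by class name becomes a direct key access rather than a scan of a list.
violaFrench = namesByClass['Viola']['French']
print(violaFrench)
```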

But I'm trying to figure out what problem this solves. Imagine that in the current system we wanted to add Chinese names (we should have some non-European languages). That would involve making changes in one place in the file. With the new system every line would need a new 'Chinese': ['name'] entry, which would make reviewing very difficult.

Also imagine that we want to have different en-us and en-gb entries. Currently we'd do something like: britishEnglish = [*americanEnglish, ('timpani', 'kettledrums')] In this system we'd have to repeat all the names again and again, leading to bugs.
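
The unpacking pattern mentioned here can be sketched with toy data. The (name, best name) pairs below are illustrative only and are not taken from instrumentLookup.py.

```python
# Toy per-language table of (name, bestName) pairs; entries are illustrative.
americanEnglish = [
    ('timpani', 'timpani'),
    ('violin', 'violin'),
]

# A regional variant reuses the shared entries and adds only what differs,
# so nothing has to be repeated by hand.
britishEnglish = [*americanEnglish, ('kettledrums', 'timpani')]

britishLookup = dict(britishEnglish)
print(britishLookup['kettledrums'])
```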

MarkGotham commented 2 years ago

Thanks @mscuthbert.

Fair enough, though it depends on what edits we anticipate making (most often):

Other questions:

In the meantime, here (#1211) is a separate PR for more instrument names that we'll want whether we stick with the current layout or restructure.

jacobtylerwalls commented 2 years ago

transliteration (as prev.).

Yeah, I think in Python3 unicode is supposed to Just Work, so not an issue to worry about AFAIK.

I'm sort of worried that whether this is frictionless to contribute to is in the eye of the beholder, so I don't immediately see a reason to change it. I hear @MarkGotham saying he could more easily spot missing names, but with some Python wrangling it's not so hard to spit out a report.
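
A rough sketch of what such a report could look like, assuming each language has a dict mapping names to best names. The `tables` data and the `missingNamesReport` helper are hypothetical stand-ins, not existing music21 code.

```python
# Toy stand-ins for the per-language tables (name -> best name); hypothetical data.
tables = {
    'English': {'tuba': 'tuba', 'viola': 'viola'},
    'French': {'alto': 'viola'},
    'German': {'bratsche': 'viola'},
}

def missingNamesReport(tables):
    '''Map each best name to the sorted list of languages that have no entry for it.'''
    allBestNames = set()
    for table in tables.values():
        allBestNames.update(table.values())

    report = {}
    for bestName in sorted(allBestNames):
        missing = sorted(language for language, table in tables.items()
                         if bestName not in table.values())
        if missing:
            report[bestName] = missing
    return report

report = missingNamesReport(tables)
print(report)
```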

Also imagine that we want to have different en-us and en-gb entries. Currently we'd do something like: britishEnglish = [*americanEnglish, ('timpani', 'kettledrums')] In this system we'd have to repeat all the names again and again, leading to bugs.

+1 to that

shall we at least offer a function to help review? I have some code for this I'm happy to contribute.

In terms of a test?

if we're sticking with the structure as it is

The one thing I don't like about the current structure is that it doesn't use class name literals, which would highlight, autocomplete, and otherwise prevent bugs. I would be in favor of replacing the "...ToBestName" values with actual classes.
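
As an illustration of the idea only, with stand-in classes rather than music21's own, the lookup values could be the classes themselves:

```python
class Instrument:
    '''Stand-in base class for this sketch (not music21's Instrument).'''

class Viola(Instrument):
    pass

class Violin(Instrument):
    pass

# Values are class objects, so a misspelt class name fails immediately with a
# NameError when the table is built, instead of later as a bad string lookup.
germanToClass = {
    'bratsche': Viola,
    'geige': Violin,
    'violine': Violin,
}

inst = germanToClass['bratsche']()
print(type(inst).__name__)
```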


Tangent: it also occurred to me we could add read-only properties such as .french like the ones in the pitch module.
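
A hedged sketch of what such a property might look like, using a stand-in class and the French name from the table above. Which name to return when a language has several variants is an open question; this sketch simply takes the first.

```python
class Viola:
    '''Stand-in class for this sketch (not music21's Viola).'''
    _names = {'French': ['alto'], 'German': ['altgeige', 'bratsche', 'viole']}

    @property
    def french(self):
        # No setter is defined, so assigning to .french raises AttributeError.
        return self._names['French'][0]

viola = Viola()
print(viola.french)
```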

MarkGotham commented 2 years ago

Great, thanks @jacobtylerwalls

Re-structuring

Other ways to review

Class Names

Transliteration

Other

MarkGotham commented 2 years ago
# Dict-reversing functions building up to getAllNamesForInstrument.

def _getKeys(thisDict, thisValue):
    '''
    Retrieve key(s) for a given value (thisValue) and dict (thisDict).
    Returns all relevant keys as a list of strings (empty if no matches).
    '''
    returns = []
    for key, value in thisDict.items():
        if thisValue == value:
            returns.append(key)
    return returns

def bestNameFromClassName(className: str):
    '''
    Retrieves the 'best name' for a valid 
    :class:`~music21.instrument.Instrument` class name (className, str).

    There should be exactly one 'best name' per valid class name.
    This function raises an error if that is not the case.
    '''
    k = _getKeys(instrumentLookup.bestNameToInstrumentClass, className)
    if len(k) == 0:
        raise ValueError(f"No 'bestName' found. Please check the className {className}.")
    elif len(k) > 1:
        raise ValueError("More than one 'bestName' for this class.")
    else:  # len(k) == 1:
        return k[0]

_currentlySupportedLanguages = ['English', 'French', 'German', 'Italian', 'Russian', 'Spanish',
                                'Abbreviation']

def getAllNamesForInstrument(className: str,
                             languages: list = _currentlySupportedLanguages):
    '''
    Retrieves all currently stored names for a given instrument.
    The className should be a str of the music21 class name,
    and the languages to test should be a list (sic) of any or all of
    the currently supported languages:
    'English',
    'French',
    'German',
    'Italian',
    'Russian',
    'Spanish', and
    'Abbreviation' (an honorary 'language' for present purposes).

    All of those 'languages' are included by default.

    Returns a dict with keys for the language tested and values as a list of
    strings for any name variants in that language.

    If there are no variant names in the tested languages then
    the returned dict is populated with empty lists.

    Raises errors for problems with the className and / or bestName.
    See :func:`~music21.instrument.bestNameFromClassName` for details.

    >>> instrument.getAllNamesForInstrument('Flute')
    {'English': ['flute', 'flutes', 'transverse flute'],
    'French': ['flûte', 'flûte traversière', 'grande flûte'],
    'German': ['flöte', 'querflöte'],
    'Italian': ['flauto', 'flauto traverso'],
    'Russian': ['fleita'],
    'Spanish': ['flauta', 'flauta de boehm', 'flauta de concierto', 'flauta traversa', 'flauta travesera', 'flautas'],
    'Abbreviation': ['fl']}

    >>> instrument.getAllNamesForInstrument('Flute', languages=['German'])
    {'German': ['flöte', 'querflöte']}

    '''

    instrumentNameDict = {}

    bestName = bestNameFromClassName(className)  # raises errors if there is an issue

    for language in languages:

        if language not in _currentlySupportedLanguages:
            raise ValueError(f'Language {language} not currently supported.')

        # e.g. 'German' -> instrumentLookup.germanToBestName
        sourceDict = getattr(instrumentLookup, language.lower() + 'ToBestName')
        keyList = _getKeys(sourceDict, bestName)
        instrumentNameDict[language] = keyList

    return instrumentNameDict
MarkGotham commented 2 years ago

Update / Summary:

Almost everything discussed here is:

The exceptions are a couple of additional look up options to consider:

  1. instrument.fromString search restricted to specified languages rather than all of them:
    • Default to checking all language lists, but
    • Allow users to specify one or more from this list where the target language (including abbreviations) is known.
    • Example edit:
      • _currentlySupportedLanguages = ('English', 'French', 'German', 'Italian', 'Russian', 'Spanish', 'Abbreviation')
      • def fromString(instrumentString: str, languages: tuple = _currentlySupportedLanguages):
    • (then handle this change within fromString)
  2. Search in the other direction:
    • given an instrument (class).
    • retrieve all names for it (or, again optionally, for only one of the supported languages)
    • This code is drafted in the comment above (needs only a little update to reflect 1215, e.g. again removing best).
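
Option 1 can be sketched in a self-contained way as follows. This is only an illustration of the language-restricted search under stated assumptions: `classNameFromString`, `_supported`, and `_tables` are hypothetical stand-ins with toy data, and the real `fromString` does considerably more than a single dict lookup.

```python
_supported = ('English', 'French', 'German')  # trimmed for the sketch

# Toy tables (name -> class name); hypothetical data, not music21's lookup dicts.
_tables = {
    'English': {'viola': 'Viola'},
    'French': {'alto': 'Viola', 'violon': 'Violin'},
    'German': {'geige': 'Violin'},
}

def classNameFromString(instrumentString: str, languages: tuple = _supported):
    '''Return the class name for instrumentString, searching only the given languages.'''
    unknown = [language for language in languages if language not in _supported]
    if unknown:
        raise ValueError(f'Languages not currently supported: {unknown}')

    query = instrumentString.lower()
    for language in languages:
        if query in _tables[language]:
            return _tables[language][query]
    return None  # no match in the requested languages

print(classNameFromString('Alto'))
print(classNameFromString('alto', languages=('German',)))
```

Restricting the search matters for names like 'alto', which the table above lists as French for Viola but which could mean something else when another language is intended.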

I'm happy to finish implementing these if they're welcome additions.

MarkGotham commented 2 years ago

Thanks for the quick turnaround @jacobtylerwalls! no.1 ^ now implemented at #1244.

MarkGotham commented 2 years ago

no.2 ^ now implemented at #1255 (getAllNamesForInstrument)