This PR makes a number of changes to `helpers/mei_processing/mei_parser.py` and `helpers/mei_processing/mei_tokenizer.py` and their associated type and test files. These changes include:
- Refactoring `MEITokenizer` so that it no longer returns two different types of ngram documents (one at the neume level and one at the neume component level) but a single type of ngram document. This single type always includes pitch (and therefore contour and interval) information, and it also includes neume names if the ngram coincides with a set of complete neumes. This refactoring ensures that: 1. pitch information is returned when a neume name is queried; and 2. we don't have multiple ngrams (one containing pitch information and one containing neume names) for the same set of pitches.
- Removing empty syllables and neumes from an MEI file during parsing. It seems that earlier versions of the MEI encoding produced during the OMR process could create these empty objects. While this issue has been fixed, we will, at least for a while, encounter files created before the fix.
- Modifying the dictionaries created by `MEITokenizer` to include fields required for indexing (`id` and `type`) and fields that we want to be easily available in the documents returned by Solr (`manuscript` and `folio`).

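A minimal sketch of the single ngram document type described above, for readers who want to see how "pitch always, neume names only on complete neumes" plays out. Everything here is hypothetical and may differ from the actual `MEITokenizer`: the `NeumeComponent` shape, the field names `pitch_names`, `contour`, `semitone_intervals` and `neume_names`, the `"omr_ngram"` value for `type`, the underscore-joined string encoding, and the use of `uuid4` for `id`.

```python
import uuid
from typing import TypedDict


class NeumeComponent(TypedDict):
    """Hypothetical neume component; the real type in helpers/mei_processing may differ."""
    pname: str       # pitch name, e.g. "c"
    octave: int      # MEI octave number (increments at C)
    neume_name: str  # name of the parent neume, e.g. "clivis"
    neume_idx: int   # position of the parent neume in the file


class NgramDocument(TypedDict, total=False):
    """Hypothetical document shape; every field name here is an assumption."""
    id: str
    type: str
    manuscript: str
    folio: str
    pitch_names: str
    contour: str
    semitone_intervals: str
    neume_names: str  # present only when the ngram covers whole neumes


# Semitone offset of each pitch name above C.
_PITCH_CLASS = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}


def _semitones(nc: NeumeComponent) -> int:
    return 12 * nc["octave"] + _PITCH_CLASS[nc["pname"]]


def _on_boundary(ncs: list[NeumeComponent], i: int) -> bool:
    """True if position i lies between two different neumes, or at either end."""
    return i == 0 or i == len(ncs) or ncs[i - 1]["neume_idx"] != ncs[i]["neume_idx"]


def make_ngram_doc(
    ncs: list[NeumeComponent], start: int, n: int, manuscript: str, folio: str
) -> NgramDocument:
    window = ncs[start : start + n]
    steps = [_semitones(b) - _semitones(a) for a, b in zip(window, window[1:])]
    doc: NgramDocument = {
        # Fields needed for indexing, plus fields surfaced in Solr results.
        "id": str(uuid.uuid4()),
        "type": "omr_ngram",
        "manuscript": manuscript,
        "folio": folio,
        # Pitch, contour and interval information are always included.
        "pitch_names": "_".join(nc["pname"] for nc in window),
        "contour": "_".join("u" if s > 0 else "d" if s < 0 else "r" for s in steps),
        "semitone_intervals": "_".join(str(s) for s in steps),
    }
    # Neume names are added only if the ngram starts and ends on neume boundaries.
    if _on_boundary(ncs, start) and _on_boundary(ncs, start + n):
        names: list[str] = []
        for i, nc in enumerate(window):
            if i == 0 or nc["neume_idx"] != window[i - 1]["neume_idx"]:
                names.append(nc["neume_name"])
        doc["neume_names"] = "_".join(names)
    return doc
```

Because each window of pitches yields exactly one document, a query on neume names returns a document that already carries the pitch, contour and interval data for those same pitches.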
Additional refactoring includes:
- Renaming `neume_type` to `neume_name` (and `NeumeType` to `NeumeName`, etc.)
- Adding a neume's system to the neume component objects it contains
- Adding a few development dependencies for typing and linting
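A minimal sketch of removing empty syllables and neumes during parsing. It uses the standard library's `xml.etree.ElementTree`, which may not be the XML library `mei_parser.py` actually uses, and it assumes that a neume is "empty" when it contains no `<nc>` element and a syllable is "empty" when it contains no `<neume>` element. The function names `remove_empty_elements` and `_remove_children` are hypothetical.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"
NEUME = f"{{{MEI_NS}}}neume"
NC = f"{{{MEI_NS}}}nc"
SYLLABLE = f"{{{MEI_NS}}}syllable"


def _remove_children(root: ET.Element, tag: str, is_empty) -> int:
    """Remove every element with the given tag for which is_empty(element) is True."""
    removed = 0
    # Snapshot the tree first so removals don't disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag and is_empty(child):
                parent.remove(child)
                removed += 1
    return removed


def remove_empty_elements(root: ET.Element) -> tuple[int, int]:
    """Drop empty neumes, then empty syllables; return how many of each were removed."""
    # Neumes go first: removing an empty neume can leave its syllable empty.
    n_neumes = _remove_children(root, NEUME, lambda el: el.find(".//" + NC) is None)
    n_syllables = _remove_children(
        root, SYLLABLE, lambda el: el.find(".//" + NEUME) is None
    )
    return n_neumes, n_syllables
```

Running the neume pass before the syllable pass means a syllable whose only neume was empty is also caught in a single call.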