This is very cool! I've been meaning to port this over to python for a while now, so I'm really happy you've gotten this started!
However, before I link to it, I would want to have some tests running to ensure that the outputs of the two programs are always identical. Ideally, this would be a full evaluation for a few different (long) pieces, as well as some unit tests of some of the main functions, comparing the python output to the java output. I'm not sure what's the best way to do this at the moment, aside from just writing a python script and a java script that each print verbose output at each step, and then compare their output.
I would also hope to add to your readme a note that the canonical version is (at least for now) the java code (ie, it is best to check the final results with the java code, at least until the above tests are working).
For example, I was looking through your DTW code and noticed this `if`/`elif`/`else` block, which should actually be three separate `if` statements (so that potentially multiple of these will be executed). See the original code here: https://github.com/apmcleod/MV2H/blob/master/src/mv2h/tools/Aligner.java#L142
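To illustrate the difference with a generic sketch (this is not the actual Aligner code, just the shape of the problem):

```python
# Generic sketch, not the actual Aligner code: in the DTW backtracking, every
# predecessor whose cost ties for the minimum has to be considered, so the
# three checks must be independent `if`s rather than an if/elif/else chain.
prev_costs = {"diagonal": 3, "left": 3, "up": 5}
best = min(prev_costs.values())

origins = []
if prev_costs["diagonal"] == best:
    origins.append("diagonal")
if prev_costs["left"] == best:   # an `elif` here would silently drop the tie
    origins.append("left")
if prev_costs["up"] == best:
    origins.append("up")

print(origins)  # ['diagonal', 'left']: both tied predecessors are kept
```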
But overall, I'm very excited about this! Let me know if you'd like me to help you work on this at all.
Hi, it would be great to have some help with these tests. I think we can do some output tests for now, for example adding some MIDI files to the repository, running the command line of the Java version, and comparing both outputs' values.
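Something like this, a rough sketch; the file paths, Java classpath, and Python entry point here are just placeholders:

```python
# Rough sketch of an output-comparison test; the file paths, Java classpath,
# and pyMV2H entry point are placeholders, not the real project layout.
import subprocess

GROUND_TRUTH = "tests/examples/ground_truth.mid.txt"
TRANSCRIPTION = "tests/examples/transcription.mid.txt"

def java_output():
    cmd = ["java", "-cp", "MV2H/bin", "mv2h.Main",
           "-g", GROUND_TRUTH, "-t", TRANSCRIPTION]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

def python_output():
    cmd = ["python", "-m", "pyMV2H", "compare_files",
           "-g", GROUND_TRUTH, "-t", TRANSCRIPTION]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

def test_outputs_match():
    # Both implementations should report the same metric values
    # for the same pair of ground truth and transcription files.
    assert java_output() == python_output()
```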
I have run it with some pieces from my own work, but my dataset has some bias from the instrument and the style of music. It would be great if you could share more pieces so we can write these unit tests.
You have a good point on the DTW, my mistake, I will fix that, thanks 👍
I've made some example inputs/outputs here for you to test: test-examples.zip
The original scores come from the MusicXML scores from ASAP, and I manually edited some errors into them.
I converted the scores to MIDI using musescore as described in the MV2H readme. Then, I used the java converter to convert them to the text format. Both the MIDI and the txt files are included in the zip, so you can test the MIDI -> txt conversion that way.
I also ran a few different specific tests:
- `output.g-846edited.t-italian-orig.p-0.5.a.v`: Here, I ran with ground truth `846p-edited` and transcription `italian-orig`, with a penalty of 0.5, and verbose printing (the verbose printing is a new feature, for these tests). Here, you can see that it finds 440 different alignments, and prints the evaluation scores for each of them.
- `output.g-italian-orig.t-italian-edited.p-0.5.A.v`: Here, I ran with ground truth `italian-orig` and transcription `italian-edited`, with a penalty of 0.5, and verbose printing, as well as the `-A` flag to print the alignment (in case you want it for further debugging; the `-A` flag was a requested feature from another user). Here, it finds only 2 alignments, and prints the alignment and evaluation for each.

More tests are in `output_results`:
First, I ran 2 specific tests, to check how many alignments it finds for each (it finds the alignments in <0.5 seconds, but I killed the processes before it evaluated all of them, which would have taken a while):
Command: -g test-examples/Bach-846p-edited.mid.txt -t test-examples/Bach-italian-concerto-orig.mid.txt -a -p 1.5
Number of alignments: 1806552

Command: -g test-examples/Bach-846p-orig.mid.txt -t test-examples/Bach-846p-edited.mid.txt -a -p 0.0
Number of alignments: 3847193342
I also ran tests on all of the pairs of alignments using a bash command found in the `output_results` file. Each paragraph prints the ground truth filename followed by the transcription filename, and then the system output. If it found too many alignments, I killed it (`^C`). This all ran in about 2-3 seconds.
Please let me know if something doesn't make sense!
These should be good for now, but the process is a bit messy. In the future, it would be good to have better tests, both with actual transcriptions, and some unit testing.
And here are some actual transcription examples (thanks to Lele Liu: @cheriell) and their outputs with and without `-a`. These were also converted from MusicXML to MIDI using MuseScore 3.
Hey, I'm finishing writing the unit tests to compare the Java version with the Python one, but for one of the cases:
-g $TRANSCRIPTION_DIR/Bach-846p-orig.mid.txt -t $TRANSCRIPTION_DIR/Bach-italian-concerto-edited.mid.txt -a
When I use this with the `-a` flag, I get a really big number of alignments to check, while in the Java version I initially got only 44 possible alignments. At first I found some problems on my side, but one thing that started to intrigue me was why the Java version was restarting `AlignmentNode.count` after some time. Investigating a little more, I found the problem: the cause is Java's `Integer.MAX_VALUE`. When I change the `count` property to a `BigInteger` instead of an `int`, I get the same number as the Python version (and it takes a lot of time to run).
Something like:
Evaluating alignment 1 / 491487296830301069939877099594644818082149171200000000000000000000
The total calculated afterwards by `Aligner.getPossibleAlignments` is really big... probably something is wrong with the DTW algorithm, and I replicated the same bug in Python (lol).
I'm removing this case from the tests for now, but it's really necessary to investigate why this is happening.
(To replicate this in Java, you need to change to `BigInteger`; `long` does not support that value either. If you want, I can provide my changes in a pull request.)
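Just to show the wraparound in Python terms (Python ints never overflow, which is why the Python count keeps growing):

```python
# Simulate what happens to a Java int once the alignment count passes
# Integer.MAX_VALUE: it wraps around to a negative value instead of growing.
INT_MAX = 2**31 - 1                  # Java's Integer.MAX_VALUE

def as_java_int(value):
    # reduce an arbitrary Python int to a signed 32-bit value, like a Java int
    return (value + 2**31) % 2**32 - 2**31

print(as_java_int(INT_MAX))          # 2147483647
print(as_java_int(INT_MAX + 1))      # -2147483648 (the count "restarts")
```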
I think the cause is probably trying to align two different pieces of music (I don't know).
I have added tests to the Python version that compare the Java version's output to the Python version's output; you can see them here.
Basically, it runs the parameters described here, executes the same parameters on the Python version, and compares the Java output values with the Python ones.
Python version code coverage:
Name Stmts Miss Cover Missing
-------------------------------------------------------------------
pyMV2H/__init__.py 1 0 100%
pyMV2H/__main__.py 2 2 0% 1-2
pyMV2H/cli.py 13 6 54% 25-32, 35, 44
pyMV2H/commands/__init__.py 2 0 100%
pyMV2H/commands/base.py 7 1 86% 13
pyMV2H/commands/compare_files.py 17 2 88% 18, 24
pyMV2H/commands/midi_converter.py 8 5 38% 7-12
pyMV2H/converter/__init__.py 0 0 100%
pyMV2H/converter/base.py 45 45 0% 1-71
pyMV2H/converter/midi_converter.py 69 69 0% 1-98
pyMV2H/metrics/__init__.py 0 0 100%
pyMV2H/metrics/f1.py 9 2 78% 10-11
pyMV2H/metrics/harmony.py 42 11 74% 15, 49, 73-88
pyMV2H/metrics/meter.py 70 9 87% 48, 54-55, 71-77
pyMV2H/metrics/multi_pitch.py 8 0 100%
pyMV2H/metrics/mv2h.py 17 0 100%
pyMV2H/metrics/note_value.py 21 0 100%
pyMV2H/metrics/voice.py 61 0 100%
pyMV2H/utils/__init__.py 0 0 100%
pyMV2H/utils/algorithm_config.py 4 0 100%
pyMV2H/utils/align_files.py 105 1 99% 55
pyMV2H/utils/alignment_node.py 20 1 95% 23
pyMV2H/utils/comparators.py 29 8 72% 12, 23, 28-31, 43-46
pyMV2H/utils/convert_time.py 51 6 88% 53-54, 68, 75-86
pyMV2H/utils/matches.py 25 0 100%
pyMV2H/utils/midi_meta_tonic_map.py 1 1 0% 6
pyMV2H/utils/music.py 112 2 98% 90, 117
pyMV2H/utils/mv2h.py 63 26 59% 24, 61-76, 79-98
pyMV2H/utils/normalize_list_size.py 5 5 0% 3-14
pyMV2H/utils/pojos.py 9 0 100%
pyMV2H/utils/remove_duplicates.py 28 3 89% 13, 29, 45
pyMV2H/utils/voice.py 57 3 95% 26, 60, 77
-------------------------------------------------------------------
TOTAL 901 208 77%
Thank you for pointing out that bug! I've just fixed it in the most recent release, and confirm that the count you get is what the java code now gets as well.
But I don't think it's a bug with the DTW implementation, but rather due to the fact that there are just so many non-matching notes in cases like that (and the pieces are so long). In such cases, depending on the penalty, we might have something like 2^N alignments, where N is the total number of note sets between the two pieces.
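To give an idea of how quickly that blows up, here is a toy path counter (this is not the MV2H aligner itself, just a count of monotonic alignment paths between two sequences with no matching note sets):

```python
# Toy illustration, not the actual MV2H aligner: count the monotonic alignment
# paths between two length-n sequences when each step may advance the ground
# truth, the transcription, or both. The count explodes combinatorially.
from functools import lru_cache

@lru_cache(maxsize=None)
def count_paths(m, n):
    if m == 0 or n == 0:
        return 1
    return (count_paths(m - 1, n)          # advance the ground truth only
            + count_paths(m, n - 1)        # advance the transcription only
            + count_paths(m - 1, n - 1))   # advance both together

for n in (10, 50, 100):
    print(n, count_paths(n, n))
# Two pieces of 100 note sets each already give a ~77-digit number of paths.
```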
The tests look really nice! Can you also run the most recent examples from Lele? These are also much more realistic examples, but I assume they will be fine since the previous ones were.
Also, can you tell me what commands you've run exactly for these tests? I'd like to play around with them a bit on my machine with some different outputs, etc.
This is looking really good, though! :D
Thanks, I will add the other cases provided in Lele's samples later today, and add some tests to cover the MIDI conversion too.
Included the new test cases from Lele Liu (@cheriell).
Before you execute the tests, you need to install pyMV2H in your environment (root, virtualenv, conda, etc.).
To do this, you may need to run:
pip install -e .
This will install the package based on the current directory; after that, you just need to run `pytest`.
To see the code coverage report after the tests run:
coverage report -m
The tests look very nice! The python program is quite slow though. For example, when I run the following java command (using Lele's files), it takes about 3 seconds:
java -cp ../MV2H/bin mv2h.Main -g tests/input_files/transcription_files/reference.mid.txt -t tests/input_files/transcription_files/transcription_01.mid.txt -a
However, with the python, this is still very slow. I made a lot of improvements for Lele in the past couple of months, so maybe you can have a look at this PR for how I sped up the computation significantly.
You can profile the python code using this command:
python -m cProfile -s cumtime local.py compare_files -g tests/input_files/transcription_files/reference.mid.txt -t tests/input_files/transcription_files/transcription_01.mid.txt -a
In particular, it looks like the convert_time function is quite slow (I added caching for this function, I think). I also added caching in a couple of places when getting the F1 of two note_sets, I think.
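The pattern I mean is roughly this (a generic sketch; the function below is just a stand-in, not the real convert_time):

```python
# Generic memoization sketch; slow_conversion is a stand-in, not the real
# convert_time. Repeated calls with the same (hashable) arguments are answered
# from the cache instead of being recomputed.
from functools import lru_cache

@lru_cache(maxsize=None)
def slow_conversion(time, anchors):
    # anchors must be hashable (e.g. a tuple of (old, new) time pairs)
    result = time
    for old, new in anchors:
        if old <= time:
            result = new + (time - old)
    return result

anchors = tuple((i, i * 1.01) for i in range(1000))
for _ in range(10_000):
    slow_conversion(5.0, anchors)   # computed once, then served from the cache
```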
Let me know if you need some help with this.
I have made some improvements here: https://github.com/lucasmpaim/pyMV2H/pull/1
The `convert_time` now has the same caching algorithm and the same number of method calls as the Java version, but the performance is really different.
The performance for this case is still really slow; that PR decreases it a lot, but it still runs in ~2 min on my MacBook Pro M1. I don't think there is much more to do on the Python side to improve this: the code complexity (number of large loops) is high, and improving this the Python way (using list comprehensions or other tricks) would take a lot of time.
To improve this, I think it is better to rewrite the alignment and DTW code in C++ and integrate it with Cython, to get performance comparable to the Java version.
I would be surprised if C++ would be necessary. Python is not really slower than java (especially by this much) for simple cases like this.
I was looking into your `convert_time` code and noticed 2 errors:
- A `break` is missing here. (See the java code.)
- This `if` statement is incorrect (it will always be `True`). (See the java code.)

I suspect the first will speed up the code significantly.
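To show why the missing break matters so much for speed, a generic sketch (not the actual convert_time code):

```python
# Generic sketch, not the actual convert_time code: with sorted anchor times,
# nothing after the first anchor greater than `t` can change the result, so a
# missing break turns an early exit into a full scan on every call.
import timeit

anchors = list(range(200_000))

def find_prev_anchor(t, early_exit=True):
    prev = anchors[0]
    for a in anchors:
        if a > t:
            if early_exit:
                break            # same result either way, but far less work
        else:
            prev = a
    return prev

print(timeit.timeit(lambda: find_prev_anchor(10, True), number=100))
print(timeit.timeit(lambda: find_prev_anchor(10, False), number=100))
```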
I noticed a few other points as well (the `get_distance` function seems to be the slowest now...):
- In `create_list_of_size`, it is very inefficient to call `append` all the time. A much better way is `my_list = [0] * size`. This is much faster: https://stackoverflow.com/questions/20816600/best-and-or-fastest-way-to-create-lists-in-python
- In `get_distance`, you can change these lines:
  for t_key in t_note_map.keys():
      count = t_note_map[t_key]
  to:
  for t_key, count in t_note_map.items():
- Finally, the `lambda` expression at the bottom could just be `sum(p_note_map.values())`.
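Roughly, putting those together (the dicts and `size` here are just stand-ins, not the actual pyMV2H data):

```python
# Stand-in data, not the actual pyMV2H structures; this just shows the three
# patterns suggested above.
size = 5
t_note_map = {60: 2, 62: 1, 64: 3}
p_note_map = {60: 1, 65: 2}

# 1. Pre-allocate a list of a fixed size instead of appending in a loop.
my_list = [0] * size

# 2. Iterate keys and values together instead of looking each key up again.
for t_key, count in t_note_map.items():
    my_list[0] += count

# 3. Sum the dictionary values directly instead of using a lambda/reduce.
total = sum(p_note_map.values())
print(my_list, total)
```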
I hope these will help.
I have implemented a pure-Python version of these metrics for better integration with Python frameworks (such as PyTorch, TensorFlow and others); I'm using it for my own research with an AMT system.
I think it would be great to reference that repository in the README; it would help a lot of people in the community who don't want to use something like JPype and write JVM code, or make a wrapper for it. You can see it here.
Thanks ;)