bmcfee / crema

convolutional and recurrent estimators for music analysis
BSD 2-Clause "Simplified" License

'index_train.json' file needed for training/chords/02-train.py #22

Closed VtlNmnk closed 2 years ago

VtlNmnk commented 5 years ago

Hi! I am trying to reproduce the model training step by step. I am using the Isophonics annotations and The Beatles albums. At step "02-train", the file 'index_train.json' is missing, and I could not find where it should have been created earlier. What am I doing wrong? The previous steps completed without errors, except that at 01-prepare, after the progress bar finished, it printed an empty list (see the attached 01-prepare screenshot).

bmcfee commented 5 years ago

I don't know. index_train.json is in the repository: https://github.com/bmcfee/crema/blob/master/training/chords/index_train.json

Note that this is just an index file (list of ids) and the actual data is outside the repo. It's not really intended to be trained outside of our local environment here, though it's certainly possible.
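
For reference, a minimal sketch of inspecting the index, assuming it deserializes as a pandas-serialized table of track ids (the path and schema details here are illustrative, not authoritative; the actual schema is whatever the file in the repo contains):

```python
import pandas as pd

# Assumption: index_train.json is a pandas-serialized table of track ids.
# The audio and .jams files those ids refer to live outside the repository.
index = pd.read_json('training/chords/index_train.json')

print(len(index), 'training tracks')
print(index.head())
```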

VtlNmnk commented 5 years ago

Thanks for the answer. I understand that audio files and annotations are needed. This file "index_train.json" assigns each audio file its own serial number and ID. It would be nice to add a few lines of code to generate this file automatically in "01-prepare.py"; then anyone who wants to try your project would be able to train their own model.

bmcfee commented 5 years ago

This file "index_train.json" assigns each audio file its own serial number and ID

I get where you're coming from, but I don't think this file can be automatically generated. It comes from the dataset itself, and in this specific case, depends on a pre-defined train-test split.

VtlNmnk commented 5 years ago

Thanks for the answer. Your project is the best I could find on the topic of automatic chord detection on the web. I am very grateful to you and I want to help develop your project. As you can see from the screenshot, I use Google Colab to train my model. The model is currently training in Google Colab on their free GPU, which is loaded to 2.5 GB of the available 12 GB. When I get the first trained model, I will share the Jupyter notebook files with you. I ran into several problems, but I have already solved them:

1. Parallel works well only with n_jobs = 1. I reworked this code, and now the "index_train.json" file is generated automatically in "01-prepare".

2. Working with files in Google Colab is done a little differently than on a local machine. For example, a link to another folder works well on the local computer, but does not work in Google Colab (see screenshot).

4. The most difficult to fix: the setup.py file now installs libraries that do not work together. keras>=2.0 is specified in setup.py, so keras 2.2.4 was installed automatically. With it I got the error "while_loop() got an unexpected keyword argument 'maximum_iterations'". On Stack Overflow I found a hint that I need to roll back to keras==2.1.2. The model began to train, but at the end of the epoch it could not save.

The original project could run in Google Colab without any changes, but with the current setup.py and the linked file in crema/models it does not run.

bmcfee commented 5 years ago

Thanks for the kind words, I'm glad you find this useful!

The scripts here were never really developed with colab in mind, but I'm glad you found a workaround that isn't too cumbersome.

For your last point: keras historically hasn't done a great job of signaling API-breaking changes or adhering to semantic versioning, so this isn't too surprising. The end-of-epoch save problem is one that I ran into just this week while retraining this model. It's caused by creating input layers with dtype=np.float32 (a numpy dtype object) instead of dtype='float32' (a string type descriptor).
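
For illustration, a minimal sketch of the difference as described above (the layer names and shape are placeholders, not crema's actual pump fields):

```python
import numpy as np
from keras.layers import Input

# Problematic: a numpy dtype *object*; some Keras 2.x versions accept it at
# build time but fail to serialize the model when saving at the end of an epoch.
x_bad = Input(shape=(None, 216), dtype=np.float32, name='cqt_bad')

# Safe: a plain string type descriptor.
x_ok = Input(shape=(None, 216), dtype='float32', name='cqt_ok')
```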

That problem itself is relatively easy to fix, but complicated by the fact that in my dependency stack here, it's triggered by multiple independent packages (pumpp and crema). I've fixed it locally already by patching pumpp (see issue here: https://github.com/bmcfee/pumpp/issues/112), and that fix will be released in 0.5.1 (some time in the next week or two, probably). At that point, things should work smoothly again.

If you can't wait for that, feel free to pull down this branch of pumpp and use it instead of the 0.5.0 release.

VtlNmnk commented 5 years ago

Hi, Brian! With your help, the training process has begun. But I could not get accuracy above 66 percent. Is there a way to check whether I downloaded the correct albums for the annotations? Or is there a way to obtain the audio files that match the annotations?

bmcfee commented 5 years ago

But I could not get accuracy above 66 percent.

That's about right for this model, if we're talking about chord tag accuracy. As noted in the paper (Section 4.4), the model gets around 64% tag accuracy. Chord models are usually not evaluated in that way though, since it's blind to differently spelled equivalent chords, hence the collection of mir_eval metrics.
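
As a toy sketch of the distinction (not crema's evaluation script): mir_eval scores enharmonically equivalent spellings as the same chord, whereas a tag-level comparison would not.

```python
import numpy as np
import mir_eval

# Reference and estimate agree everywhere, but the second chord is spelled
# differently ('Db:maj' vs 'C#:maj'). A tag-level comparison would count this
# as an error; mir_eval's chord metrics do not.
ref_intervals = np.array([[0.0, 2.0], [2.0, 4.0]])
ref_labels = ['C:maj', 'Db:maj']
est_intervals = np.array([[0.0, 2.0], [2.0, 4.0]])
est_labels = ['C:maj', 'C#:maj']

scores = mir_eval.chord.evaluate(ref_intervals, ref_labels,
                                 est_intervals, est_labels)
print(scores['majmin'])  # 1.0
```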

VtlNmnk commented 5 years ago

Thanks for the quick response! Yes, I still have problems with mir_eval. Perhaps, again, different versions of the libraries do not want to work together: I get a "Chord Intervals must not overlap" error. I also wanted to ask why val_chord_tag_sparse_categorical_accuracy is used for monitoring, and not val_chord_tag_loss. Usually, to avoid overfitting, val_loss is used for monitoring. Is this also related to the specifics of the data we work with?

bmcfee commented 5 years ago

Perhaps, again, different versions of the libraries do not want to work together: I get a "Chord Intervals must not overlap" error.

That's strange, everything should work. If it doesn't, then there's a bug.

I also wanted to ask why val_chord_tag_sparse_categorical_accuracy is used for monitoring, and not val_chord_tag_loss. Usually, to avoid overfitting, val_loss is used for monitoring. Is this also related to the specifics of the data we work with?

There's a lot to unpack here, but it's not specific to the data.

  1. accuracy vs loss: In general, I don't think it's correct to use the loss, rather than accuracy, for validation. The purpose of the validation score is to estimate the risk of your estimator, which depends on your error criterion and not on your choice of objective function. The two are often correlated, but they are not the same: typically the loss is a (continuous and differentiable) upper bound on the risk, e.g. cross-entropy instead of 0-1 loss.

  2. The model here is multi-task (several loss functions), and the training setup naively combines them with an unweighted sum. You could probably do a little better with a clever weighting strategy, but it seems to work well enough as is. However, some of the tasks are much easier (pitch profile) than others (chord tag), so validating on the combination could be misleading. Since we had to pick one score to validate on (I didn't want to go down any Pareto-frontier rabbit holes), we picked the hardest task (chord tag), since good performance there should imply good performance on the others.
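
As a rough sketch of what that monitoring looks like (assuming an output head named chord_tag and Keras's usual per-output metric naming, which is where val_chord_tag_sparse_categorical_accuracy comes from; the checkpoint filename and patience are placeholders):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Monitor only the chord-tag head's validation accuracy, rather than the
# combined (unweighted-sum) val_loss over all task heads.
monitor = 'val_chord_tag_sparse_categorical_accuracy'

callbacks = [
    ModelCheckpoint('model_best.h5', monitor=monitor, mode='max',
                    save_best_only=True),
    EarlyStopping(monitor=monitor, mode='max', patience=20),
]

# model.fit(..., validation_data=..., callbacks=callbacks)
```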

VtlNmnk commented 5 years ago

That's strange, everything should work. If it doesn't, then there's a bug.

Evaluating test set: 0%| | 0/242 [00:00<?, ?it/s]

/content/gdrive/My Drive/ACE/datasets/Isophonics/Carole King/Tapestry/01 I Feel The Earth Move.jams

[estimated chord annotation, printed as a dataframe: 55 intervals (C:maj, F:maj/5, ..., N), spanning 0.000 to 179.072 s]

[reference chord annotation, printed as a dataframe: 106 rows x 4 columns (N, C:min7, F/5, ..., N), spanning 0.000 to 179.067 s]

/usr/local/lib/python3.6/dist-packages/mir_eval/chord.py:590: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result. abs_bitmap[idxs] = 1

ValueError traceback (most recent call last), abbreviated:

evaluate(input_path)
  -> track_eval(ann_i, aud_i)
  -> jams.eval.chord(jam_ref.annotations['chord', 0], est)    [jams/eval.py:198]
  -> mir_eval.chord.evaluate(ref_intervals, ref_labels, est_intervals, est_labels)    [mir_eval/chord.py:1601]
  -> overseg(merged_ref_intervals, merged_est_intervals)    [mir_eval/chord.py:1425]
  -> directional_hamming_distance(reference_intervals, estimated_intervals)    [mir_eval/chord.py:1389]

ValueError: Chord Intervals must not overlap

bmcfee commented 5 years ago

It might be a problem with the reference annotation, not the crema output. Does it fail if you evaluate the reference annotation against itself?

VtlNmnk commented 5 years ago

Could it be that the length of the audio file does not match the duration of the annotations?

Estimated annotations duration: 179.07229 s
Original annotations duration: 179.067 s

bmcfee commented 5 years ago

Ah, that would do it! It appears that the chord module does not fully validate intervals in the same way that, say, structure does.

My general advice is to trim the estimates to the same duration as the references, so that the resulting metrics for two different estimates of a track are comparable. However, if your reference annotations have incorrect durations (as seems to be the case here), you might want to correct those by hand.
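
A minimal sketch of that trimming step using jams (the file names are placeholders, and this is just one way to do it, not necessarily how the crema evaluation scripts do it):

```python
import jams

# Load the reference and the estimate (placeholder paths).
jam_ref = jams.load('reference.jams')
jam_est = jams.load('estimate.jams')

ref_ann = jam_ref.annotations['chord', 0]
est_ann = jam_est.annotations['chord', 0]

# Trim the estimate to the reference annotation's duration so that scores
# for different estimates of the same track remain comparable.
est_trimmed = est_ann.trim(0, ref_ann.duration, strict=False)

scores = jams.eval.chord(ref_ann, est_trimmed)
print(scores)
```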

Either way, could you raise an issue on mir_eval? The error message given here turns out not to be helpful (a problem), and it also highlights that the up-front validation needs to be stricter to make these things easier to diagnose.

VtlNmnk commented 5 years ago

(screenshots attached)

VtlNmnk commented 5 years ago

(screenshots attached) The error is different, but it also does not tell me anything. Either I need to fit the annotations to the audio files, or I need to somehow obtain the same audio files that the annotations were made for.

VtlNmnk commented 5 years ago

(screenshots attached) The estimated annotations can be compared against themselves, but the original ones cannot. Did I somehow read the reference annotation file incorrectly?

bmcfee commented 5 years ago

I have a hard time reading screenshots, but it does appear that your reference annotations are a little broken (beyond duration disagreements) and need to be corrected.

VtlNmnk commented 5 years ago

Thank you for the answers. I understand that it's difficult to tell much from a screenshot. I'll figure out what's wrong with the annotations myself.

I have only one question: are the audio files for these annotations available anywhere in the public domain, or at least fingerprints of the audio files? If I'm using the wrong audio files, that would distort the results.

bmcfee commented 5 years ago

I'll figure out what's wrong with the annotations myself.

I've seen this in a few Billboard / Isophonics files, where the time intervals either overlap or leave a gap. Probably the easiest way to find these is to iterate over the reference annotations and evaluate each one against itself (ref-vs-ref) to see which ones fail.
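
A quick sketch of that scan (the annotation directory is a placeholder, and the except clause is deliberately broad so that any validation failure gets reported):

```python
import glob
import jams

# Evaluate each reference chord annotation against itself; any file with
# overlapping intervals (or other validation problems) will raise here.
for path in sorted(glob.glob('annotations/**/*.jams', recursive=True)):
    ref = jams.load(path).annotations['chord', 0]
    try:
        jams.eval.chord(ref, ref)
    except Exception as exc:
        print(path, '->', exc)
```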

is there somewhere audio files for annotations in the public domain, or at least are there fingerprints of the audio files?

For chords, I don't think so. I know that DAn Ellis had a matlab tool for this kind of thing, but that's only useful with fingerprints. Maybe it's time to start building out new datasets on CC-licensed audio. :)

VtlNmnk commented 5 years ago

Hi! I could collect a new chord dataset. Is there a convenient tool for annotating audio files?

bmcfee commented 5 years ago

I think most people still use Sonic Visualiser for this sort of thing, but I could be mistaken.

bmcfee commented 2 years ago

(thousands of years later)

I think there's nothing else to do on this thread. To summarize, it mostly ended up being about problems with mir_eval that are outside the scope of crema. I'll close this out.