[Open] lea-xtend-ai opened this issue 9 months ago
I changed the number of samples to 100,000 and it was even worse:
{'augmentation_batch_size': 16,
 'augmentation_rounds': 1,
 'background_paths': ['./audioset_16k', './fma'],
 'background_paths_duplication_rate': [1],
 'batch_n_per_class': {'ACAV100M_sample': 1024,
                       'adversarial_negative': 50,
                       'positive': 50},
 'custom_negative_phrases': [],
 'false_positive_validation_data_path': 'validation_set_features.npy',
 'feature_data_files': {'ACAV100M_sample': 'openwakeword_features_ACAV100M_2000_hrs_16bit.npy'},
 'layer_size': 32,
 'max_negative_weight': 1500,
 'model_name': 'alice',
 'model_type': 'dnn',
 'n_samples': 100000,
 'n_samples_val': 2000,
 'output_dir': './alice',
 'piper_sample_generator_path': './piper-sample-generator',
 'rir_paths': ['./mit_rirs'],
 'steps': 50000,
 'target_accuracy': 0.7,
 'target_false_positives_per_hour': 0.2,
 'target_phrase': ['alice'],
 'target_recall': 0.5,
 'tts_batch_size': 50}
Those numbers seem reasonable based on my past experience, so it's odd that the model isn't working well for you in practice. Can you share the trained model file, and/or the notebook you used to train the model that performs well?
As for the other questions:
1) The automatic model training notebook tries to use reasonable defaults and automation to set hyperparameters to simplify the training process. From my somewhat limited testing this works well most of the time, but it's not surprising to see that in some cases using a more manual process can produce a better model.
2) In general, more negative samples are better, but there are diminishing returns. In my own testing I have terabytes of negative data for different experiments, but I doubt all of it is needed. As for positive examples, usually between 20,000 and 50,000 is sufficient, but sometimes more can help (it depends on the model, from what I've seen).
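As a rough sanity check on how much precomputed negative data is actually on hand (a sketch on my part, not something from the notebooks), the feature file referenced in the config above can be inspected without loading it into RAM:

import numpy as np

# Memory-map the precomputed negative features so the array is not read into RAM;
# the file name matches the 'feature_data_files' entry in the config above.
neg_features = np.load("openwakeword_features_ACAV100M_2000_hrs_16bit.npy", mmap_mode="r")
print(neg_features.shape, neg_features.dtype)  # number of negative examples x feature dimensions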
@lea-xtend-ai Did you use the training_models.ipynb notebook exactly as it is, even with the same model mentioned there? Also, what do you mean by not downloading the full negative data?
I utilized the training_models.ipynb notebook from the repository and made modifications to filter positive clips. Here's the specific change I made:
positive_clips, durations = openwakeword.data.filter_audio_paths(
    [
        "pos_data/alice/VITS/",
    ],
    min_length_secs = 0.5,      # minimum clip length in seconds
    max_length_secs = 2.9,      # maximum clip length in seconds
    duration_method = "header"  # use the file header to calculate duration
)

print(f"{len(positive_clips)} positive clips after filtering, representing ~{sum(durations)//3600} hours")
The output shows that there are 48796 positive clips after filtering, representing approximately 8.0 hours.
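As a quick sanity check (my own addition, not part of the notebook), those numbers imply an average clip length of roughly 0.6 seconds, which can be confirmed directly from the durations list returned by filter_audio_paths above:

# ~8 hours spread over ~48,796 clips is about 0.59 s per clip on average.
avg_secs = sum(durations) / max(len(durations), 1)
print(f"average clip length: {avg_secs:.2f} s "
      f"(min {min(durations):.2f} s, max {max(durations):.2f} s)")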
This is my second experiment with 50,000 generated examples. The first experiment, which involved 10,000 examples from VITS and WAVEGLOW, worked fine.
Unfortunately, I cannot attach the model file due to GitHub's limitations on file types. Is it possible for me to send it via email instead?
Thank you for your assistance!
@lea-xtend-ai can you put the model file into an archive (e.g., zip or tar)? That might help you attach it to this issue. See more details here: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/attaching-files
I have a model that I trained using the training_models notebook, and it's working more or less fine. However, I've noticed a few false positives, especially with words containing the "al" combination, where the "s" sound seems less crucial for the model to recognize "Alice."
The model file is named alice_v5.zip
Interestingly, I observed that the model created by automatic_model_training.ipynb is much lighter, weighing only 206 KB, whereas the one from training_models is 351 KB.
Thank you again!
I've also observed subpar performance on some words, most recently 'apartment'. I can upload an example if that would help as well.
@lea-xtend-ai testing the alice_v1.onnx model myself, I broadly agree with you. It performs reasonably well, but does have some false positives, especially for words that are very similar to "alice" (e.g., "malice", "chalice", "callus", etc.).
This makes sense to a certain extent, as the training_models notebook does not include adversarial speech, while the automatic model training notebook does. That is, the process attempts to find words that sound similar to the target wakeword and includes those in the training data. However, because this process is automated, it doesn't always work as expected.
If you want to explore the automatic training process further, there is an option in the YAML config file here that allows you to specify custom adversarial negative phrases. This can greatly improve performance in cases where you know that certain words or phrases lead to false activations.
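For example, here is a minimal sketch of setting that option programmatically, assuming the training config is a YAML file (my_model.yaml is a placeholder name) containing the custom_negative_phrases key shown in the printed config earlier in this thread:

import yaml  # pyyaml

# Load the training config, add known false-trigger words as adversarial
# negative phrases, and write it back. The file name and word list are
# illustrative only.
with open("my_model.yaml") as f:
    config = yaml.safe_load(f)

config["custom_negative_phrases"] = ["malice", "chalice", "callus"]

with open("my_model.yaml", "w") as f:
    yaml.safe_dump(config, f)

The same list can of course just be edited by hand in the YAML file before starting a training run.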
I've also observed subpar performance on some words, most recently 'apartment'. I can upload an example if that would help as well.
@twitchyliquid64 if you are noticing too many false-positives with the "apartment" wakeword, I would recommend the same approaches mentioned above.
@dscripka Yes, I agree with you, but I haven't been able to get the "automatic model training notebook" to work properly, regardless of the parameters I choose.
Hi,
I tried to use automatic_model_training.ipynb but encountered significant issues, resulting in a model that does not work effectively at all.
Configuration YAML Used
Output Results:
I tested using the detect_from_microphone script.
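For context, a minimal sketch of how a trained model file is loaded and scored (using the openwakeword Python API directly rather than the script itself; this assumes a recent openwakeword version, and "alice.onnx" stands in for the actual model file):

import numpy as np
from openwakeword.model import Model

# Load the trained wake word model; "alice.onnx" is a placeholder path.
oww_model = Model(wakeword_models=["alice.onnx"], inference_framework="onnx")

# Score one 80 ms chunk of 16 kHz, 16-bit audio (all zeros here as a stand-in
# for a real microphone frame); predict() returns a score per loaded model.
frame = np.zeros(1280, dtype=np.int16)
print(oww_model.predict(frame))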
On the other hand, I successfully trained using training_models.ipynb with synthetic_speech_dataset_generation for generating 10,000 samples (5000 for each model), and the results were fine without downloading the full negative data.
Questions
2. training_models.ipynb:
2.1. Do I need to download the entire negative dataset, and if so, what is the total size in GB?
2.2. Is generating 50,000 positive samples for each model necessary?
Thank you!