fastaudio / fastai_audio

[DEPRECATED] 🔊️ Audio with fastaiv1
MIT License

added __iter__ functions to AudioConfig and SpectrogramConfig; tested #64

Closed (free-soellingeraj closed 4 years ago)

free-soellingeraj commented 4 years ago

Added the __iter__ functions to the configs.

Please review and let me know if I need to go further. I was thinking I could add a test that checks that the output of dict(config) is correct for the default configuration.

The only thing that gave me pause is that it relies on the SpectrogramConfig attribute of AudioConfig keeping the name sg_cfg.
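One way to avoid depending on the attribute name is to check the type of each value instead. The following is only a minimal sketch, assuming the configs are dataclasses; `SpectrogramConfigSketch` and `AudioConfigSketch` are hypothetical stand-ins with a handful of the fields from this thread, not the code in this PR.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Iterator, Optional, Tuple


@dataclass
class SpectrogramConfigSketch:
    """Hypothetical stand-in for SpectrogramConfig (subset of fields)."""
    f_min: float = 0.0
    f_max: float = 8000
    hop_length: int = 512
    n_mels: int = 128

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # Yielding (name, value) pairs is what lets dict(instance) work.
        for f in fields(self):
            yield f.name, getattr(self, f.name)


@dataclass
class AudioConfigSketch:
    """Hypothetical stand-in for AudioConfig (subset of fields)."""
    cache: bool = True
    duration: Optional[int] = None
    sg_cfg: SpectrogramConfigSketch = field(default_factory=SpectrogramConfigSketch)

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        for f in fields(self):
            value = getattr(self, f.name)
            # Convert any nested dataclass instance by type, so the code
            # does not care whether the attribute is called sg_cfg.
            if is_dataclass(value) and not isinstance(value, type):
                value = dict(value)
            yield f.name, value


config_as_dict = dict(AudioConfigSketch(duration=4100))
```

With this approach, renaming sg_cfg would change the key in the resulting dict but would not break the conversion itself.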

Here's an example of what happens for me:

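from pathlib import Path

# AudioConfig and SpectrogramConfig come from fastai_audio.
# Dataset root, taken from the cache_dir shown in the output below.
data_folder = Path('data/emotions/ravdess_audio/Audio_Speech_Actors_01-24')

audio_config = {
    'remove_silence': "all",
    'use_spectro': True,
    'duration': 4100,
    'downmix': True,
    'cache': True,
    'cache_dir': data_folder/'cache'
}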
spec_config = {
    'sr': 16000,
    'hop_length': 512,
    'power': 1.0,
    'n_mels': 128,
    'fmin': 0.0,
    'fmax': 8000,
    'ref': 1.0, 
    'amin': 1e-05,
    'top_db': 80.0,
}
spec_config['n_fft'] = 20 * spec_config['n_mels']

config = AudioConfig(
    remove_silence=audio_config['remove_silence'],
    use_spectro=audio_config['use_spectro'],
    duration=audio_config['duration'],
    downmix=audio_config['downmix'],
    cache=audio_config['cache']
)
config.cache_dir = Path(
    data_folder/'cache'
)
config.resample_to = 8000
config.sg_cfg = SpectrogramConfig(
    hop_length=spec_config['hop_length'],
    n_mels=spec_config['n_mels'],
    f_min=spec_config['fmin'],
    f_max=spec_config['fmax'],
    n_fft=spec_config['n_fft'],
    top_db=spec_config['top_db']
)

dict(config)

# Returns:
# {'cache': True,
#  'duration': 4100,
#  'max_to_pad': None,
#  'pad_mode': 'zeros',
#  'remove_silence': 'all',
#  'use_spectro': True,
#  'mfcc': False,
#  'delta': False,
#  'silence_padding': 200,
#  'silence_threshold': 20,
#  'segment_size': None,
#  'resample_to': 8000,
#  'standardize': False,
#  'downmix': True,
#  'sg_cfg': {'f_min': 0.0,
#   'f_max': 8000,
#   'hop_length': 512,
#   'n_fft': 2560,
#   'n_mels': 128,
#   'pad': 0,
#   'to_db_scale': True,
#   'top_db': 80.0,
#   'win_length': None,
#   'n_mfcc': 20},
#  'cache_dir': PosixPath('data/emotions/ravdess_audio/Audio_Speech_Actors_01-24/cache')}

Let me know what you think.

mogwai commented 4 years ago

Just a little trick here: instead of passing each key to the config individually, you can unpack the dict:

config = AudioConfig(**audio_config)
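As a rough illustration of the unpacking, here is a sketch using a hypothetical stand-in class. The field names come from the example earlier in this thread, while the defaults are assumptions chosen only to make the snippet run. Note that every key in the dict must match a field name, otherwise the constructor raises a TypeError.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioConfigSketch:
    """Hypothetical stand-in for AudioConfig; defaults are assumed."""
    remove_silence: Optional[str] = None
    use_spectro: bool = True
    duration: Optional[int] = None
    downmix: bool = False
    cache: bool = True


audio_config = {
    'remove_silence': 'all',
    'use_spectro': True,
    'duration': 4100,
    'downmix': True,
}

# ** turns each key/value pair into a keyword argument.
config = AudioConfigSketch(**audio_config)

# A key that is not a field is rejected by the generated __init__.
unexpected_key_error = None
try:
    AudioConfigSketch(**audio_config, not_a_field=1)
except TypeError as err:
    unexpected_key_error = str(err)
```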

Also, Python dataclasses can be converted to a dict with the asdict function from the dataclasses module:

from dataclasses import asdict
asdict(config)

So this isn't really necessary and won't add anything you can't already do, but thank you for the suggestion.
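To illustrate why asdict covers the nested case as well, the sketch below uses hypothetical stand-in dataclasses (not the library's real definitions). asdict recurses into fields that are themselves dataclass instances, so the nested spectrogram config comes back as a plain dict. It only includes declared fields.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class SpectrogramConfigSketch:
    """Hypothetical stand-in for SpectrogramConfig (subset of fields)."""
    hop_length: int = 512
    n_mels: int = 128
    top_db: float = 80.0


@dataclass
class AudioConfigSketch:
    """Hypothetical stand-in for AudioConfig (subset of fields)."""
    duration: Optional[int] = None
    resample_to: Optional[int] = None
    sg_cfg: SpectrogramConfigSketch = field(default_factory=SpectrogramConfigSketch)


config = AudioConfigSketch(duration=4100, resample_to=8000)

# The nested dataclass is converted recursively into a plain dict.
as_dict = asdict(config)

# Fields reassigned after construction are picked up on the next call.
config.sg_cfg = SpectrogramConfigSketch(n_mels=64)
as_dict_after_update = asdict(config)
```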