fat_llama is a Python package for upscaling audio files to FLAC or WAV format. It uses CUDA-accelerated computation to upsample the audio and restore missing frequencies with FFT (Fast Fourier Transform) and iterative soft thresholding (IST), producing richer and more detailed audio.
(Note: For the CPU version, see https://pypi.org/project/fat-llama-fftw/)
Install via pip:
pip install fat-llama
Note: This version works with CUDA 12.
It also requires CUDA and CuPy to be properly installed: https://docs.cupy.dev/en/stable/install.html
It also requires ffmpeg: https://support.audacityteam.org/basics/installing-ffmpeg
Note: To install on older versions of CUDA and CuPy, you will need to download the specific versions and install locally.
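Before installing, it can help to confirm that the prerequisites above are visible from Python. The sketch below is not part of fat_llama, and the function name check_prerequisites is hypothetical. It checks only three things: whether an ffmpeg executable is on PATH, whether the cupy package is importable, and, if it is, whether CuPy reports at least one CUDA device.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict:
    """Hypothetical helper (not part of fat_llama): report which prerequisites are visible."""
    status = {
        # ffmpeg must be an executable somewhere on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec checks that CuPy is installed without importing it
        "cupy": importlib.util.find_spec("cupy") is not None,
        "cuda_device": False,
    }
    if status["cupy"]:
        try:
            import cupy

            # Raises if the CUDA driver or runtime cannot be used
            status["cuda_device"] = cupy.cuda.runtime.getDeviceCount() > 0
        except Exception:
            status["cuda_device"] = False
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```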
To install locally:
git clone <target_url>
cd fat_llama
pip install .
You can run the example provided in example.py:
from fat_llama.audio_fattener.feed import upscale
# Example call to the method
upscale(
    input_file_path='input_test.mp3',
    output_file_path='output_test.flac',
    source_format='mp3',
    target_format='flac',
    max_iterations=300,
    threshold_value=0.6,
    target_bitrate_kbps=1400,
    toggle_normalize=True,
    toggle_autoscale=True,
    toggle_adaptive_filter=True
)
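To process several files with the same settings, the keyword arguments can be built per file. This is a hypothetical helper, not part of fat_llama: the name build_upscale_kwargs, the "_upscaled" output suffix, and the format sets (taken from the examples in the parameter descriptions, which may not be exhaustive) are all assumptions. The returned dictionary uses only parameter names documented for upscale.

```python
from pathlib import Path

# Assumed format sets, based on the examples given in the parameter descriptions
SOURCE_FORMATS = {"mp3", "wav", "ogg", "flac"}
TARGET_FORMATS = {"flac", "wav"}


def build_upscale_kwargs(input_path, output_dir, target_format="flac", **options):
    """Hypothetical helper: build keyword arguments for fat_llama's upscale()."""
    src = Path(input_path)
    source_format = src.suffix.lstrip(".").lower()  # infer the format from the extension
    if source_format not in SOURCE_FORMATS:
        raise ValueError(f"unsupported source format: {source_format!r}")
    if target_format not in TARGET_FORMATS:
        raise ValueError(f"unsupported target format: {target_format!r}")
    output_path = Path(output_dir) / f"{src.stem}_upscaled.{target_format}"
    return {
        "input_file_path": str(src),
        "output_file_path": str(output_path),
        "source_format": source_format,
        "target_format": target_format,
        **options,  # e.g. max_iterations=300, threshold_value=0.6
    }


# Usage (requires fat_llama and a working CUDA setup):
# from fat_llama.audio_fattener.feed import upscale
# for path in Path("songs").glob("*.mp3"):
#     upscale(**build_upscale_kwargs(path, "upscaled", max_iterations=300))
```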
input_file_path (str): Path to the input audio file. Mandatory.
output_file_path (str): Path to the output processed audio file. Mandatory.
source_format (str): Format of the input audio file (e.g., 'mp3', 'wav', 'ogg', 'flac').
target_format (str): Format of the output audio file (e.g., 'flac', 'wav'). Default is 'flac'.
max_iterations (int): Maximum number of iterations for IST. Default is 800.
threshold_value (float): Threshold value for IST. Default is 0.6.
target_bitrate_kbps (int): Target bitrate in kbps. Default is 1411.
toggle_normalize (bool): Whether to normalize the audio. Default is True.
toggle_autoscale (bool): Whether to autoscale the audio based on the original audio. Default is True.
To run the example, execute the following command:
python example.py
This will upscale the MP3 file specified in the example and produce a FLAC file with full processing.
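The default target_bitrate_kbps of 1411 matches the bitrate of uncompressed CD-quality PCM audio (44.1 kHz, 16-bit, stereo). Whether fat_llama chose its default for this reason is not stated here; the arithmetic itself is simply sample rate × bit depth × channels, as this small sketch (function name hypothetical) shows:

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000


print(pcm_bitrate_kbps(44_100, 16, 2))  # 1411.2 (CD quality)
print(pcm_bitrate_kbps(48_000, 24, 2))  # 2304.0
```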
The upscaling process involves several steps:
FFT (Fast Fourier Transform) is used to transform the audio signal into the frequency domain, which makes it possible to identify and manipulate specific frequency components. Applying a threshold in the frequency domain keeps the significant frequencies and discards noise; the retained components are then added to the upscaling data to add detail to the upscaled frequencies.
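A minimal NumPy sketch of frequency-domain thresholding. It assumes the threshold is a fraction of the peak magnitude and uses a hard keep-or-discard rule; how fat_llama actually applies threshold_value, and how the kept components are merged into the upscaled signal, is not shown. The function name threshold_spectrum is hypothetical.

```python
import numpy as np


def threshold_spectrum(signal: np.ndarray, threshold_value: float = 0.6) -> np.ndarray:
    """Keep only FFT bins whose magnitude is at least threshold_value * peak magnitude."""
    spectrum = np.fft.rfft(signal)  # real signal -> one-sided spectrum
    magnitude = np.abs(spectrum)
    keep = magnitude >= threshold_value * magnitude.max()
    return np.fft.irfft(spectrum * keep, n=len(signal))  # back to the time domain


# Demo: a 440 Hz tone in white noise, 1 second at 8 kHz
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + np.random.default_rng(0).normal(scale=0.1, size=t.size)
denoised = threshold_spectrum(noisy)
print(np.abs(noisy - clean).max(), np.abs(denoised - clean).max())
```

A hard mask is the simplest choice; the soft variant used by IST shrinks magnitudes instead of cutting them off.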
The report "Fast Sparse Fourier Transformations for NMR Spectroscopy" by Badruddin Kamal (supervised by Thomas Huber and Alastair Rendall, 2015) gives a comprehensive treatment of sparse representations and their applications in signal processing. Iterative soft thresholding (IST) builds on concepts from this report to add missing frequencies, making the audio more detailed and rich. This is particularly useful when upscaling audio in which some frequencies are missing or congested.
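As an illustration only, the sketch below is a generic textbook-style IST loop, not fat_llama's implementation. It treats some time-domain samples as missing, repeatedly soft-thresholds the spectrum, and restores the known samples after each step. The linearly decaying threshold (starting at threshold_value times the current peak magnitude) is an assumption, and the parameter names merely mirror max_iterations and threshold_value from upscale.

```python
import numpy as np


def ist_reconstruct(measured, mask, max_iterations=300, threshold_value=0.6):
    """Generic IST sketch: fill samples where mask is False using a sparse spectrum."""
    measured = np.asarray(measured, dtype=float)
    x = np.where(mask, measured, 0.0).astype(complex)  # start from zero-filled data
    for k in range(max_iterations):
        spectrum = np.fft.fft(x)
        magnitude = np.abs(spectrum)
        # Threshold decays linearly from threshold_value * peak toward zero
        t = threshold_value * magnitude.max() * (1 - k / max_iterations)
        # Soft thresholding: shrink every magnitude by t, zeroing bins below t
        shrink = np.maximum(magnitude - t, 0.0) / np.maximum(magnitude, 1e-12)
        estimate = np.fft.ifft(spectrum * shrink)
        # Data consistency: keep the samples that were actually measured
        x = np.where(mask, measured, estimate)
    return x.real


# Demo: two tones with about 30% of the samples missing
n = np.arange(256)
signal = np.sin(2 * np.pi * 10 * n / 256) + 0.5 * np.sin(2 * np.pi * 37 * n / 256)
mask = np.random.default_rng(0).random(n.size) < 0.7
zero_filled = np.where(mask, signal, 0.0)
recovered = ist_reconstruct(zero_filled, mask)
print(np.linalg.norm(zero_filled - signal), np.linalg.norm(recovered - signal))
```

Starting with a high threshold lets only the strongest components shape the first estimates; lowering it gradually admits weaker components once the leakage caused by the missing samples has shrunk.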
ericzo - beyond: https://soundcloud.com/ericzomusic/free-electro-trap-anthem-beyond
All notable changes to this project will be documented in this file.
Removed logging from requirements to fix pip bug.
analytics.py: analysis and spectrogram results.
README.md details.
Renamed upscale_mp3_to_flac method to upscale to support multiple source formats.
toggle_scale_amplitude is False.
toggle_wiener_filter, toggle_normalize, toggle_equalize, toggle_scale_amplitude, toggle_gain_reduction.
upscale_mp3_to_flac method with parameters for iterative soft thresholding (IST), gain reduction, and equalization.