problem with Huffman encoding step

jtautry commented 7 years ago

Hey guys, Huffman encoding isn't going to work with floating point values that can essentially be infinite. The idea of Huffman encoding is to take a finite domain of values and create a binary tree based on the frequency of their occurrence in the data being encoded. So for audio data, it works well when it is split into frequency bands, then lines, and ultimately quantized to produce a small set of finite values. But it won't work at all for a large series of values that could all be different.

Also, in order to build the frequency table in the first place, I have to make an initial pass through the file. I'm not going to be able to do that on a stream of data that's only passed in once. A true MP3 encoder doesn't have to do this because the code tables are predefined in the MP3 standard and based on the precision needed during quantization (which comes from the FFTs in the psychoacoustic model).

I feel like our choice to basically code our own audio compression is getting us into the weeds here. Keep in mind, we aren't being evaluated on any of the code we are writing. I'm starting to wonder if we would be better served by just starting with a complete implementation of LAME, modifying a single step based on what we know about our source files, and then comparing the results? Like this one: https://github.com/nwaldispuehl/java-lame

ericaddison commented 7 years ago

@jtautry I don't think floats are a problem; even though the values can be infinite, there are not an infinite number of values (for a couple reasons, in this case)! In the end it's just a bunch of bits, right? We have a couple of options to make things more straight forward, but I think the most obvious would be to convert back to integers. The values are stored in the wav file as integers to begin with, but we use the wavFile library to convert them to floats so we can use the signal processing stuff. If we convert from the float range we have, which is (-1..1), back to an integer range (0..?) (upper value depending on bit depth), then I think that should put the data in a more natural context for the Huffman encoder. We can add a separate quick class to do that if we want. Or just an array of bytes. That might be the most natural?

For the code table, how much space does it take to store the code information? Would it make sense to build a separate code tables for each window of data, or a group of windows, and then store that? You can have access to the entire stream at once, if you like. That will not really be a problem. I don't think it is anything we can't work around.

What do you think about that?

As far as reevaluating the project... I'm willing to discuss anything, but I don't think we're necessarily at the point of no return! (btw, I just got a prototype of the filter bank working in Matlab, just need to translate it over to java!)

ericaddison commented 7 years ago

Now that I think about it (for another 10 seconds :) ), I think a separate class to take in the windows of floats and convert that to one stream of ints or bytes is a good way to go. Then your Huffman code can just think in terms of a single (possibly multidimensional) array of ints (or bytes), instead of all this window/lines/subbands nonsense.

ericaddison commented 7 years ago

In that case I see the following state of affairs:

filterbank: prototype good, needs implementation in Java
MDCT: needs to be done. Will just grab an off-the-shelf implementation and adapt it for use here
new step dewindowizer/int maker: I can take care of this
Huffman: in good shape, confirm details of coding table details
OutputFormatter: needs some attention, but should not be a tricky part

Then we just need to run tests and plot results! I've been starting to push harder on this, and will keep ramping it up over the next couple weeks...

jtautry commented 7 years ago

Eric makes some good points. We can give it another week and see where we are at. The good news is that I have working code for the LAME implementation so if we end up needing to go that route it would be easy.

I would also suggest that we focus on encoding and save decoding for last. Simply because you can't "decode" an mp3 back to a wav anyway. Let's get the forward direction done first and then if we have time we can work on the reverse.

I will mark up the Huffman code to note where the changes need to be made to handle the incoming data. If you can pass bytes that's best, technically that's what it's doing now, just reading from a bitstream.

ericaddison commented 7 years ago

OK, I agree. If a week goes by and we are not significantly further along, I will concede and we can start thinking about a more analysis based report. Thanks!

JoshMusick commented 7 years ago

I can tackle the output formatter. Though, I may not have a chance to get too deep into it until after this coming weekend. The graphics class project has been a black-hole of my time. I'm sure your wife feels the same way about that project Eric.

ericaddison commented 7 years ago

@JoshMusick oh boy is that true! That class is making her question her self-worth! j/k

ericaddison / Algs_Term_Project

problem with Huffman encoding step #13