alex9490 / editor-on-fire

Automatically exported from code.google.com/p/editor-on-fire

Add a wave form graph #132

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
There have been a lot of requests for this feature lately: a graphical 
representation of the audio, such as what is displayed in audio editing tools 
like Audacity.

Original issue reported on code.google.com by raynebc on 26 Jul 2010 at 12:53

GoogleCodeExporter commented 8 years ago
The most common implementation is probably a time/amplitude graph.  I can try 
to put the logic together if I can figure out how to determine the amplitude of 
each audio sample in the file.

Original comment by raynebc on 26 Jul 2010 at 9:09

GoogleCodeExporter commented 8 years ago
I posted on the Allegro forum asking for details on how to accomplish this:
http://www.allegro.cc/forums/thread/604713

Original comment by raynebc on 27 Jul 2010 at 7:34

GoogleCodeExporter commented 8 years ago
The theory has been ironed out; I just need to know how to obtain decoded audio 
samples.

Original comment by raynebc on 29 Jul 2010 at 8:04

GoogleCodeExporter commented 8 years ago
I attempted to add this once before, but my algorithm was so slow it was pretty 
much useless. I just decoded the whole song into memory (alogg provides a 
function for this) and used some math to find where the start and end of the 
viewable part of the waveform were. Then I went through the samples and drew 
vertical lines from the center to the amplitude of the samples (scaled to fit 
into the area I had set up for drawing it, of course). The waveform looked good 
using this method, but I couldn't get the algorithm to work fast enough for it 
to be usable.
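The scaling step described above (a vertical line from the center whose length is the sample's amplitude, fitted into the drawing area) can be sketched as follows. `sample_to_pixels` is a hypothetical helper, not EOF's actual code; the real routine would pass its result to Allegro's vline() to draw each column:

```c
#include <stdlib.h>

/* Map a signed 16-bit sample to a line length in pixels, for a strip
   of the given height centered on y = height / 2.  Hypothetical helper;
   the actual drawing would use Allegro's vline(). */
int sample_to_pixels(short sample, int strip_height)
{
    int half = strip_height / 2;
    int amp = abs((int)sample);   /* distance from silence (center line) */
    return (amp * half) / 32768;  /* scale into half the strip's height */
}
```

For a 32-pixel strip, full-scale input maps to a 15-pixel line (integer truncation), and silence maps to zero.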

I considered creating a huge BITMAP and pre-rendering the waveform, but it was 
going to take too much memory. Doing the math now I come up with about 10MB per 
minute of audio if the height of the waveform view is 32 pixels and each 
horizontal pixel represents 2 milliseconds of audio (the smallest amount 
viewable in EOF). That wouldn't be that bad, I suppose.

Original comment by xander4j...@yahoo.com on 5 Aug 2010 at 12:18

GoogleCodeExporter commented 8 years ago
If the OGG was decoded into memory, it could make playback faster, because 
there would be less work involved with playing uncompressed PCM.  If you can 
point me in the right direction for decoding to memory, I could try to plan a 
means for creating the waveform graph.  If we want, we might be able to get 
inspiration from other open source applications like Audacity, which does the 
waveform graph extremely well.

Original comment by raynebc on 5 Aug 2010 at 4:04

GoogleCodeExporter commented 8 years ago
The function to decode the OGG into memory is:

SAMPLE *alogg_create_sample_from_ogg(ALOGG_OGG *ogg);

SAMPLE is a type defined by Allegro. Check the manual for details on this type.

I don't think OGG decoding is a bottleneck for EOF. Decoding an OGG in real 
time takes < 1% CPU time even on my old P4 2.8GHz. The real issues are software 
graphics rendering and other processing done by EOF.

Original comment by xander4j...@yahoo.com on 5 Aug 2010 at 9:58

GoogleCodeExporter commented 8 years ago
My main concern is that people will dislike the amount of memory consumed by 
storing the entire decoded OGG in memory, even if the waveform display is 
optional.  Perhaps we can compare decoding the entire OGG to memory versus just 
decoding one screenful at a time for the purpose of displaying the waveform.  
Since the OGG file itself is buffered to memory, seeking to the appropriate 
sample to begin decoding shouldn't pose much of a delay to speak of.  This 
would drastically decrease the amount of memory needed to create the waveform 
in real time.

Maybe my math is wrong, but I'm not sure how you got 10MB per minute for a 
bitmap.  Here's what I'm seeing:

1 minute = 60000 ms
Each column is 2ms -> 30000 columns per minute

30000 columns per minute * 32 rows * 8 bits per pixel / 8 bits per byte = 960000 bytes
30000 columns per minute * 32 rows * 2 bits per pixel / 8 bits per byte = 240000 bytes
30000 columns per minute * 32 rows * 1 bit per pixel / 8 bits per byte = 120000 bytes

Even if I goofed the math, there's no need to use 8 bit color depth on the 
waveform.  Monochrome should be fine for a basic graph, but 2 bit would allow 
for a nice peak+root mean square graph.  The root mean square would add 
considerable calculation time, but if it's performed once each time an OGG is 
loaded, it could be cached one way or another.
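As a mechanical check on the numbers above (one column per 2 ms, a 32-pixel strip), the per-minute bitmap size can be computed directly; the function name is illustrative only:

```c
/* Bitmap size per minute of audio for the waveform cache discussed above:
   30000 columns (one per 2 ms) * 32 rows, at the given color depth. */
long bitmap_bytes_per_minute(int bits_per_pixel)
{
    long columns = 60000 / 2;   /* 60000 ms per minute, 2 ms per column */
    long rows = 32;             /* strip height in pixels */
    return columns * rows * bits_per_pixel / 8;
}
```

This reproduces the three figures in the comment: 960000, 240000 and 120000 bytes for 8-bit, 2-bit and 1-bit depth respectively.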

Original comment by raynebc on 6 Aug 2010 at 6:29

GoogleCodeExporter commented 8 years ago
You're right. Not sure where the extra '0' came from. Glad it won't take as 
much memory as I thought. I would stick with 8-bit because that is the lowest 
depth Allegro supports. We could write our own 2-bit or 4-bit rendering 
function, but that would probably be slower than using Allegro's 8-bit renderer.

Memory usage is my main concern with decoding the whole OGG into memory. I 
don't like the idea of eating up 50+ megabytes of memory (more when we support 
multiple tracks) when there isn't that much to gain from it.

If this feature pans out I would probably go the route of generating the 
waveform once for each loaded OGG and saving it in the song folder. It should 
be fairly trivial to make a function that uses libvorbis functions to decode a 
bit of the OGG and generate the waveform graph without having to resort to 
decoding the entire thing into memory. This should be optional, though, and if 
it's not terribly slow we could probably get away without using cache files 
(caching would probably bring a few headaches of its own).

Original comment by xander4j...@yahoo.com on 6 Aug 2010 at 10:10

GoogleCodeExporter commented 8 years ago
We can probably get it working first, and then alter the logic to not decode 
into memory.  The memory use would be temporary, and multiple OGGs needn't be 
processed at once, so ~50MB of temporary memory usage total for creating one 
or more graphs isn't too bad.

But when we optimize it, we can make a modified version of 
alogg_create_sample_from_ogg() that returns the next 2ms worth of samples.  
Storing it as an 8-bit bitmap in memory is probably fine, since it doesn't use 
much memory, but we could store it on disk as a 2-bit bitmap if we choose.  To 
make the cached graphs even smaller, they could be stored in PCX format, which 
natively supports 2-bit color depth and performs run length encoding 
compression, which may or may not make much difference.

Original comment by raynebc on 6 Aug 2010 at 11:18

GoogleCodeExporter commented 8 years ago
r276 adds some initial logic to build the graph.  For the time being, the min, 
peak and RMS amplitudes are tracked.  The OGG is decoded entirely into memory 
to create the data, but the decoded samples are released from memory afterward. 
 This logic, if it actually works, could easily be modified to just accept a 
SAMPLE structure containing (waveform->slicelength) decoded samples per loop 
iteration, which would avoid needing ~50MB of memory to decode the entire OGG 
into at once.  For temporary memory usage, though, I don't think it would make 
much difference, as any computer that can't temporarily spare 50ish MB of RAM 
has bigger problems to worry about.

Original comment by raynebc on 9 Aug 2010 at 12:29

GoogleCodeExporter commented 8 years ago
The next useful step is probably to write a function to render the graph to the 
editor window, taking the current zoom level into account.  Depending on 
Allegro's transparency support, I don't know if it would be easier to draw 
transparent lines onto the editor window, or to write the graph to a bitmap and 
have the entire bitmap superimposed over the editor window with transparency.

To improve rendering performance, each channel's maximum amplitude could 
probably be divided out of the waveform data values so that it doesn't have to 
be done each time the graph is rendered, unless the original data is needed for 
some reason.  Since the data is compiled in increments of 1ms worth of audio 
samples, I imagine that to avoid regenerating the graph each time the zoom 
level changes, the data would need to be interpolated so that each pixel on the 
graph's X axis uses a whole number of milliseconds worth of data, obtaining the 
mathematical mean values.
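Collapsing the 1 ms bins into coarser pixels by taking the mean, as described, could be sketched like this (illustrative names, not EOF's actual structures):

```c
/* Collapse 1 ms amplitude bins into wider bins of ms_per_pixel each,
   storing the mean of each group.  Returns the number of output bins.
   Hypothetical helper for illustrating the interpolation idea. */
int downsample_bins(const int *ms_bins, int ms_count,
                    int ms_per_pixel, int *out)
{
    int out_count = 0;
    for (int start = 0; start < ms_count; start += ms_per_pixel) {
        long sum = 0;
        int n = 0;
        /* Average the bins covered by this pixel; the last group may
           be short if ms_count isn't a multiple of ms_per_pixel. */
        for (int i = start; i < start + ms_per_pixel && i < ms_count; i++) {
            sum += ms_bins[i];
            n++;
        }
        out[out_count++] = (int)(sum / n);
    }
    return out_count;
}
```

Running this once per zoom change, rather than per frame, matches the "temporary waveform structure" idea proposed a few comments below.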

Original comment by raynebc on 9 Aug 2010 at 8:19

GoogleCodeExporter commented 8 years ago
I guess I should clarify that the 1ms intervals aren't hard-coded in, I just 
figured we'd use that for the sake of simplicity, since it may work a little 
more cleanly with zoom levels than 2ms.

Original comment by raynebc on 9 Aug 2010 at 9:14

GoogleCodeExporter commented 8 years ago
Since drawing the waveform in realtime may require enough math (square root 
function) to cause the playback to lag, another possibility would be to keep 
the 1ms waveform data, and every time the zoom level changes, have a temporary 
waveform structure be created that has all the data ready to display.  This 
should allow the waveform to scroll in realtime with the chart.

Original comment by raynebc on 15 Aug 2010 at 5:41

GoogleCodeExporter commented 8 years ago

Original comment by raynebc on 31 Aug 2010 at 8:29

GoogleCodeExporter commented 8 years ago
I actually wanted something like this a while back, and ended up getting it to 
render a pre-generated spectrogram behind the notes track, aligned with the 
zoom and scrolling, and a graph up above showing amplitude - again just from 
pre-generated data.  So it wasn't generated by EOF, but it was presented within 
it, and actually worked pretty well.  Obviously a spectrogram, with lots of 
FFTs, is more processor intensive than a simple waveform diagram, but it's 
also much more useful.

I can take some screenshots if you want - obviously I wanted to actually have 
it included and not need to pregenerate the images, and I would love to see 
this implemented.  I might see if I can hack at it some more, but I'm glad to 
see that EOF is still very actively developed and developing.

Original comment by cincoden...@gmail.com on 8 Sep 2010 at 7:38

GoogleCodeExporter commented 8 years ago
Oh, also, perhaps I should introduce myself.  I'm 5of0 over on FretsonFire.net, 
and have only done a little bit of song editing, but EOF is by far the closest 
to fulfilling everything I want in an editor, and open-source, so I took to 
modifying it instead of writing my own.  Other than that, I'm just some guy who 
likes Frets on Fire :P

Original comment by cincoden...@gmail.com on 8 Sep 2010 at 7:42

GoogleCodeExporter commented 8 years ago
Sure, feel free to post or PM some code/screenshots.  My account name is the 
same at FoF-FF as it is here.  Since I'm not an expert at sound processing and 
the related theory behind it, I'd be happy to have any help that you can 
provide.

Original comment by raynebc on 8 Sep 2010 at 8:26

GoogleCodeExporter commented 8 years ago
r366 adds much of the functionality.  For now, I mapped F5 to toggle the 
waveform display on/off.  The waveform data itself might not be being generated 
correctly.  This will probably require delving further into how Allegro stores 
audio samples, signed, unsigned, etc.
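For what it's worth, my reading of the Allegro 4 manual is that SAMPLE data is always stored unsigned regardless of bit depth, so the values would need to be re-centered on zero before measuring amplitude. A sketch of that conversion (helper names are hypothetical):

```c
/* Convert Allegro-style unsigned sample values to signed, centered on
   zero.  Based on my reading of the Allegro 4 manual, which says SAMPLE
   data is always unsigned; verify against the actual headers. */
int signed_from_unsigned16(unsigned short u)
{
    return (int)u - 32768;   /* 0..65535 -> -32768..32767 */
}

int signed_from_unsigned8(unsigned char u)
{
    return (int)u - 128;     /* 0..255 -> -128..127 */
}
```

Skipping this re-centering would make silence read as maximum amplitude, which could explain a graph that looks wrong.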

Original comment by raynebc on 14 Sep 2010 at 11:11

GoogleCodeExporter commented 8 years ago
r373 fixes most of the broken logic and it will now render something that is 
close to correct.  The graph is being displaced slightly (probably by the AV 
delay value or something).

Original comment by raynebc on 18 Sep 2010 at 8:15

GoogleCodeExporter commented 8 years ago
The current logic is unable to correctly detect the name of the currently 
loaded OGG.  It might be beneficial to have eof_load_ogg() store the name of 
the OGG file that is being loaded.

Original comment by raynebc on 19 Sep 2010 at 1:17

GoogleCodeExporter commented 8 years ago
Either the waveform can be destroyed and recreated in the OGG loading 
functions, or they can just store the filename of the loaded OGG and the 
recreation logic will take place elsewhere.  The latter is probably best, as it 
will make it easier to implement a GUI for displaying the loaded OGG and other 
profiles that exist in the EOF project.

Currently, only the left channel data is displayed.  Perhaps some other 
features would be to display the right channel in the fretboard area or to 
display a graph for both channels (one on top of the other).  The latter would 
require eof_render_waveform_line() to be altered to accept the height of the 
waveform graph to be rendered, which should be pretty easy since it already 
allows the calling function to define the y coordinate of the graph.

Original comment by raynebc on 19 Sep 2010 at 8:54

GoogleCodeExporter commented 8 years ago
Currently, I've set up PART VOCALS to display a graph for each audio channel.  
Eventually, which graphs display and how could be a user preference.

The remaining issues to be resolved for this enhancement are:
1.  How to track changes in the loaded OGG file.  Storing the path of the 
loaded OGG still would be my preference.

2.  Allow the waveform to be rendered in the piano roll area to the left of the 
first beat marker.  I'll have to look into what's causing this.  I originally 
thought it was due to how I designed the rendering logic, but in PART VOCALS, 
the left channel's graph is allowed to render to the left of the first beat 
marker, so I'm more inclined to think EOF is writing over that area with a 
black rectangle.

3.  Find out why the graph isn't accurate with respect to time.

Original comment by raynebc on 20 Sep 2010 at 8:53

GoogleCodeExporter commented 8 years ago
r381 fixes most of the problems with the waveform graph.  The only error with 
the graph itself now is that it is 2% shorter than it needs to be because I am 
rounding 44.1 to 45 samples per millisecond (which is exactly a 2% 
discrepancy).  I will fix this and then the rest of this enhancement revolves 
around how to finish implementing the feature.
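The 2% figure is exact: treating 44.1 samples as 45 shrinks the graph by 1 - 44.1/45 = 0.02. One way to avoid the rounding altogether is to compute each millisecond's starting sample index directly from the sample rate, so no per-millisecond error can accumulate. This is a sketch of the idea, not EOF's actual fix:

```c
/* First sample index of a given millisecond, computed exactly from the
   sample rate instead of a rounded samples-per-ms constant.  Uses a
   64-bit type because ms * rate overflows 32 bits past ~48 seconds. */
long long sample_index_for_ms(long long ms, long long sample_rate)
{
    return ms * sample_rate / 1000;   /* floor division; no drift */
}
```

Each slice then spans sample_index_for_ms(t+1) - sample_index_for_ms(t) samples, alternating 44 and 45 at 44.1kHz so the total stays exact.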

Original comment by raynebc on 21 Sep 2010 at 9:51

GoogleCodeExporter commented 8 years ago
r382 corrects the graph's representation of time

r383 allows the graph to be visible left of the first beat marker

r384 allows the name of the loaded OGG file to be tracked, allowing the 
waveform to be recreated appropriately

Now there are just a couple things needed to polish off the feature, such as a 
user interface for configuring the graph (i.e. y-axis position, height).  
Perhaps F5 can be set to toggle between a couple pre-defined graph views (such 
as scaling to fit the fretboard area, scaling to fit the entire editor window, 
or scaling both channels' graphs in the editor window).

Original comment by raynebc on 22 Sep 2010 at 12:40

GoogleCodeExporter commented 8 years ago
Since this is almost completely finished, I'm raising the priority.

Original comment by raynebc on 22 Sep 2010 at 2:27

GoogleCodeExporter commented 8 years ago
Remaining things to resolve for this enhancement:

1. Provide a means to alter how the waveform is displayed (such as cycling 
through various presets using F5).

2. Ensure that an existing waveform is hidden and destroyed when a chart is 
loaded/imported.  Regenerating the graph after loading another OGG should be 
manual, because if people were going to sync using separated drum audio, they'd 
want to load the drum audio, create the graph and then maybe load a full mix of 
the song to chart with.

Original comment by raynebc on 23 Sep 2010 at 10:05

GoogleCodeExporter commented 8 years ago
It might be best to provide a dialog window for configuring the waveform, such 
as checkboxes for which channels are rendered, and a radio button for how they 
are rendered (selected channels rendered into height of the fretboard area or 
rendered into the height of the editor window).

I would also like to move the F5 key input detection so that it can be used 
during playback (for toggling on/off).  Creation of the waveform will only be 
allowed when the chart is paused, as it would definitely cause enough lag to 
desync the chart.

Since the waveform pointer is initialized to NULL on startup, it would probably 
be easiest to call eof_destroy_waveform() and set eof_waveform to NULL in 
eof_init_after_load().  This should ensure that the graph is destroyed when 
another chart is loaded/imported.
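The destroy-and-reset pattern described above, shown with stand-in types so the sketch is self-contained (eof_destroy_waveform() and eof_waveform are EOF's names, but the bodies here are placeholders, not EOF's implementation):

```c
#include <stdlib.h>

/* Stand-ins for EOF's waveform structure and destructor, so the
   cleanup pattern can be shown in isolation. */
struct waveform { int *slices; };
static struct waveform *eof_waveform = NULL;

static void eof_destroy_waveform(struct waveform *w)
{
    if (w) {
        free(w->slices);
        free(w);
    }
}

/* The cleanup described above: destroy any existing graph and reset the
   pointer, leaving a clean state after a chart is loaded/imported. */
static void discard_waveform(void)
{
    eof_destroy_waveform(eof_waveform);
    eof_waveform = NULL;
}
```

Because the pointer starts as NULL and is reset to NULL, calling this during init or repeatedly is harmless.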

Original comment by raynebc on 27 Sep 2010 at 6:23

GoogleCodeExporter commented 8 years ago
Continued in r403.  Now a user interface can be designed for the user to 
specify whether to fit the graph into the fretboard or into the entire editor 
window, and which channels to render.

Original comment by raynebc on 27 Sep 2010 at 8:55

GoogleCodeExporter commented 8 years ago
r404 completes those remaining issues.  After the documentation is updated, 
this enhancement can be considered complete.

Original comment by raynebc on 28 Sep 2010 at 1:30

GoogleCodeExporter commented 8 years ago
It would be a nice addition if the waveform graph settings were maintained 
outside of the waveform structure, so that each time the graph is created, the 
user wouldn't have to change the settings to his or her preference.  This can 
be achieved by defining global renderlocation, renderleftchannel and 
renderrightchannel variables.  These variables can be updated by 
eof_menu_song_waveform_settings(), and eof_render_waveform() will validate 
those settings and store them in the eof_waveform structure.

Original comment by raynebc on 28 Sep 2010 at 2:40

GoogleCodeExporter commented 8 years ago
Completed in r406.

Original comment by raynebc on 28 Sep 2010 at 7:18