Closed GoogleCodeExporter closed 8 years ago
The most common implementation is probably a time/amplitude graph. I can try
to put the logic together if I can figure out how to determine the amplitude of
each audio sample in the file.
Original comment by raynebc
on 26 Jul 2010 at 9:09
I posted on the Allegro forum asking for details on how to accomplish this:
http://www.allegro.cc/forums/thread/604713
Original comment by raynebc
on 27 Jul 2010 at 7:34
The theory has been ironed out, I just need to know how to obtain decoded audio
samples.
Original comment by raynebc
on 29 Jul 2010 at 8:04
I attempted to add this once before but my algorithm was so slow it was pretty
much useless. I just decoded the whole song into memory (alogg provides a
function for this) and used some math to find where the start and end of the
viewable part of the waveform was. Then I went through the samples and drew
vertical lines from the center to the amplitude of the samples (scaled to fit
into the area I had set up for drawing it of course). The waveform looked good
using this method but I couldn't get the algorithm to work fast enough for it
to be usable.
I considered creating a huge BITMAP and pre-rendering the waveform but it was
going to take too much memory. Doing the math now I come up with about 10MB per
minute of audio if the height of the waveform view is 32 pixels and each
horizontal pixel represents 2 milliseconds of audio (the smallest amount
viewable in EOF). That wouldn't be that bad I suppose.
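
A minimal sketch of the two calculations described above: finding the first
visible sample and scaling an amplitude to a line length. It assumes signed
16-bit samples; first_visible_sample() and amplitude_to_offset() are names made
up for illustration, not EOF or Allegro functions.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the first decoded sample visible at scroll position scroll_ms. */
int64_t first_visible_sample(int64_t scroll_ms, int64_t sample_rate)
{
    assert(scroll_ms >= 0 && sample_rate > 0);
    return scroll_ms * sample_rate / 1000;
}

/* Scale a signed 16-bit amplitude to a pixel offset from the center line of
 * a view that is view_height pixels tall. Positive means above center. */
int amplitude_to_offset(int16_t amplitude, int view_height)
{
    assert(view_height > 0);
    int half = view_height / 2;              /* pixels available per side */
    return ((int)amplitude * half) / 32768;  /* 32768 = full-scale magnitude */
}
```

Each column would then be one vertical line from the center row to the center
row minus the offset (for example with Allegro's vline()). Drawing one line per
sample is what made the original attempt slow, so a real version would draw one
line per pixel column instead.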
Original comment by xander4j...@yahoo.com
on 5 Aug 2010 at 12:18
If the OGG was decoded into memory, it could make playback faster, because
there would be less work involved with playing uncompressed PCM. If you can
point me in the right direction for decoding to memory, I could try to plan a
means for creating the waveform graph. If we want, we might be able to get
inspiration from other open source applications like Audacity, which does the
waveform graph extremely well.
Original comment by raynebc
on 5 Aug 2010 at 4:04
The function to decode the OGG into memory is:
SAMPLE *alogg_create_sample_from_ogg(ALOGG_OGG *ogg);
SAMPLE is a type defined by Allegro. Check the manual for details on this type.
I don't think OGG decoding is a bottleneck for EOF. Decoding an OGG in real
time takes < 1% CPU time even on my old P4 2.8 GHz. The real issues are software
graphics rendering and other processing done by EOF.
Original comment by xander4j...@yahoo.com
on 5 Aug 2010 at 9:58
My main concern is that people will dislike the amount of memory consumed by
storing the entire decoded OGG in memory, even if the waveform display is
optional. Perhaps we can compare decoding the entire OGG to memory versus just
decoding one screen full at a time for the purpose of displaying the waveform.
Since the OGG file itself is buffered to memory, seeking to the appropriate
sample to begin decoding shouldn't pose much of a delay to speak of. This
would drastically decrease the amount of memory needed to create the waveform
in real time.
Maybe my math is wrong, but I'm not sure how you got 10MB per minute for a
bitmap. Here's what I'm seeing:
1 minute = 60000 ms
Each column is 2ms -> 30000 columns per minute
30000 columns per minute * 32 rows * 8 bits per pixel / 8 bits per byte = 960000 bytes
30000 columns per minute * 32 rows * 2 bits per pixel / 8 bits per byte = 240000 bytes
30000 columns per minute * 32 rows * 1 bit per pixel / 8 bits per byte = 120000 bytes
Even if I goofed the math, there's no need to use 8 bit color depth on the
waveform. Monochrome should be fine for a basic graph, but 2 bit would allow
for a nice peak+root mean square graph. The root mean square would add
considerable calculation time, but if it's performed once each time an OGG is
loaded, it could be cached one way or another.
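
The arithmetic above can be double-checked with a small helper. This is only a
sketch: waveform_bitmap_bytes() is a hypothetical name, and it ignores any
per-row padding a real bitmap format might add.

```c
#include <assert.h>

/* Bytes needed for a pre-rendered waveform bitmap covering one minute of
 * audio, given the time per column, the height and the color depth. */
unsigned long waveform_bitmap_bytes(unsigned ms_per_column, unsigned rows,
                                    unsigned bits_per_pixel)
{
    assert(ms_per_column > 0);
    unsigned long columns = 60000UL / ms_per_column;  /* columns per minute */
    return columns * rows * bits_per_pixel / 8;
}
```

Calling it with (2, 32, 8), (2, 32, 2) and (2, 32, 1) reproduces the three
figures listed above.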
Original comment by raynebc
on 6 Aug 2010 at 6:29
You're right. Not sure where the extra '0' came from. Glad it won't take as
much memory as I thought. I would stick with 8-bit because that is the lowest
depth Allegro supports. We could write our own 2-bit or 4-bit rendering
function but that would probably be slower than using Allegro's 8-bit renderer.
Memory usage is my main concern with decoding the whole OGG into memory. I
don't like the idea of eating up 50+ megabytes of memory (more when we support
multiple tracks) when there isn't that much to gain from it.
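
For reference, a rough sketch of where a figure like 50 MB comes from, assuming
44.1 kHz 16-bit stereo audio and a song of about five minutes (both are
assumptions, and decoded_pcm_bytes() is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of uncompressed PCM audio of the given format and length. */
uint64_t decoded_pcm_bytes(uint32_t sample_rate, uint32_t channels,
                           uint32_t bits_per_sample, uint32_t seconds)
{
    assert(bits_per_sample % 8 == 0);
    return (uint64_t)sample_rate * channels * (bits_per_sample / 8) * seconds;
}
```

Under those assumptions one minute is 10584000 bytes and five minutes is
52920000 bytes.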
If this feature pans out I would probably go the route of generating the
waveform once for each loaded OGG and saving it in the song folder. It should
be fairly trivial to make a function that uses libvorbis functions to decode a
bit of the OGG and generate the waveform graph without having to resort to
decoding the entire thing into memory. This should be optional, though, and if
it's not terribly slow we could probably get away without using cache files
(caching would probably bring a few headaches of its own).
Original comment by xander4j...@yahoo.com
on 6 Aug 2010 at 10:10
We can probably get it working first, and then alter the logic to not decode
into memory. The memory use would be temporary, and multiple OGGs needn't be
processed at once, so a ~50MB temporary memory usage total for creating one
or more graphs isn't too bad.
But when we optimize it, we can make a modified version of
alogg_create_sample_from_ogg() that returns the next 2ms worth of samples.
Storing it as an 8 bit bitmap in memory is probably fine, it doesn't use much
memory, but we could store it on disk as a 2 bit bitmap if we choose. To make
the cached graphs even smaller, they could be stored in PCX format, which
natively supports 2 bit color depth and performs run length encoding
compression, which may or may not make much difference.
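
A minimal sketch of what 2 bit storage would mean in practice: four pixels
packed into each byte. The function name and the choice to put the first pixel
in the high bits are assumptions for illustration; an actual PCX writer would
also have to follow that format's own plane layout and run length encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 2-bit pixel values (0..3), four per byte, first pixel in the high
 * bits. out must hold (count + 3) / 4 bytes. Returns the bytes written. */
size_t pack_2bit(const uint8_t *pixels, size_t count, uint8_t *out)
{
    assert(pixels != NULL && out != NULL);
    size_t bytes = (count + 3) / 4;
    for (size_t b = 0; b < bytes; b++)
        out[b] = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned shift = 6 - 2 * (unsigned)(i % 4);
        out[i / 4] |= (uint8_t)((pixels[i] & 3u) << shift);
    }
    return bytes;
}
```

With values such as 0 = background, 1 = RMS and 2 = peak, one 32 pixel column
would fit in 8 bytes.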
Original comment by raynebc
on 6 Aug 2010 at 11:18
r276 adds some initial logic to build the graph. For the time being, the min,
peak and RMS amplitudes are tracked. The OGG is decoded entirely into memory
to create the data, but the decoded samples are released from memory afterward.
This logic, if it actually works, could easily be modified to just accept a
SAMPLE structure containing (waveform->slicelength) number of decoded samples
per loop iteration, which would avoid needing ~50MB of memory to decode the
entire OGG into at once. For temporary memory usage though, I don't think it
would make much difference as any computer that can't temporarily spare 50ish
MB of RAM has bigger problems to worry about.
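
As a rough illustration of tracking min, peak and RMS for one slice of samples
(this is not the r276 code). It assumes signed 16-bit input and reads "min" and
"peak" as the lowest and highest sample values; the struct and function names
are hypothetical. An integer square root is used so the sketch needs no math
library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slice record; field names are illustrative only. */
struct slice_stats {
    int min;       /* lowest sample value in the slice */
    int peak;      /* highest sample value in the slice */
    unsigned rms;  /* root mean square, rounded down */
};

/* Integer square root by binary search, so no libm is needed. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t lo = 0, hi = 0xFFFFFFFFu;
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;
        if (mid * mid <= n)
            lo = mid;
        else
            hi = mid - 1;
    }
    return (uint32_t)lo;
}

/* Scan one slice once, tracking the extremes and the sum of squares. */
struct slice_stats measure_slice(const int16_t *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    struct slice_stats s = { samples[0], samples[0], 0 };
    uint64_t sum_sq = 0;
    for (size_t i = 0; i < count; i++) {
        int v = samples[i];
        if (v < s.min) s.min = v;
        if (v > s.peak) s.peak = v;
        sum_sq += (uint64_t)((int64_t)v * v);
    }
    s.rms = isqrt64(sum_sq / count);
    return s;
}
```

Because the function only needs one slice at a time, the same loop would work
whether the samples come from a fully decoded OGG or from a small buffer
refilled on each iteration.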
Original comment by raynebc
on 9 Aug 2010 at 12:29
The next useful step is probably to write a function to render the graph to the
editor window, taking the current zoom level into account. Depending on
Allegro's transparency, I don't know if it would be easier to draw transparent
lines onto the editor window, or write the graph to a bitmap and have the
entire bitmap superimposed over the editor window with transparency.
To improve rendering performance, each channel's maximum amplitude could
probably be divided out of the waveform data values so that it doesn't have to
be done each time the graph is rendered, unless the original data is needed for
some reason. Since the data is compiled in increments of 1ms worth of audio
samples, I imagine that to avoid regenerating the graph each time the zoom
level changes, the data would need to be interpolated so that each pixel on the
graph's X axis uses a whole number of milliseconds worth of data, obtaining the
mathematical mean values.
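
A minimal sketch of that averaging step, assuming the per-millisecond values
are already stored as unsigned amplitudes and that the zoom level maps to a
whole number of milliseconds per pixel. reduce_to_columns() is a hypothetical
name, not an EOF function.

```c
#include <assert.h>
#include <stddef.h>

/* Collapse per-millisecond values into per-pixel values by taking the mean
 * of ms_per_pixel consecutive entries. A trailing partial group is averaged
 * over the entries that exist. Returns the number of columns written. */
size_t reduce_to_columns(const unsigned *per_ms, size_t ms_count,
                         size_t ms_per_pixel, unsigned *columns,
                         size_t max_columns)
{
    assert(per_ms != NULL && columns != NULL && ms_per_pixel > 0);
    size_t written = 0;
    for (size_t start = 0; start < ms_count && written < max_columns;
         start += ms_per_pixel) {
        size_t end = start + ms_per_pixel;
        if (end > ms_count)
            end = ms_count;
        unsigned long long sum = 0;
        for (size_t i = start; i < end; i++)
            sum += per_ms[i];
        columns[written++] = (unsigned)(sum / (end - start));
    }
    return written;
}
```

Running this once per zoom change, rather than once per rendered frame, keeps
the per-frame work down to reading one precomputed value per column.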
Original comment by raynebc
on 9 Aug 2010 at 8:19
I guess I should clarify that the 1ms intervals aren't hard-coded in, I just
figured we'd use that for the sake of simplicity, since it may work a little
more cleanly with zoom levels than 2ms.
Original comment by raynebc
on 9 Aug 2010 at 9:14
Since drawing the waveform in realtime may require enough math (square root
function) to cause the playback to lag, another possibility would be to keep
the 1ms waveform data, and every time the zoom level changes, have a temporary
waveform structure be created that has all the data ready to display. This
should allow the waveform to scroll in realtime with the chart.
Original comment by raynebc
on 15 Aug 2010 at 5:41
Original comment by raynebc
on 31 Aug 2010 at 8:29
I actually wanted something like this a while back, and ended up getting it to
render a pre-generated spectrogram behind the notes track, aligned with the
zoom and scrolling, and a graph up above showing amplitude - again just from
pre-generated data. So it wasn't generated by EOF, but it was presented within
it, and actually worked pretty well. Obviously a spectrogram, with lots of
FFTs, is more processor intensive than a simple waveform diagram, but it's
also much more useful.
I can take some screenshots if you want - obviously I wanted to actually have
it included and not need to pregenerate the images, and I would love to see
this implemented. I might see if I can hack at it some more, but I'm glad to
see that EOF is still very actively developed and developing.
Original comment by cincoden...@gmail.com
on 8 Sep 2010 at 7:38
Oh, also, perhaps I should introduce myself. I'm 5of0 over on FretsonFire.net,
and have only done a little bit of song editing, but EOF is by far the closest
to fulfilling everything I want in an editor, and open-source, so I took to
modifying it instead of writing my own. Other than that, I'm just some guy who
likes Frets on Fire :P
Original comment by cincoden...@gmail.com
on 8 Sep 2010 at 7:42
Sure, feel free to post or PM some code/screenshots. My account name is the
same at FoF-FF as it is here. Since I'm not an expert at sound processing and
the related theory behind it, I'd be happy to have any help that you can
provide.
Original comment by raynebc
on 8 Sep 2010 at 8:26
r366 adds much of the functionality. For now, I mapped F5 to toggle the
waveform display on/off. The waveform data itself might not be being generated
correctly. This will probably require delving further into how Allegro stores
audio samples, signed, unsigned, etc.
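
A sketch of the conversion that may be needed, assuming (as I understand the
Allegro 4 manual, but this should be verified) that sample data is stored
unsigned, so silence is 0x8000 for 16-bit and 0x80 for 8-bit audio, and that
stereo data is interleaved left/right. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert an unsigned 16-bit sample (silence = 0x8000) to a signed amplitude. */
int sample16_to_signed(uint16_t raw)
{
    return (int)raw - 0x8000;
}

/* Convert an unsigned 8-bit sample (silence = 0x80) to a signed amplitude. */
int sample8_to_signed(uint8_t raw)
{
    return (int)raw - 0x80;
}

/* Amplitude of one channel (0 = left, 1 = right) of an interleaved
 * 16-bit stereo buffer at the given frame index. */
int stereo16_amplitude(const uint16_t *data, size_t frame, int channel)
{
    assert(data != NULL && (channel == 0 || channel == 1));
    return sample16_to_signed(data[frame * 2 + (size_t)channel]);
}
```

If the graph treats the raw unsigned values as amplitudes, silence would show
up as a half-height signal, which would be one way for the data to look wrong.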
Original comment by raynebc
on 14 Sep 2010 at 11:11
r373 fixes most of the broken logic and it will now render something that is
close to correct. The graph is being displaced slightly (probably by the AV
delay value or something).
Original comment by raynebc
on 18 Sep 2010 at 8:15
The current logic is unable to correctly detect the name of the currently
loaded OGG. It might be beneficial to have eof_load_ogg() store the name of
the OGG file that is being loaded.
Original comment by raynebc
on 19 Sep 2010 at 1:17
Either the waveform can be destroyed and recreated in the OGG loading
functions, or they can just store the filename of the loaded OGG and the
recreation logic will take place elsewhere. The latter is probably best, as it
will make it easier to implement a GUI for displaying the loaded OGG and other
profiles that exist in the EOF project.
Currently, only the left channel data is displayed. Perhaps some other
features would be to display the right channel in the fretboard area or to
display a graph for both channels (one on top of the other). The latter would
require eof_render_waveform_line() to be altered to accept the height of the
waveform graph to be rendered, which should be pretty easy since it already
allows the calling function to define the y coordinate of the graph.
Original comment by raynebc
on 19 Sep 2010 at 8:54
Currently, I've set up PART VOCALS to display a graph for each audio channel.
Eventually, which graphs display and how could be a user preference.
The remaining issues to be resolved for this enhancement are:
1. How to track changes in the loaded OGG file. Storing the path of the
loaded OGG still would be my preference.
2. Allow the waveform to be rendered in the piano roll area to the left of the
first beat marker. I'll have to look into what's causing this. I originally
thought it was due to how I designed the rendering logic, but in PART VOCALS,
the left channel's graph is allowed to render to the left of the first beat
marker, so I'm more inclined to think EOF is writing over that area with a
black rectangle.
3. Find out why the graph isn't accurate with respect to time.
Original comment by raynebc
on 20 Sep 2010 at 8:53
r381 fixes most of the problems with the waveform graph. The only error with
the graph itself now is that it is 2% shorter than it needs to be because I am
rounding 44.1 to 45 samples per millisecond (which is exactly a 2%
discrepancy). I will fix this and then the rest of this enhancement revolves
around how to finish implementing the feature.
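
One way to avoid that kind of rounding error, shown only as a sketch and not
necessarily how it will be fixed in EOF: derive each slice's starting sample
from the millisecond index with integer math, so the slice lengths vary between
44 and 45 samples and the error never accumulates. The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First sample index belonging to the slice that starts at ms. */
int64_t slice_start(int64_t ms, int64_t sample_rate)
{
    assert(ms >= 0 && sample_rate > 0);
    return ms * sample_rate / 1000;
}

/* Number of samples in the 1 ms slice that starts at ms. */
int64_t slice_length(int64_t ms, int64_t sample_rate)
{
    return slice_start(ms + 1, sample_rate) - slice_start(ms, sample_rate);
}
```

For comparison, 60 seconds at 44100 Hz is 2646000 samples; at a fixed 45
samples per slice that yields 58800 slices, so the graph would cover 58.8
seconds worth of pixels instead of 60, which is the 2% shortfall.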
Original comment by raynebc
on 21 Sep 2010 at 9:51
r382 corrects the graph's representation of time
r383 allows the graph to be visible left of the first beat marker
r384 allows the name of the loaded OGG file to be tracked, allowing the
waveform to be recreated appropriately
Now there are just a couple things needed to polish out the feature, such as a
user interface for configuring the graph (i.e. y axis position, height).
Perhaps F5 can be set to toggle between a couple pre-defined graph views (such
as scaling to fit the fretboard area, scaling to fit the entire editor window
or scaling both channels' graphs in the editor window).
Original comment by raynebc
on 22 Sep 2010 at 12:40
Since this is almost completely finished, I'm raising the priority.
Original comment by raynebc
on 22 Sep 2010 at 2:27
Remaining things to resolve for this enhancement:
1. Provide a means to alter how the waveform is displayed (such as cycling
through various presets using F5).
2. Ensure that an existing waveform is hidden and destroyed when a chart is
loaded/imported. Regenerating the graph after loading another OGG should be
manual, because if people were going to sync using separated drum audio, they'd
want to load the drum audio, create the graph and then maybe load a full mix of
the song to chart with.
Original comment by raynebc
on 23 Sep 2010 at 10:05
It might be best to provide a dialog window for configuring the waveform, such
as checkboxes for which channels are rendered, and a radio button for how they
are rendered (selected channels rendered into height of the fretboard area or
rendered into the height of the editor window).
I would also like to move the F5 key input detection so that it can be used
during playback (for toggling on/off). Creation of the waveform will only be
allowed when the chart is paused, as it would definitely cause enough lag to
desync the chart.
Since the waveform pointer is initialized to NULL on startup, it would probably
be easiest to call eof_destroy_waveform() and set eof_waveform to NULL in
eof_init_after_load(). This should ensure that the graph is destroyed when
another chart is loaded/imported.
Original comment by raynebc
on 27 Sep 2010 at 6:23
Continued in r403. Now a user interface can be designed for the user to
specify whether to fit the graph into the fretboard or into the entire editor
window, and which channels to render.
Original comment by raynebc
on 27 Sep 2010 at 8:55
r404 completes those remaining issues. After the documentation is updated,
this enhancement can be considered complete.
Original comment by raynebc
on 28 Sep 2010 at 1:30
It would be a nice addition if the waveform graph settings were maintained
outside of the waveform structure, so that way each time the graph was created,
the user wouldn't have to change the settings to his or her preference. This
can be achieved by defining global renderlocation, renderleftchannel and
renderrightchannel variables. These variables can be updated by
eof_menu_song_waveform_settings() and eof_render_waveform() will validate those
settings and store them in the eof_waveform structure.
Original comment by raynebc
on 28 Sep 2010 at 2:40
Completed in r406.
Original comment by raynebc
on 28 Sep 2010 at 7:18
Original issue reported on code.google.com by
raynebc
on 26 Jul 2010 at 12:53