annacrombie / plot

plot on the command line
MIT License
113 stars 9 forks source link

Make plot available as library #19

Closed Emanuesson closed 3 years ago

Emanuesson commented 3 years ago

Abstract

Within a recent project, I required a simple solution to plot data into the shell. Plot does offer exactly this functionality in a reasonable and good way. However, my intended application will not easily allow to integrate and run plot as a standalone executable. Therefore, I would like to adapt the build-system of plot to such needs and make it a shared-library which can be linked into other executables. This issue is about asking if this application and approach may be relevant for others as well and should be implemented to plot at all.

Intention and scope

The intention of these changes is to not affect the functionality of the plot-executable at all. Also, I have currently not the intention to add further functionality or extend interfaces of the project. My request is really just limited to offer the fundamental functionalities of plot, as a standalone shared library.

Specifically, the functions and types that should be exported by the library are defined in the following headers:

Disclaimer

I know that introducing fundamental requirements into a project is typically not welcomed, for good reason. Therefore, I would have no problem to just apply the changes on a fork of plot. By doing so, the original scope and functionality of plot would be entirely unaffected.

Foreseeable changes

annacrombie commented 3 years ago

I think this is mostly a good idea if it is done properly. I think what you are asking is basically copying all the definitions in the headers you listed into a plot.h, but I think this would be a poor long term choice. Several other cli utilities I have worked on have benefited from a well thought out separation of the application's functionality and its interface. In order to accomplish this for plot, I would like to tidy up the current interfaces in the files you mentioned into a smaller api, and have the cli interface only interact with plot that way. Since plot is a pretty simple program, I dont think this tidying would take much time at all, plus I was already planning on doing something similar after #18 is merged.

The only hard part will be deciding what the api looks like exactly. For instance, in the cli, we parse files into arrays of doubles and then copy those into a plot_data structure. Should the double parsing be included in the plot library or should that be part of the cli. Also, I'm sure library consumers would probably like to get the output of plot in a string, etc. It would be helpful if you could define more precisely the interface you need

E.g. something like this could do for a very basic interface:


#include <plot.h>

int main(void) 
{
    struct plot p;
    double arr[] = { 1, 2, 3, 4, 5, 6 };
    int arr_len = 6;

    plot_init(&p)
    plot_add_data(&p, arr, arr_len, plot_color_red);
    plot_print(&p, stdout);
}
Emanuesson commented 3 years ago

Thank you for the answer and the proposed interface.

About my use-case

I'm thinking about using plot in a test-setup, where the results of simulations are compared against expected results. And in case of deviations, I want to plot the mismatched part to stdout/stderr as a first hint on how the deviated part looks like (is it discontinuous, is it off by a factor x, etc.). The current interface (as provided by plots headers) does offer exactly that already. To be more precise, my application is very limited and can be summarized as the following: Only plotting ~< 50 data points in 2 data sets. Therefore, printing everything into a memory-stream is a possible workaround, so that it can be given into the plots current interface, as if it came from stdin. Something similar can be seen in this small test.

Regarding the interface-design

Input

I guess that most consumers of a plotting library will not have all their data string-formatted in a memory-stream, but as you suggested as array of floats/doubles. Therefore, this would be a good choice for input. Maybe in some applications it is relevant to feed a continuous stream of input into a plot (similar to the follow-option of the plot-application). However, I have no such application in mind and I would leave it out of a library-interface for the moment.

Output

There may be applications, where a stream as output is not the optimal choice. But, if I oversee the internals of display.c, currently everything is just printed into stdout. It would be a sufficient change to specify the output-stream as a member of struct plot and just replace printf by fprintf in display.c. That would also offer ways, to use one set of plot-data and plot it with different settings in multiple streams.

I agree that separating the library interface from the applications functionality offers more ways to optimize for convenience and performance.

annacrombie commented 3 years ago

I am nearing completion of this feature, which has also greatly expanded plot's flexibility in a lot of ways, and even increased performance by around 3x.

Here is an example of an api consumer:

#include <math.h>
#include <plot/plot.h>
#include <stdio.h>

struct sine_wave_ctx {
    double amp, freq;
    double x;
};

/* the input callback fills out with data.  
 * It must not fill in more values than out_buffer_length,
 * but may fill in fewer.  
 * It returns the number of values that were filled in.
 */

static uint32_t
sine_wave(void *_ctx, double *out, uint32_t out_buffer_length)
{
    struct sine_wave_ctx *ctx = _ctx;
    double v = sin(ctx->x) * ctx->amp;
    ctx->x += ctx->freq;
    out[0] = v;
    return 1;
}

int
main(void)
{
    struct plot *p = plot_alloc(24, 80, 12);

    plot_add_dataset(p, plot_color_cyan, NULL, 0, sine_wave,
        &(struct sine_wave_ctx){ .amp = 5, .freq = 0.2 });

    plot_fetch_until_full(p);
    plot_print(p, stdout);
    plot_free(p);
}

https://github.com/annacrombie/plot/blob/lib/examples/simple.c

As you can see, you feed data to plot through a callback which gets a user defined context. The file input callback is available in <plot/file_input.h>. Does this look reasonable to you?

Emanuesson commented 3 years ago

A, I see. By doing so, you also avoid copying every input or implementing multiple internal data-representations that would be optimal for different interfaces (follow a stream vs. statically sized array). I think, this is also a clever solution for my use-case. I will try to look into the details of your changes today and give you feedback soon. Best regards.

Emanuesson commented 3 years ago

First of all: good work. I'm quite impressed by the changes. The buffer based input-pipe is really the best way to process all input in a reliable way.

Results from testing

I made some tests with valgrind and randomizing the input-parameters of the plot and found that the changes may have introduced one problematic part.

It can be easily reproduced, by changing the example you provided by the following:

int
main(void)
{
    struct plot *p = plot_alloc(24, 80, 12);
    plot_fix_bounds(p, 100,150);

    plot_add_dataset(p, plot_color_cyan, NULL, 0, sine_wave,
        &(struct sine_wave_ctx){ .amp = 5, .freq = 0.2 });

    plot_fetch_until_full(p);
    plot_print(p, stdout);
    plot_free(p);
}

And let valgrind do its work: valgrind --tool=memcheck --track-origins=yes -s ./examples/simple

The problem is inside the function canvas_get. I think that in the version of this function before your changes, it was always ensured that x and y fullfil the following conditions:

uint8_t *
canvas_get(struct plot *p, uint16_t x, uint16_t y)
{
    if (x <  p->width && y < p->height)
        return &p->canvas[(x * p->height) + y];
    else
        return ...
}

That is now not always the case. I'm also not sure if this can already be checked before entering the function, to avoid executing the function, if the parameters went out of bounds.

Additional remarks

I just have some small additional questions regarding the code and they are mostly driven by subjective experience - so just ignore them if they bother you too much:

annacrombie commented 3 years ago

Thanks for the bug report! I think I have fixed it with ac02bff. I haven't merged the library pr yet because I am still testing it myself. A lot of code got changed... Please let me know if you find anything else. I am still thinking about how to implement testing

About your additional remarks

Why is unit32_t used where size_t could be

I like knowing the exact width of my integers, and so I fell into the habit of using the [u]int{8,16,32,64}_t types exclusively in my other C projects. Another thing about size_t that can be annoying is when porting code between platforms where it is a different size (e.g. 32 bit vs 64 bit x86), all printfs that print a size_t have to be changed between "%d" and "%ld". I should probably change it to size_t though, to reduce people's surprise when they look at the function signature

Why do many code-comments contain something like / L(" ... "); /

This is just a basic logging macro. I often use it to aid in printf-debugging and then comment it out when I no longer need it. It is a crude technique, and I will do a final pass before merge to remove these. Thanks for the reminder :smile:

Emanuesson commented 3 years ago

Thanks for the explanations. I performed again some tests (functional, as well as stability wise). Everything looks good so far and I'm absolutely satisfied with the current interface.

I have tought about tests as well. For my library clone I have used cunit. However, this is just another build-dependency then. And I have no strong oppinion, if any of the available C-Testframeworks outperforms the others or is even really necessary.

In my opinion, testing the library-interface of plot can really be a bunch of examples, executed with mesons unittest system (maybe even with valgrind on top). I think that mixing (demonstrator) examples and test examples of a library-integration doesn't work out in the longt-term. I would rather create a new structure of the test-folder and implement some test-examples therein. However, providing this infrastructure seems really out-of-scope of this issue.

Regarding the other points:

Another thing about size_t that can be annoying is when porting code between platforms where it is a different size (e.g. 32 bit vs 64 bit x86), all printfs that print a size_t have to be changed between "%d" and "%ld".

I must confess, I've never looked at it from that perspective. But indeed, it may be more reliably portable to directly go with 32bit as size, instead of identifying all the problems, when porting for the first time to a 32 bit platform. I have no strong oppinion about it, your explanation is reasonable to me and I think that this is the last problem, if someone expects to reliably plot data-sets that exceed the 32 bit limit in size. Therefore, I'm absoluetly fine with the uint32_t-types.