biometrics / openbr

Open Source Biometrics, Face Recognition
www.openbiometrics.org
Other
2.85k stars 773 forks source link

Plotting #390

Closed caotto closed 9 years ago

caotto commented 9 years ago

I am making a couple figures using -plot, and I would describe the experience as infuriating. I am using this issue to document the problems that I have with it:

-plot outputs a bunch of figures that I have no interest in. For example, I want to plot a single CMC curve, and I get a 8 page pdf document. Later, I will want to plot a single ROC curve, and will also get an 8 page document

I want to plot a CMC, up to rank 20, and don't want a log scale on the x-axis. This is not possible, because even though cmcOptions exists, it only supports a small list of things, and adding things to it requires 1. knowing how to use ggplot, and knowing how to insert things into the R string building exercise being carried out in Plot.

I think generating .R files via string building is probably crazy. The code that writes these things looks like a mess, and we'd probably be better off having R functions that take arguments, rather than doing all this string building

y axis ticks for non-log scales are at 25% increments. This should be 10%

bklare-zz commented 9 years ago

Charles - just a few inline thoughts below bc while there is without question room for improvement, I think @bklein has added a lot of improved functionality recently. Perhaps we can narrow down from here the ones that are most needed after others comment as well, and then add ticket(s) in github for them.

On Jun 28, 2015 2:20 PM, "caotto" notifications@github.com wrote:

I am making a couple figures using -plot, and I would describe the experience as infuriating. I am using this issue to document the problems that I have with it:

-plot outputs a bunch of figures that I have no interest in. For example, I want to plot a single CMC curve, and I get a 8 page pdf document. Later, I will want to plot a single ROC curve, and will also get an 8 page document

I do not understand the preferred solution as everyone desires different metrics. Improvements were made before for the plots to be formatted such that they embed in a more aesthetic manner. In terms of multi page plots, what is the issue with grabbing the page of interest and using it as a figure?

I want to plot a CMC, up to rank 20, and don't want a log scale on the x-axis. This is not possible, because even though cmcOptions exists, it only supports a small list of things, and adding things to it requires 1. knowing how to use ggplot, and knowing how to insert things into the R string building exercise being carried out in Plot.

I agree that it would be nice for command line options to exist for setting different parameters.

I think generating .R files via string building is probably crazy. The code that writes these things looks like a mess, and we'd probably be better off having R functions that take arguments, rather than doing all this string building

I think better command line options should make this less of an issue as the R code should not need to be edited if sufficient flexibility exists. Of course, the biggest issue here, IMO, is that R is an atrocious language.

y axis ticks for non-log scales are at 25% increments. This should be 10%

Why? Is this a standard?

— Reply to this email directly or view it on GitHub.

caotto commented 9 years ago

My issue with the multi-page reports is, basically I typically want to make one figure at a time, showing different things, so having the ability to make just one plot would avoid the step of opening the pdf document, scrollign down until I find something relevant, then using that.

I think writing R code via stringbuilding, then executing that via Rscript is a maintainability issue vs. just having R code. Have you looked at this before: https://github.com/biometrics/openbr/blob/master/openbr/core/plot.cpp#L323 While yes, someone can spend the time to read and edit that if they devote the time to it, I think it's basically a mess, and if you think R is bad, the way we are currently using it is just making the situation worse.

RE: y axis ticks, say I have a line that hits the edge of the plot somewhere around mid-way between 75% and 50%, as-is, I get numbers at 50%, and 65%, and an unlabeld tick in between them, and get to do some mental math to say, oh my curve is at around 62.5%. The main thing I want to know is a ballpark for performance, so this is basically annoying, vs. having ticks at easier to automatically understand positions, such as every 10%. edit: Also, I think if you did a survey you'd find tick on numbers divisible by 10 to be drastically more common than ticks at 8ths

bhklein commented 9 years ago

Charles, I agree that -plot is overdue for some cleanup/an overhaul. In the short term I can look into Rcpp and add the options you've mentioned here, along with others you or anyone else may suggest in this issue.

Long term I think it's worth exploring options other than R, for instance QCustomPlot is a Qt plugin with some interesting features like interactive axis manipulation.

jklontz commented 9 years ago

We have to be a bit careful with the QCustomPlot solution, as we shouldn't pull in a GPL dependency by default. Agreed that a pure R solution ought to be the way forward.

bklare-zz commented 9 years ago

I think beyond some additional options for plotting at the command line, the desire for more interactive plotting environment is beyond the scope of OpenBR's plotting environment. To me, the intent of the plotting functionality is to provide a standard reporting mechanism when evaluating algorithms. If someone needs to explore the results in great depth, then environments such as matlab, octave, python, etc. are much better suited.

This could be a neat side project to have more interactive and exploitative plotting features within OpenBR. Given the fairly small list of active developers, I am not sure how much of a priority this is, though.

Charles and Ben - assuming most of the functionality discussed is agreed to be at of scope at this point, would an appropriate follow on action here be to (i) list out desired command line options that are fairly easy to implement and would be quite helpful, (ii) implement, and (iii) update documentation?

On Mon, Jun 29, 2015 at 9:01 AM, Josh Klontz notifications@github.com wrote:

We have to be a bit careful with the QCustomPlot solution, as we shouldn't pull in a GPL dependency by default. Agreed that a pure R solution ought to be the way forward.

— Reply to this email directly or view it on GitHub https://github.com/biometrics/openbr/issues/390#issuecomment-116650778.

caotto commented 9 years ago

I think:

  1. Default ticks for non-log y axis should be at 10% increments, not 1/8ths (I don't know why this would be controversial).
  2. Setting x-axis bounds for CMC plots (or really bounds for all axes for all plots) should be supported.
  3. setting log vs. linear scale should be supported for all plots for all axes.
  4. Refactoring the plot code is very desirable. I don't have a strong opinion about what is necessarily the best way of doing that (picking up a GPL dependency is bad of course), but I think there is agreement that the current approach is a problem--this may or may not be dealt with immediately.
bklare-zz commented 9 years ago

These look like good improvements to me. Ben - do you have any capacity to push these out? I think the Janus performers might similarly find this improvements useful.

Also, the following would be helpful for analyzing results that involve 10+ algorithms (e.g., during a parameter sweep):

  1. Option to export tabular results to csv file. For example, a csv file with the same results that are shown on the first page of the pdf. The issue I often experience is that when comparing a lot of algorithms, you can only see the first 8 or so on the pdf.

On Mon, Jun 29, 2015 at 11:01 AM, caotto notifications@github.com wrote:

I think:

1.

Default ticks for non-log y axis should be at 10% increments, not 1/8ths (I don't know why this would be controversial). 2.

Setting x-axis bounds for CMC plots (or really bounds for all axes for all plots) should be supported. 3.

setting log vs. linear scale should be supported for all plots for all axes. 4.

Refactoring the plot code is very desirable. I don't have a strong opinion about what is necessarily the best way of doing that (picking up a GPL dependency is bad of course), but I think there is agreement that the current approach is a problem--this may or may not be dealt with immediately.

— Reply to this email directly or view it on GitHub https://github.com/biometrics/openbr/issues/390#issuecomment-116719951.

bhklein commented 9 years ago

Yeah I think I can get these changes in sometime this week. I'll try to work in a way to specify what plots you want as well, but it will still default to the 9 page report.

jklontz commented 9 years ago

FWIW, we used to support an option where all the plots were written to different files. If that's ok with @caotto, perhaps that approach will allow a cleaner implementation?

bhklein commented 9 years ago

Yep that should be a bit easier to implement

caotto commented 9 years ago

Well, separate files would be preferable to the current situation I think.

bhklein commented 9 years ago

395 has the changes discussed here but doesn't address the underlying issue of the mess that plot.cpp has become.

My next step will be moving the bulk of the R code into functions in something like plot_utils.R, the R files generated from -plot can then just call source('plot_utils.R') and the only thing that needs to be done in plot.cpp is parsing the files and writing the function calls. This should make future edits to our R plots simpler since you will actually be able to visualize them in native R rather than the string building currently going on in plot.cpp.

Personally I like having the resulting R file from -plot and maybe others feel that way as well; it allows for customization beyond the standard -plot functionality that maybe you'd want for reports/publications.

Any suggestions? If no one thinks this is a terrible idea I'll go ahead with it sometime soon.

jklontz commented 9 years ago

makes sense to me, thanks!

caotto commented 9 years ago

I think this has more or less been addressed