Using nested reports in validphys appears to shift critical error from compile time to run time

wilsonmr commented 5 years ago

I've been playing with closure reports and noticed some strange behaviour which I think is unexpected, but perhaps I'm doing something stupid with the input. There might be a shorter minimal (not) working setup but I think this demonstrates the issue (I can't upload files here):

https://drive.google.com/open?id=14FwLAzdTFrmkyN9zTi93TGDMJQlfkdlD

If you run validphys test2.yaml there will be a critical error raised at compile time however if you run validphys test1.yaml there will be a critical error at run time. AFAICT The run cards should be equivalent, however test1.yaml has the nested report structure.

Side note: I'm suprised that there are issues with running this

Zaharid commented 5 years ago

The problem with test2.yaml is that template_text and template interact in a bad way and template_text is being preferred even though it is in an outer namespace. This is a bug in reportengine (not particularly easy to fix, but I'll have a look, maybe will just error out).

A workaround is to always use either template or template_text. So this works at the moment:

# This is a generic runcard which generates a report comparing a closure
# test to its underlying law
meta:
    title: I didn't change the title
    keywords: [Guilty]
    author: Lazy Person

closures:
    fit: {id: 180501-nh-002}
    pdf: {id: 180501-nh-002, label: "test closures"}
    theory:
      from_: fit
    theoryid:
      from_: theory
    speclabel: "closures"

dataset_report:
    meta: Null
    template_text: | 
        % Data-theory comparison for {@dataset_name@}
        # Absolute
        {@plot_fancy_dataspecs@}
        # Normalized
        {@datanorm plot_fancy_dataspecs@}

experiments:
    from_: fit

use_cuts: True 

dataspecs:
  - theoryid:
      from_: closures
    pdf:
      from_: closures
    fit:
      from_: closures
    speclabel:
      from_: closures

datanorm:
    normalize_to: data

template_text: |
  %NNPDF Report for fit {@ closures fit @}

  Experiment plots
  ----------------
  {@with matched_datasets_from_dataspecs@}
  [Detailed plots for dataset ' {@dataset_name@} ']({@dataset_report report@})
  {@endwith@}

actions_:
  - report(main=true)

Zaharid commented 5 years ago

The error in test1 (and in the modified version of test2 above) seems unrelated. Unfortunately matplotlib has the habit of throwing errors when running savefig, which makes it difficult to see what caused them.

Zaharid commented 5 years ago

The issue seems to be a regression of matplotlib. The runcard above works if I conda install matplotlib=2. Would be good to have a minimal reproducible example that we could report, but might be tricky.

Zaharid commented 5 years ago

These seem to give similar tracebacks but come from different places:

https://github.com/matplotlib/matplotlib/issues/10360 https://github.com/matplotlib/matplotlib/issues/11386 https://github.com/matplotlib/matplotlib/issues/6789

For the record, the issue we are seeing is:

----

Traceback (most recent call last):
  File "/home/zah/anaconda3/bin/validphys", line 11, in <module>
    load_entry_point('validphys', 'console_scripts', 'validphys')()
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/scripts/main.py", line 10, in main
    vp.main()
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/app.py", line 144, in main
    a.main()
  File "/home/zah/phd/reportengine/src/reportengine/app.py", line 374, in main
    self.run()
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/app.py", line 140, in run
    super().run()
  File "/home/zah/phd/reportengine/src/reportengine/app.py", line 359, in run
    rb.execute_sequential()
  File "/home/zah/phd/reportengine/src/reportengine/resourcebuilder.py", line 165, in execute_sequential
    *self.resolve_callargs(callspec))
  File "/home/zah/phd/reportengine/src/reportengine/resourcebuilder.py", line 174, in get_result
    return function.final_action(fres, **prepare_args)
  File "/home/zah/phd/reportengine/src/reportengine/figure.py", line 83, in savefiglist
    res.append(savefig(fig, paths=paths, output=output, suffix=suffix))
  File "/home/zah/phd/reportengine/src/reportengine/figure.py", line 66, in savefig
    fig.savefig(str(path), bbox_inches='tight')
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/figure.py", line 2097, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 2049, in print_figure
    **kwargs)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py", line 510, in print_png
    FigureCanvasAgg.draw(self)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py", line 402, in draw
    self.figure.draw(self.renderer)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/figure.py", line 1652, in draw
    renderer, self, artists, self.suppressComposite)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py", line 2604, in draw
    mimage._draw_list_compositing_images(renderer, self, artists)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/artist.py", line 50, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py", line 1185, in draw
    ticks_to_draw = self._update_ticks(renderer)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py", line 1023, in _update_ticks
    tick_tups = list(self.iter_ticks())  # iter_ticks calls the locator
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py", line 967, in iter_ticks
    majorLocs = self.major.locator()
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/ticker.py", line 1985, in __call__
    return self.tick_values(vmin, vmax)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/ticker.py", line 1993, in tick_values
    locs = self._raw_ticks(vmin, vmax)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/ticker.py", line 1932, in _raw_ticks
    nbins = np.clip(self.axis.get_tick_space(),
  File "/home/zah/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py", line 2159, in get_tick_space
    return int(np.floor(length / size))
ValueError: cannot convert float NaN to integer

----

[CRITICAL]: A critical error ocurred. This is likely due to one of the following reasons:

 - A bug in validphys.
 - Corruption of the provided resources (e.g. incorrect plotting files).
 - Cosmic rays hitting your CPU and altering the registers.

wilsonmr commented 5 years ago

ok I will see if I can reproduce this with something a bit easier to debug

Zaharid commented 5 years ago

I have been thinking about what the right thing is to do in the reportengine thing and it is not so clear to me. The current behaviour obviously doesn't make any sense and it is mostly asking for trouble.

At the moment when you have a config key and a production rule, that give the same symbol, the production rule always wins (we have produce_template(self, template_text) and report(template,...)). One might say that instead you do need to look at the innermost namespace level where a production rule is defined and compare it with the namespace level where the config key is defined, choosing the one that is innermost. But it is unclear to me this is such a good idea:

def produce_value(self, important_parameter, unimportant_parameter): 
    ....

def parse_value(self, value):
   ....

Ns0:
    value: 10

Ns1:
    important_parameter: 100

Ns2:
   unimportant_parameter: 1

Then these would do different things:

Ns1::Ns0::Ns2 value #produce
Ns1::Ns2::Ns0 value #parse

which might be surprising. Also it gets more complicated when considering that the parser methods can also depend on other keys (which was a mistake).

All in all, probably report should be changed to just throw an error.

Zaharid commented 5 years ago

As for the MPL error, I haven't touched it, but I did test a runcard that also computes all the data plots and didn't have any problem. It is not clear to me what triggers it.

Zaharid commented 5 years ago

I have found and reported the mpl bug here:

https://github.com/matplotlib/matplotlib/issues/12648

I have fixed it on master 0e26e7e23ba99992e8ac9febebd960c8184bbc53.

wilsonmr commented 5 years ago

Is this fixed now? Your mpl issue appears to have been closed

NNPDF / nnpdf

Using nested reports in validphys appears to shift critical error from compile time to run time #302