NNPDF / nnpdf

An open-source machine learning framework for global analyses of parton distributions.
https://docs.nnpdf.science/
GNU General Public License v3.0

Theory covmats with matched cuts #305

Closed RosalynLP closed 5 years ago

RosalynLP commented 5 years ago

I am trying to work on creating theory covmats with cuts that match those of the shift matrices here: https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/.

Currently the theory covmats take in


each_dataset_results_bytheory = collect('results_bytheoryids',
                                        ('experiments', 'experiment'))

so I have tried to alter them to take in each_dataset_results_matched = collect('results_bytheoryids', ['dataspecs_with_matched_cuts']).

However, I am getting the error


[ERROR]: Bad configuration encountered:
A parameter is required: dataset_input.
This is needed to process:
 - dataset
trough:
 - (('default_theory', 0),)
trough:
 - report
trough:
 - template_text
trough:
 - plot_thcorrmat_heatmap_custom
trough:
 - theory_corrmat_custom
trough:
 - theory_covmat_custom
trough:
 - covs_pt_prescrip
trough:
 - combine_by_type
trough:
 - each_dataset_results_matched

and I am not sure why. Is this the way I should be trying to do this?

Zaharid commented 5 years ago

ISTM that what we want is to first collect results over dataspecs instead of:

/n/nnpdf (prescrip2 %) $ validphys --help results_bytheoryids
results_bytheoryids

Defined in: reportengine.resourcebuilder

results_bytheoryids()

The result of `results` for each in ('theoryids',).

and then everything else should follow (either with a fair amount of duplicated functions that only call the old functions or with NNPDF/reportengine#63 ).

Zaharid commented 5 years ago

Note that we already have dataspecs_results.

Zaharid commented 5 years ago

I think the runcard should look something like this. Please note there is this annoying bug at the moment https://github.com/NNPDF/reportengine/issues/16

fit: XXX
use_cuts: "fromfit"
pdf: YYY
dataspecs:
  - theoryid: ZZZ
    experiments: ... # Probably has to go here for now. Sorry!
  - theoryid: XYXY
    experiments: ...
  ...

and then the actions would collect over:

matched_datasets_from_dataspecs::dataspecs_with_matched_cuts::dataspecs_results

or something like that. Note that this is the same as the shift matrix business.

Zaharid commented 5 years ago

Any progress with this? Any problems I could help with?

RosalynLP commented 5 years ago

Yes I'm still having problems with the runcards. Currently I have:


meta:
   author: Rosalyn Pearson
   keywords: [test, theory uncertainties, matched cuts]
   title: Testing theory covariance matrix with matched cuts
default_theory:
   - theoryid: 163

fivetheories: nobar

theoryids:
   - 163
   - 177
   - 176
   - 179
   - 174
#   - 180
#   - 173
#   - 175
#   - 178

dataspecs:
        - theoryid: 163
          speclabel: $(\xi_F,\xi_R)=(1,1)$
        - experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                - dataset: NMCPD
                - dataset: NMC
            - experiment: SLAC
              datasets:
                - dataset: SLACP
                - dataset: SLACD
            - experiment: BCDMS
              datasets:
                - dataset: BCDMSP
                - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                - dataset: NTVNUDMN
                - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                - dataset: CHORUSNU
                - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                - dataset: HERACOMBNCEM 
                - dataset: HERACOMBNCEP460
                - dataset: HERACOMBNCEP575
                - dataset: HERACOMBNCEP820
                - dataset: HERACOMBNCEP920
                - dataset: HERACOMBCCEM 
                - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                - dataset: H1HERAF2B
                - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                - dataset: ATLASWZRAP36PB
                - dataset: ATLASZHIGHMASS49FB
                - dataset: ATLASLOMASSDY11EXT
                - dataset: ATLASWZRAP11
                - dataset: ATLAS1JET11
                - dataset: ATLASZPT8TEVMDIST
                - dataset: ATLASZPT8TEVYDIST
                - dataset: ATLASTTBARTOT
                - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                - dataset: CMSWEASY840PB
                - dataset: CMSWMASY47FB
                - dataset: CMSWCHARMRAT
                - dataset: CMSDY2D11
                - dataset: CMSWMU8TEV
                - dataset: CMSJETS11
                - dataset: CMSTTBARTOT
                - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                - dataset: LHCBZ940PB
                - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                - dataset: CDFZRAP
                - dataset: CDFR2KT
            - experiment: D0
              datasets:
                - dataset: D0ZRAP
                - dataset: D0WEASY
                - dataset: D0WMASY
        - theoryid: 177
          speclabel: $(\xi_F,\xi_R)=(2,1)$
        - experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                - dataset: NMCPD
                - dataset: NMC
            - experiment: SLAC
              datasets:
                - dataset: SLACP
                - dataset: SLACD
            - experiment: BCDMS
              datasets:
                - dataset: BCDMSP
                - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                - dataset: NTVNUDMN
                - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                - dataset: CHORUSNU
                - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                - dataset: HERACOMBNCEM 
                - dataset: HERACOMBNCEP460
                - dataset: HERACOMBNCEP575
                - dataset: HERACOMBNCEP820
                - dataset: HERACOMBNCEP920
                - dataset: HERACOMBCCEM 
                - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                - dataset: H1HERAF2B
                - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                - dataset: ATLASWZRAP36PB
                - dataset: ATLASZHIGHMASS49FB
                - dataset: ATLASLOMASSDY11EXT
                - dataset: ATLASWZRAP11
                - dataset: ATLAS1JET11
                - dataset: ATLASZPT8TEVMDIST
                - dataset: ATLASZPT8TEVYDIST
                - dataset: ATLASTTBARTOT
                - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                - dataset: CMSWEASY840PB
                - dataset: CMSWMASY47FB
                - dataset: CMSWCHARMRAT
                - dataset: CMSDY2D11
                - dataset: CMSWMU8TEV
                - dataset: CMSJETS11
                - dataset: CMSTTBARTOT
                - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                - dataset: LHCBZ940PB
                - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                - dataset: CDFZRAP
                - dataset: CDFR2KT
            - experiment: D0
              datasets:
                - dataset: D0ZRAP
                - dataset: D0WEASY
                - dataset: D0WMASY
        - theoryid: 176
          speclabel: $(\xi_F,\xi_R)=(0.5,1)$
        - experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                - dataset: NMCPD
                - dataset: NMC
            - experiment: SLAC
              datasets:
                - dataset: SLACP
                - dataset: SLACD
            - experiment: BCDMS
              datasets:
                - dataset: BCDMSP
                - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                - dataset: NTVNUDMN
                - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                - dataset: CHORUSNU
                - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                - dataset: HERACOMBNCEM 
                - dataset: HERACOMBNCEP460
                - dataset: HERACOMBNCEP575
                - dataset: HERACOMBNCEP820
                - dataset: HERACOMBNCEP920
                - dataset: HERACOMBCCEM 
                - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                - dataset: H1HERAF2B
                - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                - dataset: ATLASWZRAP36PB
                - dataset: ATLASZHIGHMASS49FB
                - dataset: ATLASLOMASSDY11EXT
                - dataset: ATLASWZRAP11
                - dataset: ATLAS1JET11
                - dataset: ATLASZPT8TEVMDIST
                - dataset: ATLASZPT8TEVYDIST
                - dataset: ATLASTTBARTOT
                - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                - dataset: CMSWEASY840PB
                - dataset: CMSWMASY47FB
                - dataset: CMSWCHARMRAT
                - dataset: CMSDY2D11
                - dataset: CMSWMU8TEV
                - dataset: CMSJETS11
                - dataset: CMSTTBARTOT
                - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                - dataset: LHCBZ940PB
                - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                - dataset: CDFZRAP
                - dataset: CDFR2KT
            - experiment: D0
              datasets:
                - dataset: D0ZRAP
                - dataset: D0WEASY
                - dataset: D0WMASY
        - theoryid: 179
          speclabel: $(\xi_F,\xi_R)=(1,2)$ 
        - experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                - dataset: NMCPD
                - dataset: NMC
            - experiment: SLAC
              datasets:
                - dataset: SLACP
                - dataset: SLACD
            - experiment: BCDMS
              datasets:
                - dataset: BCDMSP
                - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                - dataset: NTVNUDMN
                - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                - dataset: CHORUSNU
                - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                - dataset: HERACOMBNCEM 
                - dataset: HERACOMBNCEP460
                - dataset: HERACOMBNCEP575
                - dataset: HERACOMBNCEP820
                - dataset: HERACOMBNCEP920
                - dataset: HERACOMBCCEM 
                - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                - dataset: H1HERAF2B
                - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                - dataset: ATLASWZRAP36PB
                - dataset: ATLASZHIGHMASS49FB
                - dataset: ATLASLOMASSDY11EXT
                - dataset: ATLASWZRAP11
                - dataset: ATLAS1JET11
                - dataset: ATLASZPT8TEVMDIST
                - dataset: ATLASZPT8TEVYDIST
                - dataset: ATLASTTBARTOT
                - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                - dataset: CMSWEASY840PB
                - dataset: CMSWMASY47FB
                - dataset: CMSWCHARMRAT
                - dataset: CMSDY2D11
                - dataset: CMSWMU8TEV
                - dataset: CMSJETS11
                - dataset: CMSTTBARTOT
                - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                - dataset: LHCBZ940PB
                - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                - dataset: CDFZRAP
                - dataset: CDFR2KT
            - experiment: D0
              datasets:
                - dataset: D0ZRAP
                - dataset: D0WEASY
                - dataset: D0WMASY
        - theoryid: 174
          speclabel: $(\xi_F,\xi_R)=(1,0.5)$
        - experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                - dataset: NMCPD
                - dataset: NMC
            - experiment: SLAC
              datasets:
                - dataset: SLACP
                - dataset: SLACD
            - experiment: BCDMS
              datasets:
                - dataset: BCDMSP
                - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                - dataset: NTVNUDMN
                - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                - dataset: CHORUSNU
                - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                - dataset: HERACOMBNCEM 
                - dataset: HERACOMBNCEP460
                - dataset: HERACOMBNCEP575
                - dataset: HERACOMBNCEP820
                - dataset: HERACOMBNCEP920
                - dataset: HERACOMBCCEM 
                - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                - dataset: H1HERAF2B
                - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                - dataset: ATLASWZRAP36PB
                - dataset: ATLASZHIGHMASS49FB
                - dataset: ATLASLOMASSDY11EXT
                - dataset: ATLASWZRAP11
                - dataset: ATLAS1JET11
                - dataset: ATLASZPT8TEVMDIST
                - dataset: ATLASZPT8TEVYDIST
                - dataset: ATLASTTBARTOT
                - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                - dataset: CMSWEASY840PB
                - dataset: CMSWMASY47FB
                - dataset: CMSWCHARMRAT
                - dataset: CMSDY2D11
                - dataset: CMSWMU8TEV
                - dataset: CMSJETS11
                - dataset: CMSTTBARTOT
                - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                - dataset: LHCBZ940PB
                - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                - dataset: CDFZRAP
                - dataset: CDFR2KT
            - experiment: D0
              datasets:
                - dataset: D0ZRAP
                - dataset: D0WEASY
                - dataset: D0WMASY
#        - theoryid: 180
#          speclabel: $(\xi_F,\xi_R)=(2,2)$ 
#        - theoryid: 173
#          speclabel: $(\xi_F,\xi_R)=(0.5,0.5)$
#        - theoryid: 175
#          speclabel: $(\xi_F,\xi_R)=(2,0.5)$   
#        - theoryid: 178
#          speclabel: $(\xi_F,\xi_R)=(0.5,2)$

normalize_to: 1

use_cuts: 'fromfit'
fit: NNPDF31_nlo_as_0118_1000

pdf:
  from_: fit

#template_text: |
#
#   {@with default_theory@}
#
#   {@plot_thcorrmat_heatmap_custom@}
#
#   {@endwith@}

actions_:
#  - report(main=true)
   - matched_datasets_from_dataspecs::dataspecs_with_matched_cuts::dataspecs_results plot_thcorrmat_heatmap_custom

and I am getting the error


[ERROR]: Bad configuration encountered:
A parameter is required: theoryid.
This is needed to process:
 - experiments
trough:
 - dataspecs
trough:
 - matched_datasets_from_dataspecs
trough:
 - ()
trough:
 - plot_thcorrmat_heatmap_custom
Maybe you mistyped theoryid in one of the following keys?
 - theoryids
 - fivetheories

RosalynLP commented 5 years ago

I also don't really understand whether this action I am doing is the right thing - I haven't yet altered anything in the code either as I am not really able to debug without getting a basic runcard to work.

RosalynLP commented 5 years ago

sorry that was an accident

Zaharid commented 5 years ago

Note the runcard above is wrong in that it has the structure:

dataspecs: [
{theoryid: ...},
{experiments: ...},
{theoryid: ...},
{experiments: ...},
...
]

rather than:

dataspecs: [
{experiments: ..., theoryid: ...},
{experiments: ..., theoryid: ...},
...
]

which is what the error message is telling you.
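
The difference between the two shapes can be checked mechanically. The following is a minimal sketch in plain Python, assuming the runcard has already been loaded into lists and dicts; `missing_keys_per_dataspec` is a hypothetical helper written for this illustration and is not part of validphys or reportengine.

```python
# Hypothetical helper for illustration; not a validphys/reportengine API.
REQUIRED_KEYS = ("theoryid", "experiments")


def missing_keys_per_dataspec(dataspecs, required=REQUIRED_KEYS):
    """Return {position: [missing keys]} for every incomplete dataspec."""
    problems = {}
    for position, spec in enumerate(dataspecs):
        missing = [key for key in required if key not in spec]
        if missing:
            problems[position] = missing
    return problems


# Shape of the failing runcard: theoryid and experiments are separate items.
split_dataspecs = [
    {"theoryid": 163, "speclabel": "central"},
    {"experiments": [{"experiment": "NMC"}]},
]

# Intended shape: one mapping per dataspec that carries both keys.
merged_dataspecs = [
    {
        "theoryid": 163,
        "speclabel": "central",
        "experiments": [{"experiment": "NMC"}],
    },
]

print(missing_keys_per_dataspec(split_dataspecs))
print(missing_keys_per_dataspec(merged_dataspecs))
```

In the split shape every list item lacks one of the two keys, which matches the "A parameter is required: theoryid" message raised while processing `experiments`.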

RosalynLP commented 5 years ago

I don't understand sorry, I tried taking the '-' away from the start of "experiments" but this didn't help

Zaharid commented 5 years ago

What does didn't help mean? I don't think it can give the same error.

RosalynLP commented 5 years ago

Ah, initially I left one with a dash by accident but now it says


[ERROR]: Bad configuration encountered:
A parameter is required: dataspecs_results.
This is needed to process:
 - (('matched_datasets_from_dataspecs', 0), ('dataspecs_with_matched_cuts', 0))
trough:
 - plot_thcorrmat_heatmap_custom
Maybe you mistyped dataspecs_results in one of the following keys?
 - dataspecs

Zaharid commented 5 years ago

This is because dataspecs_results is not something you are supposed to expand namespaces over, but rather something you are supposed to collect over (my earlier message wasn't all that clear in that regard). However, it should be easy enough to look at how matched_datasets_shift_matrix works and do the equivalent thing. Note that pretty much the only change is to call make_scale_var_covmat instead of computing the shifts.

RosalynLP commented 5 years ago

I'm really confused. In that case, what do I put in the runcard? What is wrong with the current runcard?

Zaharid commented 5 years ago

Have a look at this runcard:

https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/input/runcard.yaml

and the corresponding code and try to work out how things get passed around (maybe run it with --debug). Btw it is quite likely that it is affected by the reportengine bug and all the differences are due to changing the pdf...

RosalynLP commented 5 years ago

Also we don't want to call make_scale_var_covmat right? Because that won't correlate between process types in the correct way. Ultimately we want to call theory_covmat_custom but this is not easily equatable with matched_datasets_shift_matrix, or at least I don't see how to write an equivalent (this is what I was trying to do earlier).

RosalynLP commented 5 years ago

Sorry Zahari, this is the runcard I have been looking at most of the day and I just really don't understand it and can't get it to work properly for some reason

RosalynLP commented 5 years ago

I just don't understand how to extend it to the point prescription case, I don't think it is an obvious extension

Zaharid commented 5 years ago

Incidentally ISTM that the runcard works well, which makes the bug in reportengine even more confusing.

RosalynLP commented 5 years ago

Wait what, the runcard I pasted above?

Zaharid commented 5 years ago

The one with the shift plots.

RosalynLP commented 5 years ago

Ah no I mean I think I understand how the shift plots work but I just am having difficulty doing an equivalent because

a) I don't understand how to adjust the runcard
b) I am not sure what to feed in to which new functions.

What I am attempting is

matched_dataspecs_dataspecs_results = collect('dataspecs_results', ['dataspecs_with_matched_cuts'])

matched_datasets_matched_dataspecs_dataspecs_results = collect('matched_dataspecs_dataspecs_results', ['matched_datasets_from_dataspecs'])

Then writing a new combine_by_type which takes matched_datasets_matched_dataspecs_dataspecs_results rather than each_dataset_results_bytheory but has no other changes.

Is this correct? Is there any part of this which is wrong?

Zaharid commented 5 years ago

ISTM that everything could be adapted more or less easily (but not trivially) by changing the namespaces the various actions collect over. E.g. this

results_bytheoryids = collect(results,('theoryids',))
each_dataset_results_bytheory = collect('results_bytheoryids', ('experiments', 'experiment'))

could become:

results_bytheoryids = collect(results, ('dataspecs_with_matched_cuts',))
each_dataset_results_bytheory = collect('results_bytheoryids', ('matched_datasets_from_dataspecs',))

and then maybe you'll need some function to get the right dataframe index (I wrote the functionality inside some other provider).
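
The nesting produced by two stacked collects can be pictured with a toy model. This is only a sketch: `toy_collect` is a hypothetical stand-in written in plain Python and does not reproduce how reportengine's `collect` resolves namespaces; the theory ids and dataset names are placeholders.

```python
# Toy model only: NOT reportengine's collect, just an illustration of how
# collecting over two nested lists of namespaces yields a list of lists.


def toy_collect(provider, namespaces):
    """Call provider once per namespace and return the results as a list."""
    return [provider(namespace) for namespace in namespaces]


# Hypothetical stand-ins for dataspecs_with_matched_cuts (one per theory).
matched_cut_specs = [{"theoryid": 163}, {"theoryid": 177}, {"theoryid": 176}]
# Hypothetical stand-ins for matched_datasets_from_dataspecs.
matched_datasets = ["NMC", "SLACP"]


def results_by_theory(dataset):
    """Inner collect: one placeholder result per theory for one dataset."""
    return toy_collect(
        lambda spec: (dataset, spec["theoryid"]), matched_cut_specs
    )


# Outer collect: one inner list per matched dataset.
each_dataset_results_bytheory = toy_collect(results_by_theory, matched_datasets)

for per_dataset in each_dataset_results_bytheory:
    print(per_dataset)
```

The outer list runs over datasets and each inner list runs over theories, which is the layout a function like combine_by_type would then have to unpack.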

Zaharid commented 5 years ago

@RosalynLP Yes, what you are doing seems like what I said.

RosalynLP commented 5 years ago

OK great but I keep getting this problem:


[ERROR]: Bad configuration encountered:
A parameter is required: dataset_input.
This is needed to process:
 - commondata
trough:
 - report
trough:
 - template_text
trough:
 - plot_thcorrmat_heatmap_custom
trough:
 - theory_corrmat_custom
trough:
 - theory_covmat_custom
trough:
 - covs_pt_prescrip
trough:
 - combine_by_type
trough:
 - process_lookup
trough:
 - commondata_experiments

Initially I had

#commondata_experiments = collect('commondata', ['experiments', 'experiment'])

and I tried changing it to

commondata_experiments = collect('commondata',
                                 ('matched_datasets_from_dataspecs',))

but I still get the issue because of commondata itself.

RosalynLP commented 5 years ago

I only need the names of the experiments for this so I could take it from any dataspec but I am not sure how to do the syntax for this

RosalynLP commented 5 years ago

OK I did this instead


commondata_experiments_sub = collect('commondata', ['dataspecs_with_matched_cuts'])
commondata_experiments = collect('commondata_experiments_sub',['matched_datasets_from_dataspecs'])

RosalynLP commented 5 years ago

@Zaharid when you say "and then maybe you'll need some function to get the right dataframe index (I wrote the functionality inside some other provider)", I presume what you mean is the fact that experiments_index doesn't work and gives the error


[ERROR]: Bad configuration encountered:
A parameter is required: experiments.
This is needed to process:
 - report
trough:
 - template_text
trough:
 - plot_thcorrmat_heatmap_custom
trough:
 - theory_corrmat_custom
trough:
 - theory_covmat_custom
trough:
 - experiments_index

but I don't understand what the statement "I wrote the functionality inside some other provider" means. What are the different providers? So is the issue that experiments_index loads in the experiments before the cuts have been matched, or something? What is the correct input, rather than experiments, to this kind of function?

RosalynLP commented 5 years ago

So we want to take in only the datasets which are mutual, i.e. those in matched_datasets_from_dataspecs?

RosalynLP commented 5 years ago

@Zaharid this whole thing makes no sense to me, even if I calculate the theory covmats using this runcard and matched_datasets_from_dataspecs, it is taking the dataspecs to be the different scale varied dataspecs, not the two dataspecs for NLO and NNLO with NNPDF3.1. So I somehow want to have two kinds of groupings in the runcard, one for the scale varied dataspecs, and one for the shift dataspecs. We then need the shift dataspecs to do as your functions already do and compute the shift matrix, and we need the other dataspecs to do the theory covmat stuff which previously existed. But we want to do all that just for the points belonging to the matched datasets from the OTHER (shift) dataspecs. Do you know how to separate these two things?

Zaharid commented 5 years ago

On Wed, Oct 17, 2018 at 4:24 PM RosalynLP notifications@github.com wrote:

@Zaharid this whole thing makes no sense to me, even if I calculate the theory covmats using this runcard and matched_datasets_from_dataspecs, it is taking the dataspecs to be the different scale varied dataspecs, not the two dataspecs for NLO and NNLO with NNPDF3.1. So I somehow want to have two kinds of groupings in the runcard, one for the scale varied dataspecs, and one for the shift dataspecs. We then need the shift dataspecs to do as your functions already do and compute the shift matrix, and we need the other dataspecs to do the theory covmat stuff which previously existed. But we want to do all that just for the points belonging to the matched datasets from the OTHER (shift) dataspecs. Do you know how to separate these two things?

This can be solved with various kinds of namespaces:

shiftconfig:
  dataspecs:
    - ... # nlo vs nnlo

thcovconfig:
  dataspecs:
    - ... # bazillion theories

TODO: find better names

shift_mat_for_comparison = collect('shift_matrix_whatever_was_called', ['shiftconfig'])
th_mat_for_comparison = collect('thcovmat_custom_whatever', ['thcovconfig'])

def do_some_comparison(shift_mat_for_comparison, th_mat_for_comparison):
    # because collect always returns a list
    shift_mat = shift_mat_for_comparison[0]
    th_mat = th_mat_for_comparison[0]
    ...

Zaharid commented 5 years ago

mmm probably the fact that collect always returns a list is annoying enough to justify a collect_one or somesuch. Anyhow, lets get the thcovmat done first!
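
A possible shape for such a helper, as a rough sketch only: `collect_one` here is hypothetical, does not exist in reportengine as far as this thread shows, and simply unwraps an already collected list rather than hooking into the resource builder.

```python
# Hypothetical helper: unwrap a list that is expected to hold one item.


def collect_one(collected):
    """Return the only element of a collected list, failing loudly otherwise."""
    if len(collected) != 1:
        raise ValueError(
            f"expected exactly one collected item, got {len(collected)}"
        )
    return collected[0]


def do_some_comparison(shift_mat_for_comparison, th_mat_for_comparison):
    """Unwrap both single-item lists before comparing the matrices."""
    shift_mat = collect_one(shift_mat_for_comparison)
    th_mat = collect_one(th_mat_for_comparison)
    return shift_mat, th_mat
```

Raising on anything other than one element makes a misconfigured namespace (for example, a list of namespaces where a single one was intended) fail early instead of silently using the first entry.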


RosalynLP commented 5 years ago

OK so when are the matched cuts being applied? Before this collect function presumably? In which case how do they know which dataspecs to pick? Or does the fact you collect over a certain namespace have an effect? I basically still don't see that this will make the theory covmat have the matched cuts for the shift comparison.

RosalynLP commented 5 years ago

NNPDF/nnpdf#309

RosalynLP commented 5 years ago

I don't understand how the collect function is working here. For the shift matrix, the workflow is essentially:

import numpy as np
import pandas as pd
from reportengine import collect

matched_dataspecs_dataset_prediction_shift = collect(
    'dataspecs_dataset_prediction_shift', ['matched_datasets_from_dataspecs'])

def matched_datasets_shift_matrix(matched_dataspecs_dataset_prediction_shift):
    """Produce a matrix out of the outer product of
    ``dataspecs_dataset_prediction_shift``. The matrix will be a
    pandas DataFrame, indexed similarly to ``experiments_index``."""
    all_shifts = np.concatenate(
        [val.shifts for val in matched_dataspecs_dataset_prediction_shift])
    mat = np.outer(all_shifts, all_shifts)
    #build index
    expnames = np.concatenate([
        np.full(len(val.shifts), val.experiment_name, dtype=object)
        for val in matched_dataspecs_dataset_prediction_shift
    ])
    dsnames = np.concatenate([
        np.full(len(val.shifts), val.dataset_name, dtype=object)
        for val in matched_dataspecs_dataset_prediction_shift
    ])
    point_indexes = np.concatenate([
        np.arange(len(val.shifts))
        for val in matched_dataspecs_dataset_prediction_shift
    ])

    index = pd.MultiIndex.from_arrays(
        [expnames, dsnames, point_indexes],
        names=["Experiment name", "Dataset name", "Point"])

    return pd.DataFrame(mat, columns=index, index=index)

shift_mat_for_comparison = collect('matched_datasets_shift_matrix', ['shiftconfig'])

So I don't see how this works: first the matched_datasets_from_dataspecs won't know which dataspecs to use, right, then even if that works you should end up with a list of matrices collected over the two theories NLO and NNLO or something, which makes no sense to me.

And then as for theories it won't know which dataspecs to use to evaluate the theory covmat, you'll end up with it using at best the mutual cuts from the scale varied theories, which aren't the same as for the NLO/NNLO mutual cuts, and then you will collect over all the theories, so end up with a list of matrices. But as far as I can see it will fail before this stage.
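
The worry about mismatched cuts can be made concrete with a toy example. This is a sketch under made-up assumptions: the point indices below are invented, `matched_points` is a hypothetical helper, and nothing here reflects how validphys actually computes matched cuts.

```python
# Toy illustration with invented numbers; not the validphys implementation.


def matched_points(cuts_by_spec):
    """Intersect the kept-point indices of every dataspec, per dataset."""
    specs = list(cuts_by_spec.values())
    # Only datasets present in every dataspec can be matched at all.
    common_datasets = set(specs[0]).intersection(*specs[1:])
    return {
        name: sorted(set.intersection(*(set(spec[name]) for spec in specs)))
        for name in sorted(common_datasets)
    }


# Hypothetical kept points for two scale-varied theories.
scale_varied = {
    "theory_163": {"NMC": [0, 1, 2, 3], "SLACP": [0, 1, 2]},
    "theory_177": {"NMC": [0, 1, 2, 3], "SLACP": [0, 1, 2]},
}

# Hypothetical kept points for the NLO and NNLO dataspecs.
nlo_vs_nnlo = {
    "nlo": {"NMC": [1, 2, 3], "SLACP": [0, 1, 2]},
    "nnlo": {"NMC": [0, 1, 2, 3], "SLACP": [1, 2]},
}

print(matched_points(scale_varied))
print(matched_points(nlo_vs_nnlo))
```

With these invented inputs the two groupings keep different points, so a theory covmat built from the first grouping would not line up with a shift matrix built from the second unless both are restricted to the same set.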

Regardless, I am having problems just getting the formatting on the runcard to work as it doesn't like all the different blocks:

Failed to parse yaml file: while parsing a block mapping
  in "matched_test_notab.yaml", line 23, column 9
expected <block end>, but found '-'
  in "matched_test_notab.yaml", line 100, column 9

meta:
   author: Rosalyn Pearson
   keywords: [test, theory uncertainties, matched cuts]
   title: Testing theory covariance matrix with matched cuts
default_theory:
   - theoryid: 163

fivetheories: nobar

theoryids:
   - 163
   - 177
   - 176
   - 179
   - 174
#   - 180
#   - 173
#   - 175
#   - 178

thcovconfig:
   dataspecs:
      - theoryid: 163
        speclabel: $(\xi_F,\xi_R)=(1,1)$
        experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                 - dataset: NMCPD
                 - dataset: NMC
                 - experiment: SLAC
              datasets:
                 - dataset: SLACP
                 - dataset: SLACD
            - experiment: BCDMS
              datasets:
                 - dataset: BCDMSP
                 - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                 - dataset: NTVNUDMN
                 - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                 - dataset: CHORUSNU
                 - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                 - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                 - dataset: HERACOMBNCEM 
                 - dataset: HERACOMBNCEP460
                 - dataset: HERACOMBNCEP575
                 - dataset: HERACOMBNCEP820
                 - dataset: HERACOMBNCEP920
                 - dataset: HERACOMBCCEM 
                 - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                 - dataset: H1HERAF2B
                 - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                 - dataset: ATLASWZRAP36PB
                 - dataset: ATLASZHIGHMASS49FB
                 - dataset: ATLASLOMASSDY11EXT
                 - dataset: ATLASWZRAP11
                 - dataset: ATLAS1JET11
                 - dataset: ATLASZPT8TEVMDIST
                 - dataset: ATLASZPT8TEVYDIST
                 - dataset: ATLASTTBARTOT
                 - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                 - dataset: CMSWEASY840PB
                 - dataset: CMSWMASY47FB
                 - dataset: CMSWCHARMRAT
                 - dataset: CMSDY2D11
                 - dataset: CMSWMU8TEV
                 - dataset: CMSJETS11
                 - dataset: CMSTTBARTOT
                 - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                 - dataset: LHCBZ940PB
                 - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                 - dataset: CDFZRAP
                 - dataset: CDFR2KT
            - experiment: D0
              datasets:
                 - dataset: D0ZRAP
                 - dataset: D0WEASY
                 - dataset: D0WMASY
        - theoryid: 177
          speclabel: $(\xi_F,\xi_R)=(2,1)$
          experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                 - dataset: NMCPD
                 - dataset: NMC
                 - experiment: SLAC
              datasets:
                 - dataset: SLACP
                 - dataset: SLACD
            - experiment: BCDMS
              datasets:
                 - dataset: BCDMSP
                 - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                 - dataset: NTVNUDMN
                 - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                 - dataset: CHORUSNU
                 - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                 - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                 - dataset: HERACOMBNCEM 
                 - dataset: HERACOMBNCEP460
                 - dataset: HERACOMBNCEP575
                 - dataset: HERACOMBNCEP820
                 - dataset: HERACOMBNCEP920
                 - dataset: HERACOMBCCEM 
                 - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                 - dataset: H1HERAF2B
                 - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                 - dataset: ATLASWZRAP36PB
                 - dataset: ATLASZHIGHMASS49FB
                 - dataset: ATLASLOMASSDY11EXT
                 - dataset: ATLASWZRAP11
                 - dataset: ATLAS1JET11
                 - dataset: ATLASZPT8TEVMDIST
                 - dataset: ATLASZPT8TEVYDIST
                 - dataset: ATLASTTBARTOT
                 - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                 - dataset: CMSWEASY840PB
                 - dataset: CMSWMASY47FB
                 - dataset: CMSWCHARMRAT
                 - dataset: CMSDY2D11
                 - dataset: CMSWMU8TEV
                 - dataset: CMSJETS11
                 - dataset: CMSTTBARTOT
                 - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                 - dataset: LHCBZ940PB
                 - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                 - dataset: CDFZRAP
                 - dataset: CDFR2KT
            - experiment: D0
              datasets:
                 - dataset: D0ZRAP
                 - dataset: D0WEASY
                 - dataset: D0WMASY
        - theoryid: 176
          speclabel: $(\xi_F,\xi_R)=(0.5,1)$
          experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                 - dataset: NMCPD
                 - dataset: NMC
                 - experiment: SLAC
              datasets:
                 - dataset: SLACP
                 - dataset: SLACD
            - experiment: BCDMS
              datasets:
                 - dataset: BCDMSP
                 - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                 - dataset: NTVNUDMN
                 - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                 - dataset: CHORUSNU
                 - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                 - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                 - dataset: HERACOMBNCEM 
                 - dataset: HERACOMBNCEP460
                 - dataset: HERACOMBNCEP575
                 - dataset: HERACOMBNCEP820
                 - dataset: HERACOMBNCEP920
                 - dataset: HERACOMBCCEM 
                 - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                 - dataset: H1HERAF2B
                 - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                 - dataset: ATLASWZRAP36PB
                 - dataset: ATLASZHIGHMASS49FB
                 - dataset: ATLASLOMASSDY11EXT
                 - dataset: ATLASWZRAP11
                 - dataset: ATLAS1JET11
                 - dataset: ATLASZPT8TEVMDIST
                 - dataset: ATLASZPT8TEVYDIST
                 - dataset: ATLASTTBARTOT
                 - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                 - dataset: CMSWEASY840PB
                 - dataset: CMSWMASY47FB
                 - dataset: CMSWCHARMRAT
                 - dataset: CMSDY2D11
                 - dataset: CMSWMU8TEV
                 - dataset: CMSJETS11
                 - dataset: CMSTTBARTOT
                 - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                 - dataset: LHCBZ940PB
                 - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                 - dataset: CDFZRAP
                 - dataset: CDFR2KT
            - experiment: D0
              datasets:
                 - dataset: D0ZRAP
                 - dataset: D0WEASY
                 - dataset: D0WMASY
        - theoryid: 179
          speclabel: $(\xi_F,\xi_R)=(1,2)$ 
          experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                 - dataset: NMCPD
                 - dataset: NMC
                 - experiment: SLAC
              datasets:
                 - dataset: SLACP
                 - dataset: SLACD
            - experiment: BCDMS
              datasets:
                 - dataset: BCDMSP
                 - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                 - dataset: NTVNUDMN
                 - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                 - dataset: CHORUSNU
                 - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                 - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                 - dataset: HERACOMBNCEM 
                 - dataset: HERACOMBNCEP460
                 - dataset: HERACOMBNCEP575
                 - dataset: HERACOMBNCEP820
                 - dataset: HERACOMBNCEP920
                 - dataset: HERACOMBCCEM 
                 - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                 - dataset: H1HERAF2B
                 - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                 - dataset: ATLASWZRAP36PB
                 - dataset: ATLASZHIGHMASS49FB
                 - dataset: ATLASLOMASSDY11EXT
                 - dataset: ATLASWZRAP11
                 - dataset: ATLAS1JET11
                 - dataset: ATLASZPT8TEVMDIST
                 - dataset: ATLASZPT8TEVYDIST
                 - dataset: ATLASTTBARTOT
                 - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                 - dataset: CMSWEASY840PB
                 - dataset: CMSWMASY47FB
                 - dataset: CMSWCHARMRAT
                 - dataset: CMSDY2D11
                 - dataset: CMSWMU8TEV
                 - dataset: CMSJETS11
                 - dataset: CMSTTBARTOT
                 - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                 - dataset: LHCBZ940PB
                 - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                 - dataset: CDFZRAP
                 - dataset: CDFR2KT
            - experiment: D0
              datasets:
                 - dataset: D0ZRAP
                 - dataset: D0WEASY
                 - dataset: D0WMASY
        - theoryid: 174
          speclabel: $(\xi_F,\xi_R)=(1,0.5)$
          experiments:
        # Fixed target DIS
            - experiment: NMC
              datasets:
                 - dataset: NMCPD
                 - dataset: NMC
                 - experiment: SLAC
              datasets:
                 - dataset: SLACP
                 - dataset: SLACD
            - experiment: BCDMS
              datasets:
                 - dataset: BCDMSP
                 - dataset: BCDMSD
            - experiment: NTVDMN
              datasets:
                 - dataset: NTVNUDMN
                 - dataset: NTVNBDMN
            - experiment: CHORUS
              datasets:
                 - dataset: CHORUSNU
                 - dataset: CHORUSNB
          # Combined HERA charm production cross-sections
            - experiment: HERAF2CHARM
              datasets:
                 - dataset: HERAF2CHARM
          # HERA data
            - experiment: HERACOMB
              datasets:
                 - dataset: HERACOMBNCEM 
                 - dataset: HERACOMBNCEP460
                 - dataset: HERACOMBNCEP575
                 - dataset: HERACOMBNCEP820
                 - dataset: HERACOMBNCEP920
                 - dataset: HERACOMBCCEM 
                 - dataset: HERACOMBCCEP 
          # F2bottom data
            - experiment: F2BOTTOM
              datasets: 
                 - dataset: H1HERAF2B
                 - dataset: ZEUSHERAF2B
            - experiment: ATLAS
              datasets:
                 - dataset: ATLASWZRAP36PB
                 - dataset: ATLASZHIGHMASS49FB
                 - dataset: ATLASLOMASSDY11EXT
                 - dataset: ATLASWZRAP11
                 - dataset: ATLAS1JET11
                 - dataset: ATLASZPT8TEVMDIST
                 - dataset: ATLASZPT8TEVYDIST
                 - dataset: ATLASTTBARTOT
                 - dataset: ATLASTOPDIFF8TEVTRAPNORM
            - experiment: CMS
              datasets:
                 - dataset: CMSWEASY840PB
                 - dataset: CMSWMASY47FB
                 - dataset: CMSWCHARMRAT
                 - dataset: CMSDY2D11
                 - dataset: CMSWMU8TEV
                 - dataset: CMSJETS11
                 - dataset: CMSTTBARTOT
                 - dataset: CMSTOPDIFF8TEVTTRAPNORM
            - experiment: LHCb
              datasets:
                 - dataset: LHCBZ940PB
                 - dataset: LHCBZEE2FB
            - experiment: CDF
              datasets:
                 - dataset: CDFZRAP
                 - dataset: CDFR2KT
            - experiment: D0
              datasets:
                 - dataset: D0ZRAP
                 - dataset: D0WEASY
                 - dataset: D0WMASY
#        - theoryid: 180
#          speclabel: $(\xi_F,\xi_R)=(2,2)$ 
#        - theoryid: 173
#          speclabel: $(\xi_F,\xi_R)=(0.5,0.5)$
#        - theoryid: 175
#          speclabel: $(\xi_F,\xi_R)=(2,0.5)$   
#        - theoryid: 178
#          speclabel: $(\xi_F,\xi_R)=(0.5,2)$

shiftconfig:
   dataspecs:
      - theoryid: 52
        pdf: NNPDF31_nlo_as_0118_hessian
        speclabel: "NLO"
        fit: NNPDF31_nlo_as_0118_1000

      - theoryid: 53
        pdf: NNPDF31_nnlo_as_0118_hessian
        speclabel: "NNLO"
        fit: NNPDF31_nnlo_as_0118_1000

normalize_to: 1

use_cuts: 'fromfit'
fit: NNPDF31_nlo_as_0118_1000

pdf:
  from_: fit

template_text: |

   {@with default_theory@}

   {@plot_thcorrmat_heatmap_custom@}

   {@endwith@}

   {@with shiftconfig@}

   {@plot_matched_datasets_shift_matrix@}
   {@plot_matched_datasets_shift_matrix_correlations@}

   {@endwith@}

actions_:
  - report(main=true)
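
Counting `meta:` as line 1, the two positions in the parse error appear to correspond to the first `- theoryid: 163` entry (line 23) and to `- theoryid: 177` (line 100). The later `- theoryid:` entries are indented by 8 spaces while the first one is indented by 6, so the parser sees a `-` where it expects another key of the first dataspec. Below is a minimal sketch of a check for this kind of mismatch. It uses only the standard library, the helper names `list_item_indents` and `inconsistent_items` are hypothetical, and the heuristic is naive: it assumes every `- key:` entry for the given key belongs at the same nesting level.

```python
import re


def list_item_indents(text, key):
    """Return (line_number, indent) for every line of the form '<spaces>- key:'."""
    pattern = re.compile(r"^( *)- " + re.escape(key) + r":")
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            found.append((number, len(match.group(1))))
    return found


def inconsistent_items(text, key):
    """Return the entries whose indentation differs from the first entry's."""
    items = list_item_indents(text, key)
    if not items:
        return []
    expected = items[0][1]
    return [(number, indent) for number, indent in items if indent != expected]


# Cut-down stand-in for the runcard above, with the same indentation pattern.
sample = """\
thcovconfig:
   dataspecs:
      - theoryid: 163
        speclabel: first
        - theoryid: 177
          speclabel: second
#        - theoryid: 180
"""

print(inconsistent_items(sample, "theoryid"))
```

Run against the full runcard with `key="experiment"`, the same check would also flag the `- experiment: SLAC` entries, which sit deeper than their sibling experiments in every dataspec.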