HEP-PBSP / SIMUnet

The public code for SIMUnet, a NNPDF based tool to perform simultaneous determination of PDFs and EFT Wilson coefficients.
https://hep-pbsp.github.io/SIMUnet/
GNU General Public License v3.0

Pineparser for compatibility with new theories #63

Open comane opened 5 months ago

comane commented 5 months ago

The scope of this PR is to allow SIMUnet to use new theories generated with PineAPPL.

Example of how to use this for now:

  1. from theory_700/fastkernel, copy NMC_NC_NOTFIXED_P_EM-SIGMARED.pineappl.lz4 into theory_270/fastkernel

  2. from nnpdf_data/new_commondata/NMC_NC_NOTFIXED_P, copy metadata.yaml as NMC_NC_NOTFIXED_P_EM-SIGMARED_metadata.yaml into theory_270/fastkernel

  3. within data/commondata, copy DATA_NMC.dat to DATA_NMC_NC_NOTFIXED_P_EM-SIGMARED.dat
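The three copy steps can be scripted as below. This is a minimal sketch, not part of the PR: the function name `stage_pineappl_dataset` and its parameters are hypothetical, and it assumes both theories keep their grids in a `fastkernel` folder (the name that appears in the loader paths in the logs of this thread) and that the metadata is copied into the target theory, theory_270, where the loader looks for `<dataset>_metadata.yaml`.

```python
"""Hypothetical helper automating the manual copy steps (not part of SIMUnet)."""
import shutil
from pathlib import Path


def stage_pineappl_dataset(data_root: Path, new_commondata_root: Path,
                           dataset: str, setname: str, legacy_data_file: str,
                           source_theory: int = 700,
                           target_theory: int = 270) -> list[Path]:
    """Copy grid, metadata and legacy DATA file so `dataset` can be loaded."""
    src_fk_dir = data_root / f"theory_{source_theory}" / "fastkernel"
    dst_fk_dir = data_root / f"theory_{target_theory}" / "fastkernel"
    commondata_dir = data_root / "commondata"
    dst_fk_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # Step 1: PineAPPL grid from the new theory into the old theory.
    grid = f"{dataset}.pineappl.lz4"
    copied.append(Path(shutil.copy2(src_fk_dir / grid, dst_fk_dir / grid)))
    # Step 2: new-commondata metadata, renamed to <dataset>_metadata.yaml.
    copied.append(Path(shutil.copy2(new_commondata_root / setname / "metadata.yaml",
                                    dst_fk_dir / f"{dataset}_metadata.yaml")))
    # Step 3: legacy DATA file duplicated under the new dataset name.
    copied.append(Path(shutil.copy2(commondata_dir / legacy_data_file,
                                    commondata_dir / f"DATA_{dataset}.dat")))
    return copied
```

For the NMC example, the call would be `stage_pineappl_dataset(data_root, new_commondata_root, "NMC_NC_NOTFIXED_P_EM-SIGMARED", "NMC_NC_NOTFIXED_P", "DATA_NMC.dat")`, with `data_root` pointing at the `share/NNPDF/data` folder of the environment. `shutil.copy2` is used only to keep file timestamps; `shutil.copy` would work as well.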

Now, it should be possible to run a fit with the following dataset_inputs:

dataset_inputs:
- {dataset: NMC_NC_NOTFIXED_P_EM-SIGMARED, new_commondata: true}  

TODO

ElieHammou commented 3 months ago

Hi @comane, thanks for starting the PR. I am testing it at the moment with this data:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_NC_EPD_88_PES, frac: 0.75} # New FK table

I am working with theory 270. I have manually added this FK table:

simunet-dev/share/NNPDF/data/theory_270/fastkernel/EIC_NC_EPD_88_PES.pineappl.lz4 

and this compound file (using the old way):

simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat

Here is the content of the compound file:

# COMPOUND FK
FK: EIC_NC_EPD_88_PES
OP: NULL

When I run vp-setupfit, it seems to look for the wrong name:

(simunet-dev) ~/Projects/Low_E_PDF/low-energy/Fits/ - (main) > vp-setupfit test_simunet_EIC.yaml
[WARNING]: Output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC Overwriting contents
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
[ERROR]: Bad configuration encountered:
Incorrect COMPOUND file '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat'. Searching for non-existing FKTable:
Could not find FKTable for set '_NC_EPD_88'. File '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/fastkernel/FK__NC_EPD_88.dat' not found

It looks like it is mangling both the prefix and the suffix of the name. This is because the old format used the following naming convention for FK tables:

FK_EIC_NC_EPD_88_PES.dat

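The mangled name in the error is exactly what you get by slicing a fixed-width `FK_` prefix (3 characters) and `.dat` suffix (4 characters) off `EIC_NC_EPD_88_PES`. The sketch below is a hypothetical reconstruction that reproduces the error message; it is not the actual validphys code, and `legacy_fk_lookup` is an illustrative name. It suggests the old COMPOUND parser expects the `FK:` entry to be a full `FK_<set>.dat` file name.

```python
# Hypothetical reconstruction of the legacy COMPOUND lookup (not validphys code).
def legacy_fk_lookup(compound_entry: str) -> tuple[str, str]:
    """Mimic a parser that assumes entries look like 'FK_<set>.dat'."""
    setname = compound_entry[3:-4]  # drop 3 chars ('FK_') and 4 chars ('.dat')
    return setname, f"FK_{setname}.dat"


print(legacy_fk_lookup("EIC_NC_EPD_88_PES"))         # ('_NC_EPD_88', 'FK__NC_EPD_88.dat')
print(legacy_fk_lookup("FK_EIC_NC_EPD_88_PES.dat"))  # ('EIC_NC_EPD_88_PES', 'FK_EIC_NC_EPD_88_PES.dat')
```

Even with the entry written as `FK_EIC_NC_EPD_88_PES.dat`, the old path would still look for a `.dat` table, which does not exist for a grid stored as `.pineappl.lz4`.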
ElieHammou commented 3 months ago

For the record, does this PR rely on the old COMPOUND files to link commondata and FK tables, or does it expect this information to be stored in the theory's yamldb folder, as NNPDF currently does?

comane commented 3 months ago
> dataset_inputs:
> - {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
> - {dataset: EIC_NC_EPD_88_PES, frac: 0.75} # New FK table

Can you try adding the new_commondata: true flag to the dataset that uses the FK table in the PineAPPL format?

> For the record, is this PR relying on the old compound files to link commondata and FK tables or is it expecting the info to be stored in the yamldb folder of the theory, like nnpdf does currently?

I don't think this PR supports COMPOUND files yet.
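A quick pre-flight check of the theory folder can save a vp-setupfit round trip. This is a hypothetical helper, not part of SIMUnet: the file names are inferred from the lookup errors reported in this thread (`<dataset>.pineappl.lz4` and `<dataset>_metadata.yaml` for new_commondata datasets, `FK_<dataset>.dat` for old ones), and COMPOUND files are deliberately not covered.

```python
from pathlib import Path


def expected_theory_files(theory_dir: Path, dataset: str, new_commondata: bool) -> list[Path]:
    """List the files a dataset is assumed to need inside one theory folder."""
    fk_dir = theory_dir / "fastkernel"
    if new_commondata:
        # Names inferred from the loader error messages, not from the PR's code.
        return [fk_dir / f"{dataset}.pineappl.lz4", fk_dir / f"{dataset}_metadata.yaml"]
    # Old-style layout: a single FK_<dataset>.dat table (COMPOUND files ignored).
    return [fk_dir / f"FK_{dataset}.dat"]


def missing_theory_files(theory_dir: Path, dataset: str, new_commondata: bool) -> list[Path]:
    """Return the expected files that are not present yet."""
    return [p for p in expected_theory_files(theory_dir, dataset, new_commondata)
            if not p.exists()]
```

In the situation reported further down this thread (grid copied, metadata absent), calling it on theory_270 with `EIC_NC_EPD_88_PES` and `new_commondata=True` would flag only the `_metadata.yaml` file.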

ElieHammou commented 3 months ago

Sure thing.

I have just tried vp-setupfit with:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_NC_EPD_88_PES, frac: 0.75, new_commondata: true} # New FK table

I have also removed the compound file I had initially added. It gives me the following error:

(simunet-dev) ~/Projects/Low_E_PDF/low-energy/Fits/ - (main) > vp-setupfit test_simunet_EIC.yaml
[WARNING]: Output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC Overwriting contents
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
[CRITICAL]: Bug in setup-fit ocurred. Please report it.
Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 405, in check_compound
    with compound_spec_path.open() as f:
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 590, in check_dataset
    fkspec, op = self.check_compound(theoryno, name, cfac)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 412, in check_compound
    raise CompoundNotFound(msg)
validphys.loader.CompoundNotFound: Could not find COMPOUND set 'EIC_NC_EPD_88_PES' for theory 270: [Errno 2] No such file or directory: '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/n3fit/src/n3fit/scripts/vp_setupfit.py", line 197, in run
    super().run()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/app.py", line 158, in run
    super().run()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/app.py", line 358, in run
    rb.resolve_fuzzytargets()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 370, in resolve_fuzzytargets
    self.resolve_fuzzytarget(target)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 379, in resolve_fuzzytarget
    self.process_targetspec(fuzzytarget.name, spec, fuzzytarget.extraargs)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 388, in process_targetspec
    gen.send(None)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 450, in _process_requirement
    yield from self._make_node(name, nsspec, extraargs, parents)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 466, in _make_node
    yield from self._make_callspec(f, name, nsspec, extraargs, parents)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 499, in _make_callspec
    index, _ = gen.send(None)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 417, in _process_requirement
    put_index, val = self.input_parser.resolve_key(name, ns, parents=parents, currspec=nsspec)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 429, in resolve_key
    return self._resolve_key(key=key, ns=ns, input_params=input_params,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 491, in _resolve_key
    val = produce_func(**kwargs)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/config.py", line 1492, in produce_data
    datasets.append(self.parse_from_(None, "dataset", write=False)[1])
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 133, in f_
    return f(self, val, *args, **kwargs)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 735, in parse_from_
    return self.resolve_key(element, ns, input_params=input_params,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 429, in resolve_key
    return self._resolve_key(key=key, ns=ns, input_params=input_params,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 491, in _resolve_key
    val = produce_func(**kwargs)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/config.py", line 754, in produce_dataset
    ds = self.loader.check_dataset(
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 592, in check_dataset
    fkspec = self.check_fktable(theoryno, name, cfac, use_fixed_predictions=use_fixed_predictions, new_commondata=new_commondata)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 386, in check_fktable
    with open(path_metadata, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/fastkernel/EIC_NC_EPD_88_PES_metadata.yaml'

It appears to complain about both the missing COMPOUND file and the missing metadata file. The missing metadata file makes sense, since I am using an old commondata implementation with a new FK table.

I will either add the metadata or try a dataset that already has it, and get back to you. I am confused about the COMPOUND error, though.
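One reading of the traceback: `check_dataset` (loader.py, line 590) first calls `check_compound`, and while handling the resulting `CompoundNotFound` it falls back to `check_fktable` (line 592). Because the fallback runs inside the `except` block, Python chains the two errors and prints "During handling of the above exception, another exception occurred", so the COMPOUND error would be an expected first attempt and only the missing metadata file the real failure. The sketch below reproduces that pattern with stand-in functions; the names mirror the traceback, but the bodies are hypothetical.

```python
class CompoundNotFound(Exception):
    """Stand-in for validphys.loader.CompoundNotFound."""


def check_compound(name: str):
    # Hypothetical: no COMPOUND file exists for this dataset.
    raise CompoundNotFound(f"Could not find COMPOUND set {name!r}")


def check_fktable(name: str):
    # Hypothetical: the grid exists but the metadata file does not.
    raise FileNotFoundError(f"{name}_metadata.yaml")


def check_dataset(name: str):
    try:
        return check_compound(name)
    except CompoundNotFound:
        # The fallback runs inside the except block, so a failure here
        # keeps the CompoundNotFound attached as __context__.
        return check_fktable(name)


try:
    check_dataset("EIC_NC_EPD_88_PES")
except FileNotFoundError as err:
    print(type(err.__context__).__name__)  # CompoundNotFound
```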

ElieHammou commented 3 months ago

Hi @comane, I think I have found a bug: it appears that the new FK tables cannot be read if another dataset is being contaminated. For example, the following runcard works well:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_CC_EMP_140_OPT, frac: 0.75, new_commondata: true} # New FK table

But if I add another dataset to be contaminated, the vp-setupfit step fails:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: HLLHC_HMDY_NC_EL_FINAL, frac: 0.75, cfac: ['QCD', 'EWK'], contamination: 'EFT_LO'}
- {dataset: EIC_CC_EMP_140_OPT, frac: 0.75, new_commondata: true} # New FK table

I then get the following error:

(simunet-dev) ~/Projects/Low_E_PDF/low-energy/Fits/ - (main) > vp-setupfit test_simunet_EIC.yaml
[WARNING]: Output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC Overwriting contents
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
Using Keras backend
[INFO]: All requirements processed and checked successfully. Executing actions.
[WARNING]: Importing libNNPDF
[INFO]: Initialising RNG
- Random Generator allocated: ranlux
[INFO]: NNPDF40_nnlo_as_01180 T0 checked.
[INFO]: Verifying positivity tables:
[INFO]: POSF2U checked.
[INFO]: POSF2DW checked.
[INFO]: POSF2S checked.
[INFO]: POSFLL checked.
[INFO]: POSDYU checked.
[INFO]: POSDYD checked.
[INFO]: POSDYS checked.
[INFO]: POSF2C checked.
[INFO]: POSXUQ checked.
[INFO]: POSXUB checked.
[INFO]: POSXDQ checked.
[INFO]: POSXDB checked.
[INFO]: POSXSQ checked.
[INFO]: POSXSB checked.
[INFO]: POSXGL checked.
-- Generating closure data for DEUTERON
-- Generating replica data for DEUTERON
[WARNING]: Dataset output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC/filter/NMCPD_dw_ite Overwriting contents
[INFO]: 121/260 datapoints in NMCPD_dw_ite passed kinematic cuts.
-- Generating closure data for HLLHC
-- Generating replica data for HLLHC
[INFO]: 12/12 datapoints in HLLHC_HMDY_NC_EL_FINAL passed kinematic cuts.
[CRITICAL]: Bug in setup-fit ocurred. Please report it.
Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/n3fit/src/n3fit/scripts/vp_setupfit.py", line 197, in run
    super().run()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/app.py", line 158, in run
    super().run()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/app.py", line 380, in run
    rb.execute_sequential()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 166, in execute_sequential
    result = self.get_result(callspec.function,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 175, in get_result
    fres =  function(**kwdict)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/filters.py", line 122, in filter_closure_data_by_experiment
    return [
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/filters.py", line 123, in <listcomp>
    _filter_closure_data(filter_path, exp, t0pdfset, fakenoise, errorsize)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/filters.py", line 177, in _filter_closure_data
    loaded_data = data.load.__wrapped__(data)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/core.py", line 774, in load
    loaded_data = dataset.load()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/core.py", line 584, in load
    fktable = p.load()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/core.py", line 702, in load
    return FKTable(str(self.fkpath), [str(factor) for factor in self.cfactors])
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/NNPDF/nnpdf.py", line 3042, in __init__
    _nnpdf.FKTable_swiginit(self, _nnpdf.new_FKTable(*args))
RuntimeError: [utils] error: Could not open (PosixPath('/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/fastkernel/EIC_CC_EMP_140_OPT.pineappl.lz4'),)

I have similar problems with validphys runcards.
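The path in the RuntimeError is the string form of a one-element tuple, which suggests that at core.py line 702 `self.fkpath` is a tuple for the new-style dataset and is passed through `str()` unchanged to the legacy libNNPDF `FKTable` constructor. The snippet below only demonstrates that string conversion (the `/tmp/...` path is illustrative, and `PosixPath` requires macOS or Linux). Unwrapping the tuple would presumably not be enough on its own, since the legacy reader is not expected to parse a PineAPPL grid.

```python
from pathlib import PosixPath

# Illustrative one-element tuple, shaped like the value in the error message.
fkpath = (PosixPath("/tmp/theory_270/fastkernel/EIC_CC_EMP_140_OPT.pineappl.lz4"),)

print(str(fkpath))
# (PosixPath('/tmp/theory_270/fastkernel/EIC_CC_EMP_140_OPT.pineappl.lz4'),)
print(str(fkpath[0]))
# /tmp/theory_270/fastkernel/EIC_CC_EMP_140_OPT.pineappl.lz4
```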

ElieHammou commented 3 months ago

To be honest, I have no idea what the problem could be.

ElieHammou commented 3 months ago

I think I understand the problem. The contamination itself is not the issue: the new FK tables do not work in a closure test.

For instance, this runcard triggers the bug:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_CC_EMP_140_OPT, frac: 0.75, new_commondata: true} # New FK table

###########################################################
# The closure test namespace tells us the settings for the
# (possibly contaminated) closure test.
############################################################
closuretest:
    filterseed: 0 # Random seed to be used in filtering data partitions
    fakedata: true     # true = to use FAKEPDF to generate pseudo-data
    fakepdf: NNPDF40_nnlo_as_01180      # Theory input for pseudo-data
    errorsize: 1.0    # uncertainties rescaling
    fakenoise: true    # true = to add random fluctuations to pseudo-data
    rancutprob: 1.0   # Fraction of data to be included in the fit
    rancutmethod: 0   # Method to select rancutprob data fraction
    rancuttrnval: false # 0(1) to output training(validation) chi2 in report
    printpdf4gen: false # To print info on PDFs during minimization
#     contamination_parameters:
#       - name: 'W'
#         value: 0.00008
#         linear_combination:
#             'Olq3': -15.94

seed: 0
rngalgo: 0

The bug disappears if I comment out the closuretest key.
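Until closure-test filtering is supported for these datasets, a runcard could be checked up front for the failing combination. This is a hypothetical sketch, not part of the PR: it works on the runcard already parsed into a dict (for example with a YAML loader), and it assumes that `closuretest.fakedata: true` is what sends the datasets through `_filter_closure_data`, as in the traceback above.

```python
def closure_incompatible_datasets(runcard: dict) -> list[str]:
    """Return new_commondata datasets that would go through closure filtering."""
    closure = runcard.get("closuretest") or {}  # an empty key parses as None
    if not closure.get("fakedata", False):
        return []
    return [ds["dataset"] for ds in runcard.get("dataset_inputs", [])
            if ds.get("new_commondata", False)]


runcard = {
    "dataset_inputs": [
        {"dataset": "NMCPD_dw_ite", "frac": 0.75},
        {"dataset": "EIC_CC_EMP_140_OPT", "frac": 0.75, "new_commondata": True},
    ],
    "closuretest": {"fakedata": True, "fakepdf": "NNPDF40_nnlo_as_01180"},
}
print(closure_incompatible_datasets(runcard))  # ['EIC_CC_EMP_140_OPT']
```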

comane commented 3 months ago

> I think I understand the issue. The contamination itself is not the issue, the new FK tables do not work in a closure test.

Yes, exactly. As I mentioned in the PR description, this PR does not yet support filtering closure-test data when using the new pine parser; it's on the TODO list above. Thanks for pointing this out again.