HEP-PBSP / SIMUnet

The public code for SIMUnet, an NNPDF-based tool for the simultaneous determination of PDFs and EFT Wilson coefficients.
https://hep-pbsp.github.io/SIMUnet/
GNU General Public License v3.0

Fixed-PDF fit failed #74

Open tomtong2015 opened 2 months ago

tomtong2015 commented 2 months ago

System:

MacBook Air with M1 chip, 8 GB memory, macOS 12.6

Environment:

Latest Python and Anaconda. I created a conda environment, simunet, according to your tutorial. All dependencies were installed successfully and the environment was activated. SIMUnet has been downloaded, compiled under the environment, and installed successfully.

Runcard:

The runcard is based on your example. The following is my full runcard:

# Runcard for SIMUnet
#
############################################################
description: "Example runcard. This one performs a fixed-PDF EFT fit using data from different sectors."
############################################################
# frac: training fraction of datapoints for the PDFs
# QCD: apply QCD K-factors
# EWK: apply electroweak K-factors
# simu_fac: fit BSM coefficients using their K-factors in the dataset
# use_fixed_predictions:  if set to True it removes the PDF dependence of the dataset
# sys: systematics treatment (see systypes)

dataset_inputs:
# # DIS
- {dataset: HERACOMBNCEP460, frac: 0.75}
# # Drell - Yan
- {dataset: CMSDY1D12, cfac: ['QCD', 'EWK']}
# # ttbar
- {dataset: ATLASTTBARTOT7TEV, cfac: [QCD], simu_fac: "EFT_NLO"}
# # ttbar AC
- {dataset: ATLAS_TTBAR_8TEV_ASY, cfac: [QCD], simu_fac: "EFT_NLO"}
# # TTZ
- {dataset: ATLAS_TTBARZ_8TEV_TOTAL, simu_fac: "EFT_LO"}
# # TTW
- {dataset: ATLAS_TTBARW_8TEV_TOTAL, simu_fac: "EFT_LO"}
# # single top
- {dataset: ATLAS_SINGLETOP_TCH_7TEV_T, cfac: [QCD], simu_fac: "EFT_NLO"}
# # tW
- {dataset: ATLAS_SINGLETOPW_8TEV_TOTAL, simu_fac: "EFT_NLO"}
# # W helicity
- {dataset: ATLAS_WHEL_13TEV, simu_fac: "EFT_NLO", use_fixed_predictions: True}
# # tt gamma
- {dataset: ATLAS_TTBARGAMMA_8TEV_TOTAL, simu_fac: "EFT_LO", use_fixed_predictions: True}
# # tZ
- {dataset: ATLAS_SINGLETOPZ_13TEV_TOTAL, simu_fac: "EFT_LO", use_fixed_predictions: True}
# # EWPO
- {dataset: LEP_ZDATA, simu_fac: "EFT_LO", use_fixed_predictions: True}
#  Higgs
- {dataset: ATLAS_CMS_SSINC_RUNI, simu_fac: "EFT_NLO", use_fixed_predictions: True}
# Diboson
- {dataset: LEP_EEWW_182GEV, simu_fac: "EFT_LO", use_fixed_predictions: True}

############################################################
# Uncomment to perform fixed-PDF fit
fixed_pdf_fit: True
load_weights_from_fit: 221103-jmm-no_top_1000_iterated

############################################################
# Analytic initialisation features
analytic_initialisation_pdf: 221103-jmm-no_top_1000_iterated
analytic_check: False
automatic_scale_choice: False

############################################################
simu_parameters:
# Dipoles
- {name: "OtG", scale: 0.1, initialisation: {type: uniform, minval: -10, maxval: 10} }
## Quark Currents
#- {name: "Opt", scale: 0.1, initialisation: {type: gaussian, mean: 0, std_dev: 1} }
## Lepton currents
#- {name: "O3pl", scale: 1.0, initialisation: {type: constant, value: 0} }
## linear combination
#- name: 'Y'
#  linear_combination:
#    'Olq1 ': 1.51606
#    'Oed ': -6.0606
#    'Oeu ': 12.1394
#    'Olu ': 6.0606
#    'Old ': -3.0394
#    'Oqe ': 3.0394
#  scale: 1.0
#  initialisation: {type: uniform , minval: -1, maxval: 1}

############################################################
datacuts:
  t0pdfset: 221103-jmm-no_top_1000_iterated # PDF set to generate t0 covmat
  q2min: 3.49                        # Q2 minimum
  w2min: 12.5                        # W2 minimum

############################################################
theory:
  theoryid: 270     # database id

############################################################
trvlseed: 475038818
nnseed: 2394641471
mcseed: 1831662593
save: "weights.h5"
genrep: true      # true = generate MC replicas, false = use real data

############################################################
parameters: # This defines the parameter dictionary that is passed to the Model Trainer
  nodes_per_layer: [25, 20, 8]
  activation_per_layer: [tanh, tanh, linear]
  initializer: glorot_normal
  optimizer:
    clipnorm: 6.073e-6
    learning_rate: 2.621e-3
    optimizer_name: Nadam
  epochs: 30000
  positivity:
    initial: 184.8
    multiplier:
  integrability:
    initial: 184.8
    multiplier:
  stopping_patience: 0.2
  layer_type: dense
  dropout: 0.0
  threshold_chi2: 3.5

fitting:
# EVOL(QED) = sng=0,g=1,v=2,v3=3,v8=4,t3=5,t8=6,(pht=7)
# EVOLS(QED)= sng=0,g=1,v=2,v8=3,t3=4,t8=5,ds=6,(pht=7)
# FLVR(QED) = g=0, u=1, ubar=2, d=3, dbar=4, s=5, sbar=6, (pht=7)
  fitbasis: EVOL  # EVOL (7), EVOLQED (8), etc.
  basis:
  - {fl: sng, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      1.093, 1.121], largex: [1.486, 3.287]}
  - {fl: g, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.8329, 1.071], largex: [3.084, 6.767]}
  - {fl: v, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.5202, 0.7431], largex: [1.556, 3.639]}
  - {fl: v3, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.1205, 0.4839], largex: [1.736, 3.622]}
  - {fl: v8, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.5864, 0.7987], largex: [1.559, 3.569]}
  - {fl: t3, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      -0.5019, 1.126], largex: [1.754, 3.479]}
  - {fl: t8, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.6305, 0.8806], largex: [1.544, 3.481]}
  - {fl: t15, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      1.087, 1.139], largex: [1.48, 3.365]}

############################################################
positivity:
  posdatasets:
  - {dataset: POSF2U, maxlambda: 1e6}

############################################################
integrability:
  integdatasets:
  - {dataset: INTEGXT8, maxlambda: 1e2}

############################################################
debug: false
maxcores: 4

Modifications in the runcard:

Since I'd like to try a fixed-PDF fit, the following flag has been uncommented:

############################################################
# Uncomment to perform fixed-PDF fit
fixed_pdf_fit: True
load_weights_from_fit: 221103-jmm-no_top_1000_iterated

As a test run, all SMEFT operators have been commented out except for one, OtG, which has been turned on:

############################################################
simu_parameters:
# Dipoles
- {name: "OtG", scale: 0.1, initialisation: {type: uniform, minval: -10, maxval: 10} }
## Quark Currents
#- {name: "Opt", scale: 0.1, initialisation: {type: gaussian, mean: 0, std_dev: 1} }
## Lepton currents
#- {name: "O3pl", scale: 1.0, initialisation: {type: constant, value: 0} }
## linear combination
#- name: 'Y'
#  linear_combination:
#    'Olq1 ': 1.51606
#    'Oed ': -6.0606
#    'Oeu ': 12.1394
#    'Olu ': 6.0606
#    'Old ': -3.0394
#    'Oqe ': 3.0394
#  scale: 1.0
#  initialisation: {type: uniform , minval: -1, maxval: 1}

The theory id has been set to 270. Thank you, Elie! 😃

############################################################
theory:
  theoryid: 270     # database id

Full output messages:

I was trying to make 1000 replicas.

Last login: Wed Sep  4 10:41:01 on ttys000
(base) tomtong@Toms-MacBook-Air SIMUnet_runs % conda activate simunet
(simunet) tomtong@Toms-MacBook-Air SIMUnet_runs % vp-setupfit tom_01.yml
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
Using Keras backend
[INFO]: All requirements processed and checked successfully. Executing actions.
[INFO]: 221103-jmm-no_top_1000_iterated T0 checked.
[INFO]: Verifying positivity tables:
[INFO]: POSF2U checked.
[INFO]: Filtering real data.
[INFO]: 204/209 datapoints in HERACOMBNCEP460 passed kinematic cuts.
[INFO]: 41/41 datapoints in CMSDY1D12 passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLASTTBARTOT7TEV passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBAR_8TEV_ASY passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBARZ_8TEV_TOTAL passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBARW_8TEV_TOTAL passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_SINGLETOP_TCH_7TEV_T passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_SINGLETOPW_8TEV_TOTAL passed kinematic cuts.
[INFO]: 2/2 datapoints in ATLAS_WHEL_13TEV passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBARGAMMA_8TEV_TOTAL passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_SINGLETOPZ_13TEV_TOTAL passed kinematic cuts.
[INFO]: 19/19 datapoints in LEP_ZDATA passed kinematic cuts.
[INFO]: 22/22 datapoints in ATLAS_CMS_SSINC_RUNI passed kinematic cuts.
[INFO]: 10/10 datapoints in LEP_EEWW_182GEV passed kinematic cuts.
[INFO]: Summary: 306/311 datapoints passed kinematic cuts.
[INFO]: md5 255bf3898f015c32388471e62d03810f stored in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_01/md5
(simunet) tomtong@Toms-MacBook-Air SIMUnet_runs % n3fit tom_01.yml 1000
[INFO]: Creating replica output folder in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_01/nnfit/replica_1000
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
Using Keras backend
[INFO]: All requirements processed and checked successfully. Executing actions.
[INFO]: Loading positivity dataset POSF2U
[INFO]: Loading integrability dataset INTEGXT8
[INFO]: Clearing session
[INFO]: Setting the number of cores to: 4
[INFO]: Starting replica fit 1000

[INFO]: Clearing session
[INFO]: Generating layers
[INFO]: Using bsm_factor scales: [0.1]
[INFO]: Generating layers for experiment HERACOMB
[INFO]: Generating layers for experiment CMS
[INFO]: Generating layers for experiment ATLAS
[INFO]: Generating layers for experiment LEP
[INFO]: Generating layers for experiment ATLAS-CMS
[INFO]: Generating positivity penalty for POSF2U
[INFO]: Generating integrability penalty for INTEGXT8
[INFO]: Generating PDF models
[INFO]: Performing fixed PDF fit.
[INFO]: Generating the Model
[INFO]: Applying combination layer
[WARNING]: AutoGraph could not transform <bound method CombineCfacLayer.call of <n3fit.layers.CombineCfac.CombineCfacLayer object at 0x17e3bfd90>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Compare: 2, expecting 3
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
Model: "meta_model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 integration_grid (InputLayer)  [(1, 2000, 1)]       0           []                               

 input_2 (InputLayer)           [(1, 640, 1)]        0           []                               

 PDF_0 (MetaModel)              (1, None, 14)        779         ['integration_grid[0][0]',       
                                                                  'input_2[0][0]']                

 tf.stack (TFOpLambda)          (1, 640, 14, 1)      0           ['PDF_0[0][0]']                  

 pdf_split (Lambda)             [(1, 50, 14, 1),     0           ['tf.stack[0][0]']               
                                 (1, 20, 14, 1),                                                  
                                 (1, 330, 14, 1),                                                 
                                 (1, 80, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 50, 14, 1),                                                  
                                 (1, 70, 14, 1)]                                                  

 ATLAS_split (Lambda)           [(1, 30, 14, 1),     0           ['pdf_split[0][2]']              
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 20, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1)]                                                  

 LEP_split (Lambda)             [(1, 40, 14, 1),     0           ['pdf_split[0][3]']              
                                 (1, 40, 14, 1)]                                                  

 dat_ATLASTTBARTOT7TEV (DY)     (1, 1, 1)            0           ['ATLAS_split[0][0]']            

 dat_ATLAS_TTBAR_8TEV_ASY (DY)  (1, 1, 1)            0           ['ATLAS_split[0][1]']            

 dat_ATLAS_TTBARZ_8TEV_TOTAL (D  (1, 1, 1)           0           ['ATLAS_split[0][2]']            
 Y)                                                                                               

 dat_ATLAS_TTBARW_8TEV_TOTAL (D  (1, 1, 1)           0           ['ATLAS_split[0][3]']            
 Y)                                                                                               

 dat_ATLAS_SINGLETOP_TCH_7TEV_T  (1, 1, 1)           0           ['ATLAS_split[0][4]']            
  (DY)                                                                                            

 dat_ATLAS_SINGLETOPW_8TEV_TOTA  (1, 1, 1)           0           ['ATLAS_split[0][5]']            
 L (DY)                                                                                           

 dat_ATLAS_WHEL_13TEV (Fixed)   (1, 1, 2)            0           ['ATLAS_split[0][6]']            

 dat_ATLAS_TTBARGAMMA_8TEV_TOTA  (1, 1, 1)           0           ['ATLAS_split[0][7]']            
 L (Fixed)                                                                                        

 dat_ATLAS_SINGLETOPZ_13TEV_TOT  (1, 1, 1)           0           ['ATLAS_split[0][8]']            
 AL (Fixed)                                                                                       

 dat_LEP_ZDATA (Fixed)          (1, 1, 19)           0           ['LEP_split[0][0]']              

 dat_LEP_EEWW_182GEV (Fixed)    (1, 1, 10)           0           ['LEP_split[0][1]']              

 dat_ATLAS_CMS_SSINC_RUNI (Fixe  (1, 1, 22)          0           ['pdf_split[0][4]']              
 d)                                                                                               

 dat_HERACOMBNCEP460 (DIS)      (1, 1, 153)          0           ['pdf_split[0][0]']              

 dat_CMSDY1D12 (DY)             (1, 1, 41)           0           ['pdf_split[0][1]']              

 combine_cfac_layer (CombineCfa  multiple            1           ['dat_ATLASTTBARTOT7TEV[0][0]',  
 cLayer)                                                          'dat_ATLAS_TTBAR_8TEV_ASY[0][0]'
                                                                 , 'dat_ATLAS_TTBARZ_8TEV_TOTAL[0]
                                                                 [0]',                            
                                                                  'dat_ATLAS_TTBARW_8TEV_TOTAL[0][
                                                                 0]',                             
                                                                  'dat_ATLAS_SINGLETOP_TCH_7TEV_T[
                                                                 0][0]',                          
                                                                  'dat_ATLAS_SINGLETOPW_8TEV_TOTAL
                                                                 [0][0]',                         
                                                                  'dat_ATLAS_WHEL_13TEV[0][0]',   
                                                                  'dat_ATLAS_TTBARGAMMA_8TEV_TOTAL
                                                                 [0][0]',                         
                                                                  'dat_ATLAS_SINGLETOPZ_13TEV_TOTA
                                                                 L[0][0]',                        
                                                                  'dat_LEP_ZDATA[0][0]',          
                                                                  'dat_LEP_EEWW_182GEV[0][0]',    
                                                                  'dat_ATLAS_CMS_SSINC_RUNI[0][0]'
                                                                 ]                                

 dat_POSF2U (DIS)               (1, 1, 20)           0           ['pdf_split[0][5]']              

 dat_INTEGXT8 (DIS)             (1, 1, 1)            0           ['pdf_split[0][6]']              

 tf.identity (TFOpLambda)       (1, 1, 153)          0           ['dat_HERACOMBNCEP460[0][0]']    

 tf.identity_1 (TFOpLambda)     (1, 1, 41)           0           ['dat_CMSDY1D12[0][0]']          

 tf.concat (TFOpLambda)         (1, 1, 10)           0           ['combine_cfac_layer[0][0]',     
                                                                  'combine_cfac_layer[1][0]',     
                                                                  'combine_cfac_layer[2][0]',     
                                                                  'combine_cfac_layer[3][0]',     
                                                                  'combine_cfac_layer[4][0]',     
                                                                  'combine_cfac_layer[5][0]',     
                                                                  'combine_cfac_layer[6][0]',     
                                                                  'combine_cfac_layer[7][0]',     
                                                                  'combine_cfac_layer[8][0]']     

 tf.concat_1 (TFOpLambda)       (1, 1, 29)           0           ['combine_cfac_layer[9][0]',     
                                                                  'combine_cfac_layer[10][0]']    

 tf.identity_2 (TFOpLambda)     (1, 1, 22)           0           ['combine_cfac_layer[11][0]']    

 tf.identity_3 (TFOpLambda)     (1, 1, 20)           0           ['dat_POSF2U[0][0]']             

 tf.identity_4 (TFOpLambda)     (1, 1, 1)            0           ['dat_INTEGXT8[0][0]']           

 HERACOMB (LossInvcovmat)       (1,)                 23562       ['tf.identity[0][0]']            

 CMS (LossInvcovmat)            (1,)                 1722        ['tf.identity_1[0][0]']          

 ATLAS (LossInvcovmat)          (1,)                 110         ['tf.concat[0][0]']              

 LEP (LossInvcovmat)            (1,)                 870         ['tf.concat_1[0][0]']            

 ATLAS-CMS (LossInvcovmat)      (1,)                 506         ['tf.identity_2[0][0]']          

 POSF2U (LossPositivity)        (1,)                 1           ['tf.identity_3[0][0]']          

 INTEGXT8 (LossIntegrability)   (1,)                 1           ['tf.identity_4[0][0]']          

==================================================================================================
Total params: 27,552
Trainable params: 1
Non-trainable params: 27,551
__________________________________________________________________________________________________
[INFO]: Using weights from fit: 221103-jmm-no_top_1000_iterated
[INFO]: Loading weights from path: /opt/anaconda3/envs/simunet/share/NNPDF/results/221103-jmm-no_top_1000_iterated/nnfit/replica_1000/weights.h5
[WARNING]:  > NaN found, stopping activated
[INFO]: Stopped at epoch=1
1/1 [==============================] - 0s 285ms/step
1/1 [==============================] - 0s 400ms/step
1/1 [==============================] - 0s 17ms/step
1/1 [==============================] - 0s 15ms/step
1/1 [==============================] - 0s 16ms/step
[INFO]: Best fit for replica #1000, chi2=nan (tr=nan, vl=2.714)
[INFO]:  > Saving the weights for future in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_01/nnfit/replica_1000/weights.h5
(simunet) tomtong@Toms-MacBook-Air SIMUnet_runs % 

The process appears to have stopped too early, indicated by

[WARNING]:  > NaN found, stopping activated
[INFO]: Stopped at epoch=1

and

[INFO]: Best fit for replica #1000, chi2=nan (tr=nan, vl=2.714)
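The failure signature quoted above can also be checked automatically when many replicas are run. The following is a minimal sketch that scans saved n3fit log lines for a "Best fit" summary whose chi2 or training chi2 is NaN. It assumes the log line format is exactly the one shown in this output; the function names `parse_best_fit` and `failed_replicas` are hypothetical helpers, not part of SIMUnet or n3fit.

```python
import math
import re

# Matches lines such as:
# [INFO]: Best fit for replica #1000, chi2=nan (tr=nan, vl=2.714)
BEST_FIT = re.compile(
    r"Best fit for replica #(\d+), "
    r"chi2=([^\s,)]+) \(tr=([^\s,)]+), vl=([^\s,)]+)\)"
)


def parse_best_fit(line):
    """Return the replica number and chi2 values of a 'Best fit' line, or None."""
    match = BEST_FIT.search(line)
    if match is None:
        return None
    # float("nan") is valid, so NaN entries are parsed like any other number
    chi2, tr, vl = (float(value) for value in match.group(2, 3, 4))
    return {"replica": int(match.group(1)), "chi2": chi2, "tr": tr, "vl": vl}


def failed_replicas(lines):
    """Return the replica numbers whose total or training chi2 is NaN."""
    failed = []
    for line in lines:
        fit = parse_best_fit(line)
        if fit and (math.isnan(fit["chi2"]) or math.isnan(fit["tr"])):
            failed.append(fit["replica"])
    return failed
```

Passing the lines of a captured log to `failed_replicas` gives the list of replicas that would need to be rerun or investigated.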

Thank you very much in advance!

LucaMantani commented 2 months ago

Hi, I'm not sure exactly what the issue is (maybe @ElieHammou or @comane can help more), but when you run n3fit tom_01.yml 1000 you are not generating 1000 replicas; you are generating replica number 1000.

To generate 1000 replicas you need to launch different jobs (better) or loop.
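The loop option can be sketched as follows, assuming only the `n3fit <runcard> <replica>` invocation used earlier in this thread. The helper names `replica_commands` and `run_replicas` are hypothetical, and running replicas one after another like this is slower than submitting separate jobs, which remains the recommended route.

```python
import shlex
import subprocess


def replica_commands(runcard, first, last):
    """Build one n3fit command per replica index in [first, last]."""
    return [["n3fit", runcard, str(index)] for index in range(first, last + 1)]


def run_replicas(runcard, first, last, dry_run=True):
    """Run the replicas sequentially; with dry_run=True only print the commands."""
    for command in replica_commands(runcard, first, last):
        if dry_run:
            print(shlex.join(command))
        else:
            # check=True raises as soon as one n3fit call exits with an error
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run for replicas 1 to 3 of the runcard from this thread
    run_replicas("tom_01.yml", 1, 3, dry_run=True)
```

Note that a replica ending with chi2=nan may still exit without an error code, so a successful exit status alone does not confirm a good fit.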

Does it work if you run with fixed_pdf_fit: False?

ElieHammou commented 2 months ago

Hi Tom, I was discussing with @FrancescoMerlotti and he reminded me that there may be an issue with fixed-PDF fits with only one coefficient on Mac systems. Could you try turning another coefficient on for 1 replica to see if the error goes away? Also, if you have access to a Linux system, it could be useful to test the runcard there, as I think this issue does not come up.

tomtong2015 commented 2 months ago

Hi Luca, thank you very much! I will launch different jobs for more replicas.

As for running with fixed_pdf_fit set to False: yes, it seems to work. I switched True to False while keeping everything else the same:

############################################################
# Uncomment to perform fixed-PDF fit
fixed_pdf_fit: False
load_weights_from_fit: 221103-jmm-no_top_1000_iterated

Below is the last part of the output:

[INFO]: At epoch 9800/30000, total chi2: 1.5794931972728057
HERACOMB: 1.841, CMS: 0.978, ATLAS: 0.465, LEP: 1.974, ATLAS-CMS: 0.866, total: 1.579
Validation chi2 at this point: 2.702366352081299
[INFO]: ['8.47e-03']
[INFO]: At epoch 9900/30000, total chi2: 1.5831192203596527
HERACOMB: 1.848, CMS: 0.974, ATLAS: 0.470, LEP: 1.974, ATLAS-CMS: 0.866, total: 1.583
Validation chi2 at this point: 2.711261510848999
[INFO]: ['9.96e-03']
[INFO]: At epoch 10000/30000, total chi2: 1.580635497149299
HERACOMB: 1.844, CMS: 0.980, ATLAS: 0.452, LEP: 1.974, ATLAS-CMS: 0.865, total: 1.581
Validation chi2 at this point: 2.707155227661133
[INFO]: ['1.20e-02']
[INFO]: At epoch 10100/30000, total chi2: 1.573520331289254
HERACOMB: 1.841, CMS: 0.947, ATLAS: 0.454, LEP: 1.974, ATLAS-CMS: 0.865, total: 1.574
Validation chi2 at this point: 2.71159029006958
[INFO]: ['1.05e-02']
[INFO]: At epoch 10200/30000, total chi2: 1.5766352560005936
HERACOMB: 1.845, CMS: 0.944, ATLAS: 0.474, LEP: 1.974, ATLAS-CMS: 0.865, total: 1.577
Validation chi2 at this point: 2.7088916301727295
[INFO]: ['3.16e-03']
[INFO]: Stopped at epoch=10223
1/1 [==============================] - 0s 291ms/step
1/1 [==============================] - 0s 309ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 17ms/step
[INFO]: Best fit for replica #123, chi2=1.021 (tr=1.635, vl=2.492)
[INFO]:  > Saving the weights for future in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_02/nnfit/replica_123/weights.h5
(simunet) tomtong@Toms-Air SIMUnet_runs % 

Is it running properly?

tomtong2015 commented 2 months ago

Hi Elie, many thanks to both of you! I truly appreciate the help!

As Luca pointed out, fixed-PDF seems to be the issue. Of course, I'm in no position to draw such conclusions. You guys are the experts 😉

I also tried a fixed-PDF fit with 3 Wilson coefficients turned on. It seems that the problem remains. Below is the last part of the output:

==================================================================================================
Total params: 27,554
Trainable params: 3
Non-trainable params: 27,551
__________________________________________________________________________________________________
[INFO]: Using weights from fit: 221103-jmm-no_top_1000_iterated
[INFO]: Loading weights from path: /opt/anaconda3/envs/simunet/share/NNPDF/results/221103-jmm-no_top_1000_iterated/nnfit/replica_456/weights.h5
[WARNING]:  > NaN found, stopping activated
[INFO]: Stopped at epoch=1
1/1 [==============================] - 0s 287ms/step
1/1 [==============================] - 0s 300ms/step
1/1 [==============================] - 0s 15ms/step
1/1 [==============================] - 0s 15ms/step
1/1 [==============================] - 0s 16ms/step
[INFO]: Best fit for replica #456, chi2=nan (tr=nan, vl=1.928)
[INFO]:  > Saving the weights for future in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_03/nnfit/replica_456/weights.h5
(simunet) tomtong@Toms-Air SIMUnet_runs % 

Well, it also could be an issue with the infamous Apple silicon and the translation. I can try it on a Linux system after I get our admin's approval.

FrancescoMerlotti commented 2 months ago

Hi Tom, I think it is indeed an issue with Apple Silicon; it might be related to some version of TensorFlow for Mac. I still have to track it down! As Elie said, a Linux machine should work just fine, and the runcard should be right as well.
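To help narrow down which TensorFlow build is involved, the platform and installed package versions can be collected and attached to the issue. The sketch below is only an illustration: it assumes TensorFlow is provided by one of the PyPI distributions `tensorflow`, `tensorflow-macos` or `tensorflow-metal` (which may not match a conda install), and the helper names are hypothetical. The `sysctl.proc_translated` query is the usual macOS way to tell whether a process runs under Rosetta translation.

```python
import platform
import subprocess
from importlib import metadata

# Distribution names that may provide TensorFlow on a Mac (assumption, see above).
# Only the installed metadata is read; TensorFlow itself is never imported.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-macos", "tensorflow-metal")


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def running_under_rosetta():
    """Return True or False on macOS when the answer is available, else None."""
    if platform.system() != "Darwin":
        return None
    try:
        output = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # The key does not exist on Intel Macs, so no answer is available there
        return None
    return output == "1"


def environment_report():
    """Collect platform details and TensorFlow package versions in a dict."""
    report = {
        "system": platform.system(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "rosetta": running_under_rosetta(),
    }
    for name in TF_DISTRIBUTIONS:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this inside the activated simunet environment on both the Mac and the Linux machine would give two reports that can be compared side by side.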

tomtong2015 commented 2 months ago

Hi Francesco, thank you very much! I'll try a Linux machine as soon as possible, and come back to you for further guidance 👍