The public code for SIMUnet, an NNPDF-based tool to perform simultaneous determinations of PDFs and EFT Wilson coefficients.
GNU General Public License v3.0

Fixed-PDF fit failed #74

Open tomtong2015 opened 2 months ago

tomtong2015 commented 2 months ago


MacBook Air with M1 chip, 8GB memory, macOS 12.6


- Latest Python + Anaconda.
- Created a conda environment, simunet, following your tutorial. All dependencies installed successfully, and the environment was activated.
- SIMUnet was downloaded, compiled, and installed successfully in this environment.


The runcard is based on your example. Here is my full runcard:

# Runcard for SIMUnet
description: "Example runcard. This one performs a fixed-PDF EFT fit using data from different sectors."
# frac: training fraction of datapoints for the PDFs
# QCD: apply QCD K-factors
# EWK: apply electroweak K-factors
# simu_fac: fit BSM coefficients using their K-factors in the dataset
# use_fixed_predictions:  if set to True it removes the PDF dependence of the dataset
# sys: systematics treatment (see systypes)

dataset_inputs:
# # DIS
- {dataset: HERACOMBNCEP460, frac: 0.75}
# # Drell - Yan
- {dataset: CMSDY1D12, cfac: ['QCD', 'EWK']}
# # ttbar
- {dataset: ATLASTTBARTOT7TEV, cfac: [QCD], simu_fac: "EFT_NLO"}
# # ttbar AC
- {dataset: ATLAS_TTBAR_8TEV_ASY, cfac: [QCD], simu_fac: "EFT_NLO"}
# # TTZ
- {dataset: ATLAS_TTBARZ_8TEV_TOTAL, simu_fac: "EFT_LO"}
# # TTW
- {dataset: ATLAS_TTBARW_8TEV_TOTAL, simu_fac: "EFT_LO"}
# # single top
- {dataset: ATLAS_SINGLETOP_TCH_7TEV_T, cfac: [QCD], simu_fac: "EFT_NLO"}
# # tW
- {dataset: ATLAS_SINGLETOPW_8TEV_TOTAL, simu_fac: "EFT_NLO"}
# # W helicity
- {dataset: ATLAS_WHEL_13TEV, simu_fac: "EFT_NLO", use_fixed_predictions: True}
# # tt gamma
- {dataset: ATLAS_TTBARGAMMA_8TEV_TOTAL, simu_fac: "EFT_LO", use_fixed_predictions: True}
# # tZ
- {dataset: ATLAS_SINGLETOPZ_13TEV_TOTAL, simu_fac: "EFT_LO", use_fixed_predictions: True}
# # EWPO
- {dataset: LEP_ZDATA, simu_fac: "EFT_LO", use_fixed_predictions: True}
#  Higgs
- {dataset: ATLAS_CMS_SSINC_RUNI, simu_fac: "EFT_NLO", use_fixed_predictions: True}
# Diboson
- {dataset: LEP_EEWW_182GEV, simu_fac: "EFT_LO", use_fixed_predictions: True}

# Uncomment to perform fixed-PDF fit
fixed_pdf_fit: True
load_weights_from_fit: 221103-jmm-no_top_1000_iterated

# Analytic initialisation features
analytic_initialisation_pdf: 221103-jmm-no_top_1000_iterated
analytic_check: False
automatic_scale_choice: False

# Dipoles
- {name: "OtG", scale: 0.1, initialisation: {type: uniform, minval: -10, maxval: 10} }
## Quark Currents
#- {name: "Opt", scale: 0.1, initialisation: {type: gaussian, mean: 0, std_dev: 1} }
## Lepton currents
#- {name: "O3pl", scale: 1.0, initialisation: {type: constant, value: 0} }
## linear combination
#- name: 'Y'
#  linear_combination:
#    'Olq1 ': 1.51606
#    'Oed ': -6.0606
#    'Oeu ': 12.1394
#    'Olu ': 6.0606
#    'Old ': -3.0394
#    'Oqe ': 3.0394
#  scale: 1.0
#  initialisation: {type: uniform , minval: -1, maxval: 1}

datacuts:
  t0pdfset: 221103-jmm-no_top_1000_iterated # PDF set to generate t0 covmat
  q2min: 3.49                        # Q2 minimum
  w2min: 12.5                        # W2 minimum

theory:
  theoryid: 270     # database id

trvlseed: 475038818
nnseed: 2394641471
mcseed: 1831662593
save: "weights.h5"
genrep: true      # true = generate MC replicas, false = use real data

parameters: # This defines the parameter dictionary that is passed to the Model Trainer
  nodes_per_layer: [25, 20, 8]
  activation_per_layer: [tanh, tanh, linear]
  initializer: glorot_normal
  optimizer:
    clipnorm: 6.073e-6
    learning_rate: 2.621e-3
    optimizer_name: Nadam
  epochs: 30000
  positivity:
    initial: 184.8
  integrability:
    initial: 184.8
  stopping_patience: 0.2
  layer_type: dense
  dropout: 0.0
  threshold_chi2: 3.5

# EVOL(QED) = sng=0,g=1,v=2,v3=3,v8=4,t3=5,t8=6,(pht=7)
# EVOLS(QED)= sng=0,g=1,v=2,v8=4,t3=4,t8=5,ds=6,(pht=7)
# FLVR(QED) = g=0, u=1, ubar=2, d=3, dbar=4, s=5, sbar=6, (pht=7)
fitting:
  fitbasis: EVOL  # EVOL (7), EVOLQED (8), etc.
  basis:
  - {fl: sng, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      1.093, 1.121], largex: [1.486, 3.287]}
  - {fl: g, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.8329, 1.071], largex: [3.084, 6.767]}
  - {fl: v, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.5202, 0.7431], largex: [1.556, 3.639]}
  - {fl: v3, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.1205, 0.4839], largex: [1.736, 3.622]}
  - {fl: v8, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.5864, 0.7987], largex: [1.559, 3.569]}
  - {fl: t3, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      -0.5019, 1.126], largex: [1.754, 3.479]}
  - {fl: t8, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      0.6305, 0.8806], largex: [1.544, 3.481]}
  - {fl: t15, pos: false, trainable: false, mutsize: [15], mutprob: [0.05], smallx: [
      1.087, 1.139], largex: [1.48, 3.365]}

positivity:
  posdatasets:
  - {dataset: POSF2U, maxlambda: 1e6}

integrability:
  integdatasets:
  - {dataset: INTEGXT8, maxlambda: 1e2}

debug: false
maxcores: 4

Modifications in the runcard:

Since I'd like to try a fixed-PDF fit, the following flag has been uncommented:

# Uncomment to perform fixed-PDF fit
fixed_pdf_fit: True
load_weights_from_fit: 221103-jmm-no_top_1000_iterated

As a test run, all SMEFT operators have been commented out except for one, OtG, which has been turned on:

# Dipoles
- {name: "OtG", scale: 0.1, initialisation: {type: uniform, minval: -10, maxval: 10} }
## Quark Currents
#- {name: "Opt", scale: 0.1, initialisation: {type: gaussian, mean: 0, std_dev: 1} }
## Lepton currents
#- {name: "O3pl", scale: 1.0, initialisation: {type: constant, value: 0} }
## linear combination
#- name: 'Y'
#  linear_combination:
#    'Olq1 ': 1.51606
#    'Oed ': -6.0606
#    'Oeu ': 12.1394
#    'Olu ': 6.0606
#    'Old ': -3.0394
#    'Oqe ': 3.0394
#  scale: 1.0
#  initialisation: {type: uniform , minval: -1, maxval: 1}

The theory id has been set to 270. Thank you, Elie! 😃

  theoryid: 270     # database id

Full output messages:

I was trying to make 1000 replicas.

Last login: Wed Sep  4 10:41:01 on ttys000
(base) tomtong@Toms-MacBook-Air SIMUnet_runs % conda activate simunet
(simunet) tomtong@Toms-MacBook-Air SIMUnet_runs % vp-setupfit tom_01.yml
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
Using Keras backend
[INFO]: All requirements processed and checked successfully. Executing actions.
[INFO]: 221103-jmm-no_top_1000_iterated T0 checked.
[INFO]: Verifying positivity tables:
[INFO]: POSF2U checked.
[INFO]: Filtering real data.
[INFO]: 204/209 datapoints in HERACOMBNCEP460 passed kinematic cuts.
[INFO]: 41/41 datapoints in CMSDY1D12 passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLASTTBARTOT7TEV passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBAR_8TEV_ASY passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBARZ_8TEV_TOTAL passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBARW_8TEV_TOTAL passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_SINGLETOP_TCH_7TEV_T passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_SINGLETOPW_8TEV_TOTAL passed kinematic cuts.
[INFO]: 2/2 datapoints in ATLAS_WHEL_13TEV passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_TTBARGAMMA_8TEV_TOTAL passed kinematic cuts.
[INFO]: 1/1 datapoints in ATLAS_SINGLETOPZ_13TEV_TOTAL passed kinematic cuts.
[INFO]: 19/19 datapoints in LEP_ZDATA passed kinematic cuts.
[INFO]: 22/22 datapoints in ATLAS_CMS_SSINC_RUNI passed kinematic cuts.
[INFO]: 10/10 datapoints in LEP_EEWW_182GEV passed kinematic cuts.
[INFO]: Summary: 306/311 datapoints passed kinematic cuts.
[INFO]: md5 255bf3898f015c32388471e62d03810f stored in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_01/md5
(simunet) tomtong@Toms-MacBook-Air SIMUnet_runs % n3fit tom_01.yml 1000
[INFO]: Creating replica output folder in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_01/nnfit/replica_1000
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
Using Keras backend
[INFO]: All requirements processed and checked successfully. Executing actions.
[INFO]: Loading positivity dataset POSF2U
[INFO]: Loading integrability dataset INTEGXT8
[INFO]: Clearing session
[INFO]: Setting the number of cores to: 4
[INFO]: Starting replica fit 1000

[INFO]: Clearing session
[INFO]: Generating layers
[INFO]: Using bsm_factor scales: [0.1]
[INFO]: Generating layers for experiment HERACOMB
[INFO]: Generating layers for experiment CMS
[INFO]: Generating layers for experiment ATLAS
[INFO]: Generating layers for experiment LEP
[INFO]: Generating layers for experiment ATLAS-CMS
[INFO]: Generating positivity penalty for POSF2U
[INFO]: Generating integrability penalty for INTEGXT8
[INFO]: Generating PDF models
[INFO]: Performing fixed PDF fit.
[INFO]: Generating the Model
[INFO]: Applying combination layer
[WARNING]: AutoGraph could not transform <bound method CombineCfacLayer.call of <n3fit.layers.CombineCfac.CombineCfacLayer object at 0x17e3bfd90>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Compare: 2, expecting 3
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
[INFO]: Applying combination layer
Model: "meta_model"
 Layer (type)                   Output Shape         Param #     Connected to                     
 integration_grid (InputLayer)  [(1, 2000, 1)]       0           []                               

 input_2 (InputLayer)           [(1, 640, 1)]        0           []                               

 PDF_0 (MetaModel)              (1, None, 14)        779         ['integration_grid[0][0]',       

 tf.stack (TFOpLambda)          (1, 640, 14, 1)      0           ['PDF_0[0][0]']                  

 pdf_split (Lambda)             [(1, 50, 14, 1),     0           ['tf.stack[0][0]']               
                                 (1, 20, 14, 1),                                                  
                                 (1, 330, 14, 1),                                                 
                                 (1, 80, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 50, 14, 1),                                                  
                                 (1, 70, 14, 1)]                                                  

 ATLAS_split (Lambda)           [(1, 30, 14, 1),     0           ['pdf_split[0][2]']              
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 20, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1),                                                  
                                 (1, 40, 14, 1)]                                                  

 LEP_split (Lambda)             [(1, 40, 14, 1),     0           ['pdf_split[0][3]']              
                                 (1, 40, 14, 1)]                                                  

 dat_ATLASTTBARTOT7TEV (DY)     (1, 1, 1)            0           ['ATLAS_split[0][0]']            

 dat_ATLAS_TTBAR_8TEV_ASY (DY)  (1, 1, 1)            0           ['ATLAS_split[0][1]']            

 dat_ATLAS_TTBARZ_8TEV_TOTAL (D  (1, 1, 1)           0           ['ATLAS_split[0][2]']            

 dat_ATLAS_TTBARW_8TEV_TOTAL (D  (1, 1, 1)           0           ['ATLAS_split[0][3]']            

 dat_ATLAS_SINGLETOP_TCH_7TEV_T  (1, 1, 1)           0           ['ATLAS_split[0][4]']            

 dat_ATLAS_SINGLETOPW_8TEV_TOTA  (1, 1, 1)           0           ['ATLAS_split[0][5]']            
 L (DY)                                                                                           

 dat_ATLAS_WHEL_13TEV (Fixed)   (1, 1, 2)            0           ['ATLAS_split[0][6]']            

 dat_ATLAS_TTBARGAMMA_8TEV_TOTA  (1, 1, 1)           0           ['ATLAS_split[0][7]']            
 L (Fixed)                                                                                        

 dat_ATLAS_SINGLETOPZ_13TEV_TOT  (1, 1, 1)           0           ['ATLAS_split[0][8]']            
 AL (Fixed)                                                                                       

 dat_LEP_ZDATA (Fixed)          (1, 1, 19)           0           ['LEP_split[0][0]']              

 dat_LEP_EEWW_182GEV (Fixed)    (1, 1, 10)           0           ['LEP_split[0][1]']              

 dat_ATLAS_CMS_SSINC_RUNI (Fixe  (1, 1, 22)          0           ['pdf_split[0][4]']              

 dat_HERACOMBNCEP460 (DIS)      (1, 1, 153)          0           ['pdf_split[0][0]']              

 dat_CMSDY1D12 (DY)             (1, 1, 41)           0           ['pdf_split[0][1]']              

 combine_cfac_layer (CombineCfa  multiple            1           ['dat_ATLASTTBARTOT7TEV[0][0]',  
 cLayer)                                                          'dat_ATLAS_TTBAR_8TEV_ASY[0][0]'
                                                                 , 'dat_ATLAS_TTBARZ_8TEV_TOTAL[0]

 dat_POSF2U (DIS)               (1, 1, 20)           0           ['pdf_split[0][5]']              

 dat_INTEGXT8 (DIS)             (1, 1, 1)            0           ['pdf_split[0][6]']              

 tf.identity (TFOpLambda)       (1, 1, 153)          0           ['dat_HERACOMBNCEP460[0][0]']    

 tf.identity_1 (TFOpLambda)     (1, 1, 41)           0           ['dat_CMSDY1D12[0][0]']          

 tf.concat (TFOpLambda)         (1, 1, 10)           0           ['combine_cfac_layer[0][0]',     

 tf.concat_1 (TFOpLambda)       (1, 1, 29)           0           ['combine_cfac_layer[9][0]',     

 tf.identity_2 (TFOpLambda)     (1, 1, 22)           0           ['combine_cfac_layer[11][0]']    

 tf.identity_3 (TFOpLambda)     (1, 1, 20)           0           ['dat_POSF2U[0][0]']             

 tf.identity_4 (TFOpLambda)     (1, 1, 1)            0           ['dat_INTEGXT8[0][0]']           

 HERACOMB (LossInvcovmat)       (1,)                 23562       ['tf.identity[0][0]']            

 CMS (LossInvcovmat)            (1,)                 1722        ['tf.identity_1[0][0]']          

 ATLAS (LossInvcovmat)          (1,)                 110         ['tf.concat[0][0]']              

 LEP (LossInvcovmat)            (1,)                 870         ['tf.concat_1[0][0]']            

 ATLAS-CMS (LossInvcovmat)      (1,)                 506         ['tf.identity_2[0][0]']          

 POSF2U (LossPositivity)        (1,)                 1           ['tf.identity_3[0][0]']          

 INTEGXT8 (LossIntegrability)   (1,)                 1           ['tf.identity_4[0][0]']          

Total params: 27,552
Trainable params: 1
Non-trainable params: 27,551
[INFO]: Using weights from fit: 221103-jmm-no_top_1000_iterated
[INFO]: Loading weights from path: /opt/anaconda3/envs/simunet/share/NNPDF/results/221103-jmm-no_top_1000_iterated/nnfit/replica_1000/weights.h5
[WARNING]:  > NaN found, stopping activated
[INFO]: Stopped at epoch=1
1/1 [==============================] - 0s 285ms/step
1/1 [==============================] - 0s 400ms/step
1/1 [==============================] - 0s 17ms/step
1/1 [==============================] - 0s 15ms/step
1/1 [==============================] - 0s 16ms/step
[INFO]: Best fit for replica #1000, chi2=nan (tr=nan, vl=2.714)
[INFO]:  > Saving the weights for future in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_01/nnfit/replica_1000/weights.h5
(simunet) tomtong@Toms-MacBook-Air SIMUnet_runs % 

The process appears to have stopped too early, as indicated by:

[WARNING]:  > NaN found, stopping activated
[INFO]: Stopped at epoch=1


[INFO]: Best fit for replica #1000, chi2=nan (tr=nan, vl=2.714)
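When many replicas are fitted, checking each log by eye for this symptom gets tedious. The sketch below scans n3fit output for the final "Best fit" summary line and flags replicas whose total or training chi2 is NaN, as for replica 1000 above. It assumes the summary line keeps exactly the format printed in this log; parse_best_fit and failed_replicas are hypothetical helpers, not part of SIMUnet or n3fit.

```python
import math
import re

# Assumes the n3fit summary-line format printed in this issue, e.g.
# "[INFO]: Best fit for replica #1000, chi2=nan (tr=nan, vl=2.714)"
BEST_FIT_RE = re.compile(
    r"Best fit for replica #(?P<replica>\d+), "
    r"chi2=(?P<chi2>\S+) \(tr=(?P<tr>[^,\s]+), vl=(?P<vl>[^)\s]+)\)"
)


def parse_best_fit(log_text):
    """Hypothetical helper: one dict per "Best fit" line in an n3fit log."""
    results = []
    for match in BEST_FIT_RE.finditer(log_text):
        results.append({
            "replica": int(match["replica"]),
            # float() accepts the string "nan", so NaN values survive parsing
            "chi2": float(match["chi2"]),
            "tr": float(match["tr"]),
            "vl": float(match["vl"]),
        })
    return results


def failed_replicas(log_text):
    """Hypothetical helper: replica numbers whose total or training chi2 is NaN."""
    return [
        r["replica"]
        for r in parse_best_fit(log_text)
        if math.isnan(r["chi2"]) or math.isnan(r["tr"])
    ]


if __name__ == "__main__":
    log = (
        "[INFO]: Best fit for replica #1000, chi2=nan (tr=nan, vl=2.714)\n"
        "[INFO]: Best fit for replica #123, chi2=1.021 (tr=1.635, vl=2.492)\n"
    )
    print(failed_replicas(log))  # [1000]
```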

Thank you very much in advance!

LucaMantani commented 2 months ago

Hi, not sure exactly what the issue is (maybe @ElieHammou or @comane can help more), but when you run n3fit tom_01.yml 1000 you are not generating 1000 replicas; you are generating replica #1000.

To generate 1000 replicas, you need to launch separate jobs (better) or use a loop.
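As a minimal sketch of the loop option, the script below builds one n3fit <runcard> <replica> command per replica number, matching the n3fit tom_01.yml 1000 invocation above, and either prints the commands (dry run) or runs them one after another. It assumes vp-setupfit has already been run once on the runcard, as in the original log; replica_commands and run_sequentially are hypothetical names.

```python
import shlex
import subprocess


def replica_commands(runcard, n_replicas, first=1):
    """Hypothetical helper: one ["n3fit", runcard, "<replica>"] command per replica."""
    return [["n3fit", runcard, str(i)] for i in range(first, first + n_replicas)]


def run_sequentially(commands, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=False so that one failing replica does not abort the loop
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    run_sequentially(replica_commands("tom_01.yml", 3))
    # n3fit tom_01.yml 1
    # n3fit tom_01.yml 2
    # n3fit tom_01.yml 3
```

On a cluster, each printed command would instead be submitted as its own job, which is the option described as better above, since replicas are independent and can then run in parallel.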

If you run with fixed_pdf_fit: False, it works?

ElieHammou commented 2 months ago

Hi Tom, I was discussing with @FrancescoMerlotti and he reminded me that there may be an issue with fixed-PDF fits with only one coefficient on Mac systems. Could you try to turn another coefficient on for 1 replica to see if the error goes away? Also, if you have access to a Linux system, it could be useful to test the runcard there, as I think this issue does not come up.

tomtong2015 commented 2 months ago

> Hi, not sure exactly what the issue is (maybe @ElieHammou or @comane can help more), but when you run n3fit tom_01.yml 1000 you are not generating 1000 replicas; you are generating replica #1000.
>
> To generate 1000 replicas, you need to launch separate jobs (better) or use a loop.

Hi Luca, thank you very much! I will launch different jobs for more replicas.

> If you run with fixed_pdf_fit: False, it works?

Yes, it seems to work. I switched True to False while keeping everything else the same:

# Uncomment to perform fixed-PDF fit
fixed_pdf_fit: False
load_weights_from_fit: 221103-jmm-no_top_1000_iterated

Below is the last part of the output:

[INFO]: At epoch 9800/30000, total chi2: 1.5794931972728057
HERACOMB: 1.841, CMS: 0.978, ATLAS: 0.465, LEP: 1.974, ATLAS-CMS: 0.866, total: 1.579
Validation chi2 at this point: 2.702366352081299
[INFO]: ['8.47e-03']
[INFO]: At epoch 9900/30000, total chi2: 1.5831192203596527
HERACOMB: 1.848, CMS: 0.974, ATLAS: 0.470, LEP: 1.974, ATLAS-CMS: 0.866, total: 1.583
Validation chi2 at this point: 2.711261510848999
[INFO]: ['9.96e-03']
[INFO]: At epoch 10000/30000, total chi2: 1.580635497149299
HERACOMB: 1.844, CMS: 0.980, ATLAS: 0.452, LEP: 1.974, ATLAS-CMS: 0.865, total: 1.581
Validation chi2 at this point: 2.707155227661133
[INFO]: ['1.20e-02']
[INFO]: At epoch 10100/30000, total chi2: 1.573520331289254
HERACOMB: 1.841, CMS: 0.947, ATLAS: 0.454, LEP: 1.974, ATLAS-CMS: 0.865, total: 1.574
Validation chi2 at this point: 2.71159029006958
[INFO]: ['1.05e-02']
[INFO]: At epoch 10200/30000, total chi2: 1.5766352560005936
HERACOMB: 1.845, CMS: 0.944, ATLAS: 0.474, LEP: 1.974, ATLAS-CMS: 0.865, total: 1.577
Validation chi2 at this point: 2.7088916301727295
[INFO]: ['3.16e-03']
[INFO]: Stopped at epoch=10223
1/1 [==============================] - 0s 291ms/step
1/1 [==============================] - 0s 309ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 17ms/step
[INFO]: Best fit for replica #123, chi2=1.021 (tr=1.635, vl=2.492)
[INFO]:  > Saving the weights for future in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_02/nnfit/replica_123/weights.h5
(simunet) tomtong@Toms-Air SIMUnet_runs % 

Is it running properly?
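One way to answer this from the log alone is to check that the periodic chi2 printouts stay finite, that no "NaN found" warning appears, and that the fit did not stop at epoch 1. The sketch below collects those pieces; it assumes the log line formats shown above, and training_summary and looks_healthy are hypothetical helpers, not part of n3fit.

```python
import math
import re

# Assumes the progress-line formats printed by n3fit in this issue
EPOCH_RE = re.compile(r"At epoch (\d+)/\d+, total chi2: (\S+)")
VALID_RE = re.compile(r"Validation chi2 at this point: (\S+)")
STOP_RE = re.compile(r"Stopped at epoch=(\d+)")


def training_summary(log_text):
    """Hypothetical helper: chi2 history and stopping epoch from an n3fit log."""
    train = [(int(epoch), float(chi2)) for epoch, chi2 in EPOCH_RE.findall(log_text)]
    valid = [float(v) for v in VALID_RE.findall(log_text)]
    stop = STOP_RE.search(log_text)
    return {
        "epochs": [epoch for epoch, _ in train],
        "train_chi2": [chi2 for _, chi2 in train],
        "valid_chi2": valid,
        "stopped_at": int(stop.group(1)) if stop else None,
        "nan_warning": "NaN found" in log_text,
    }


def looks_healthy(summary):
    """Finite chi2 throughout, no NaN warning, and no stop at epoch 1."""
    values = summary["train_chi2"] + summary["valid_chi2"]
    return (
        not summary["nan_warning"]
        and bool(values)
        and all(math.isfinite(v) for v in values)
        and (summary["stopped_at"] is None or summary["stopped_at"] > 1)
    )


if __name__ == "__main__":
    good = (
        "[INFO]: At epoch 10100/30000, total chi2: 1.573520331289254\n"
        "Validation chi2 at this point: 2.71159029006958\n"
        "[INFO]: At epoch 10200/30000, total chi2: 1.5766352560005936\n"
        "Validation chi2 at this point: 2.7088916301727295\n"
        "[INFO]: Stopped at epoch=10223\n"
    )
    bad = "[WARNING]:  > NaN found, stopping activated\n[INFO]: Stopped at epoch=1\n"
    print(looks_healthy(training_summary(good)))  # True
    print(looks_healthy(training_summary(bad)))   # False
```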

tomtong2015 commented 2 months ago

> Hi Tom, I was discussing with @FrancescoMerlotti and he reminded me that there may be an issue with fixed-PDF fits with only one coefficient on Mac systems. Could you try to turn another coefficient on for 1 replica to see if the error goes away? Also, if you have access to a Linux system, it could be useful to test the runcard there, as I think this issue does not come up.

Hi Elie, many thanks to both of you! I truly appreciate the help!

As Luca pointed out, fixed-PDF seems to be the issue. Of course, I'm in no position to make such conclusions. You guys are the experts 😉

I also tried a fixed-PDF fit with 3 Wilson coefficients turned on. It seems that the problem remains. Below is the last part of the output:

Total params: 27,554
Trainable params: 3
Non-trainable params: 27,551
[INFO]: Using weights from fit: 221103-jmm-no_top_1000_iterated
[INFO]: Loading weights from path: /opt/anaconda3/envs/simunet/share/NNPDF/results/221103-jmm-no_top_1000_iterated/nnfit/replica_456/weights.h5
[WARNING]:  > NaN found, stopping activated
[INFO]: Stopped at epoch=1
1/1 [==============================] - 0s 287ms/step
1/1 [==============================] - 0s 300ms/step
1/1 [==============================] - 0s 15ms/step
1/1 [==============================] - 0s 15ms/step
1/1 [==============================] - 0s 16ms/step
[INFO]: Best fit for replica #456, chi2=nan (tr=nan, vl=1.928)
[INFO]:  > Saving the weights for future in /Users/tomtong/Desktop/SIMUnet/SIMUnet_runs/tom_03/nnfit/replica_456/weights.h5
(simunet) tomtong@Toms-Air SIMUnet_runs % 

Well, it also could be an issue with the infamous Apple silicon and the translation. I can try it on a Linux system after I get our admin's approval.

FrancescoMerlotti commented 2 months ago

> Well, it also could be an issue with the infamous Apple silicon and the translation. I can try it on a Linux system after I get our admin's approval.

Hi Tom, I think it is an issue with Apple Silicon indeed; it might be related to some version of TensorFlow for Mac. I have to find it! As Elie said, a Linux machine should work just fine, and the runcard should be right as well.
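If the problem does depend on the TensorFlow build for Mac, a bug report is more useful with the exact environment attached. This sketch gathers the platform, the Python version, whether the process runs natively on Apple Silicon or is translated by Rosetta 2 (via the macOS sysctl.proc_translated key), and the installed versions of a few packages. The package list is an assumption chosen for illustration (tensorflow-macos and tensorflow-metal are Apple's TensorFlow distributions for macOS), and environment_report is a hypothetical helper.

```python
import platform
import subprocess
from importlib import metadata

# Assumed, illustrative list of distributions worth quoting in a bug report
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal", "keras", "numpy")


def package_versions(names=PACKAGES):
    """Installed version of each distribution, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def running_under_rosetta():
    """True if this macOS process is translated by Rosetta 2, else False.

    Queries the sysctl.proc_translated key; on non-macOS systems, or when
    the key does not exist (e.g. Intel Macs), this returns False.
    """
    if platform.system() != "Darwin":
        return False
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        ).stdout.strip()
    except OSError:
        return False
    return out == "1"


def environment_report():
    """Hypothetical helper: platform details plus package versions."""
    return {
        "system": platform.system(),
        # 'arm64' for native Python on Apple Silicon, 'x86_64' under Rosetta
        "machine": platform.machine(),
        "python": platform.python_version(),
        "rosetta": running_under_rosetta(),
        "packages": package_versions(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```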

tomtong2015 commented 2 months ago

> Hi Tom, I think it is an issue with Apple Silicon indeed; it might be related to some version of TensorFlow for Mac. I have to find it! As Elie said, a Linux machine should work just fine, and the runcard should be right as well.

Hi Francesco, thank you very much! I'll try a Linux machine as soon as possible, and come back to you for further guidance 👍