NNPDF / nnpdf

An open-source machine learning framework for global analyses of parton distributions.
https://docs.nnpdf.science/
GNU General Public License v3.0

APFEL error at the end of the fit #151

Closed wilsonmr closed 6 years ago

wilsonmr commented 6 years ago

Hello,

We are having an issue running fits on the cluster at Edinburgh: the fits appear to finish, but they do not output the LHAPDF grids, so we don't get the results.

@nhartland suggests it is a problem concerning APFEL.

I have pasted the final part of the fit output below. I have the full outputs available for a few different fit configurations:

 **** Producing T0 Predictions with Set 170206-003

- Final Positivity Test
- Passed all points for POSF2U
- Passed all points for POSF2DW
- Passed all points for POSF2S
- Passed all points for POSFLL
- Passed all points for POSDYU
- Passed all points for POSDYD
- Passed all points for POSDYS

- Writing fitinfo file...
- Computing arclengths...
- Writing sumrules file...
- Writing preproc file...
- Writing params file...
- Writing out LHAPDF grid: 180321-edi-002
- Solving DGLAP for LHAPDF grid...
Thanks for using LHAPDF 6.2.1. Please make sure to cite the paper:
  Eur.Phys.J. C75 (2015) 3, 132  (http://arxiv.org/abs/1412.7420)

 Checking APFEL v3.0.2  ...

 WARNING: FONLL-C is a VFN scheme
          ... setting VFNS PDF evolution
 WARNING: FONLL-C is a NNLO scheme
          ... setting NNLO perturbative order

 Welcome to 
      _/_/_/    _/_/_/_/   _/_/_/_/   _/_/_/_/   _/
    _/    _/   _/    _/   _/         _/         _/
   _/_/_/_/   _/_/_/_/   _/_/_/     _/_/_/     _/
  _/    _/   _/         _/         _/         _/
 _/    _/   _/         _/         _/_/_/_/   _/_/_/_/
 _____v3.0.2 A PDF Evolution Library, arXiv:1310.1394
      Authors: V. Bertone, S. Carrazza, J. Rojo

 Report of the evolution parameters:

 QCD evolution
 Space-like evolution (PDFs)
 Unpolarized evolution
 Evolution scheme: VFNS at N2LO
 Solution of the DGLAP equation: 'exactalpha' with maximum 6 active flavours
 Solution of the coupling equations: 'exact' with maximum 6 active flavours
 Coupling reference value:
 - AlphaQCD(  1.4142 GeV) =  0.350000
 Pole heavy quark masses:
 - Mc =   1.4142 GeV
 - Mb =   4.5000 GeV
 - Mt = 175.0000 GeV
 The matching thresholds coincide with the physical masses
 muR / muF =  1.0000

 Allowed evolution range [   1.0000 :  10000.0000 ] GeV
 The internal subgrids will be locked
 Fast evolution enabled

 Initialization of the evolution completed in   4.256 s

 Report of the electroweak parameters:

 Mass of the Z = 91.188 GeV
 Mass of the W = 80.385 GeV
 Mass of the proton = 0.9383 GeV
 sin^2(thetaW) = 0.2313
 GFermi = 1.16638E-05
       | 0.9743 0.2254 0.0036 |
 CKM = | 0.2252 0.9734 0.0414 |
       | 0.0089 0.0405 0.9991 |
 Z propagator correction = 0.00000

 Report of the DIS parameters:

 Computation in the FONLL-C mass scheme
 Electromagnetic (EM) process
 Scattering electron - proton   
 muR / Q =  1.0000
 muF / Q =  1.0000
 Target Mass corrections disabled
 FONLL damping factor for charm enabled with suppression power = 2
 FONLL damping factor for bottom enabled with suppression power = 2
 FONLL damping factor for top enabled with suppression power = 2
 Intrinsic charm disabled

 Initialization of the DIS module completed in  45.390 s

 Check ... succeded

 WARNING: if there are external grids they cannot be locked
          ... unlocking subgrids

 Welcome to 
      _/_/_/    _/_/_/_/   _/_/_/_/   _/_/_/_/   _/
    _/    _/   _/    _/   _/         _/         _/
   _/_/_/_/   _/_/_/_/   _/_/_/     _/_/_/     _/
  _/    _/   _/         _/         _/         _/
 _/    _/   _/         _/         _/_/_/_/   _/_/_/_/
 _____v3.0.2 A PDF Evolution Library, arXiv:1310.1394
      Authors: V. Bertone, S. Carrazza, J. Rojo

 Report of the evolution parameters:

 QCD evolution
 Space-like evolution (PDFs)
 Unpolarized evolution
 Evolution scheme: VFNS at N2LO
 Solution of the DGLAP equation: 'truncated' with maximum 5 active flavours
 - value of the truncation parameter epsilon = 1.000E-02
 Solution of the coupling equations: 'expanded' with maximum 5 active flavours
 Coupling reference value:
 - AlphaQCD( 91.2000 GeV) =  0.118000
 Pole heavy quark masses:
 - Mc =   1.5100 GeV
 - Mb =   4.9200 GeV
 - Mt = 172.5000 GeV
 The matching thresholds coincide with the physical masses
 muR / muF =  1.0000

 Allowed evolution range [   1.6500 : 100000.0000 ] GeV

 Initialization of the evolution completed in 917.283 s

 In odeintns.f:
 stepsize underflow in rkqsns
wilsonmr commented 6 years ago

another example of the message (both from odeintns.f) is

Allowed evolution range [   1.6500 : 100000.0000 ] GeV

 Initialization of the evolution completed in 923.826 s

 In odeintns.f:
 too many steps!
Zaharid commented 6 years ago

How is this installed? conda?

wilsonmr commented 6 years ago

Yes, same error both with conda packages and also a dev environment as a cross check

Zaharid commented 6 years ago

I am going to bet this has something to do with the stack getting corrupted as in https://github.com/NNPDF/apfelcomb/issues/9.

tgiani commented 6 years ago

Same kind of errors for me from a different account on the cluster

scarrazza commented 6 years ago

@wilsonmr @tgiani which theoryID are you running?

tgiani commented 6 years ago

we are using theory 53

scarrazza commented 6 years ago

Thanks, I have just tested and on my machines I don't see this issue at all...

wilsonmr commented 6 years ago

Hi Stefano, Yes @nhartland also couldn't seem to reproduce this error, it seems to be specific to the cluster at Edinburgh.

We started getting this issue only recently, after struggling to get conda to install properly on the cluster, which I eventually worked around by installing a slightly older version of conda (see #132).

Zaharid commented 6 years ago

Note that conda doesn't do anything but compile the thing and package it. This is very unlikely to depend on the version of the conda installer. Additionally, stack-related errors can innocently write on padded areas and not trigger any signal in some code paths (and we in fact have evidence that this seems to work most of the time). This is where things like ASAN or valgrind come in handy.

I suggested starting with the apfelcomb issue because we know how to reproduce it reliably.

scarrazza commented 6 years ago

I think I have identified the bug you are getting at the end of the fit. It is related to the usual string global-overflow-leak and a possible solution is in https://github.com/scarrazza/apfel/pull/10.

scarrazza commented 6 years ago

@wilsonmr @tgiani could you please install this APFEL branch https://github.com/scarrazza/apfel/pull/9, and test nnfit (master) again? If you prefer, you don't need to run the fit on your cluster: just take a runcard, reduce ngen to something like 10-100 and start nnfit for replica 1. This should take ~1h to finish.

wilsonmr commented 6 years ago

Set fit running, will get back to you within an hour or so with results!

wilsonmr commented 6 years ago

Actually, I got some core dumps in my home directory and, despite removing them, the quota isn't updating. I'm being told I've exceeded my disk space, so I can't do anything in my home directory. It might take a little while longer to get this running unless @tgiani has more luck.

tgiani commented 6 years ago

Yep, I'm going to have a try later today.

scarrazza commented 6 years ago

Thanks, let me know.

tgiani commented 6 years ago

I have some problems as well, sorry. Working on that.

tgiani commented 6 years ago

Actually I don't know if I'm doing it right. I'm doing the conda development installation:

1) conda create -n test gxx_linux-64
2) source activate test
3) conda install validphys nnpdf
4) conda remove validphys --force
5) conda remove libnnpdf --force
6) conda remove nnpdf --force
7) conda install pkg-config swig=3.0.10 cmake (other dependencies which do not $
8) cd nnpdf
9) mkdir conda-bld, cd conda-bld
10) cmake .. -DCMAKE_INSTALL_PREFIX=path/to/anaconda/envs/vp2dev/

with the only difference that after point 6) I'm also doing

conda remove apfel --force

and then I'm installing apfel from source, using the branch fixstringleak. Is this correct? When trying to compile the nnpdf code I'm getting

(test) [s1792848@login04(eddie) conda-bld]$ make
[ 43%] Built target nnpdf
[ 47%] Built target FKconvolute
[ 52%] Built target FKmerge2
[ 54%] Built target gen_nnpdf_nnpdfPYTHON_wrap
[ 58%] Built target _nnpdf
[ 69%] Built target common
[ 76%] Built target filter
[ 78%] Linking CXX executable ../../binaries/nnfit
/exports/csce/eddie/ph/groups/rbm_ml/tommaso/myconda/envs/apfelbug/lib/libAPFEL.so: undefined reference to `memcpy@GLIBC_2.14'
collect2: error: ld returned 1 exit status
make[2]: *** [binaries/nnfit] Error 1
make[1]: *** [nnpdfcpp/src/CMakeFiles/nnfit.dir/all] Error 2
make: *** [all] Error 2
wilsonmr commented 6 years ago

if you're compiling apfel with a conda development environment you also need to get the package gfortran_linux-64

otherwise it will use the default one installed with linux which I think is why it can't find the library

tgiani commented 6 years ago

Great, thanks, I'll try again with that.

tgiani commented 6 years ago

I'm getting the same quota problems Michael described above, already when trying to install the code. So basically I cannot do anything in my home directory on the cluster, and the conda installation also looks broken. I'm trying to solve the problem, but it could require more time. If useful, I can first test the code locally rather than on the cluster, but the problems with apfel only appeared on the cluster.

wilsonmr commented 6 years ago

Hi Stefano, I have run a fit with the new branch. I appear to be getting one of the errors I was getting before. Is this string related? It looks like an ODE/integration routine that isn't finishing:

In odeintns.f:
 too many steps!
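The two messages from odeintns.f quoted in this thread ("stepsize underflow in rkqsns" and "too many steps!") are the typical failure modes of an adaptive step-size ODE driver. The sketch below is a hypothetical Python illustration of how such guards usually work, not APFEL's actual Fortran routine: the function names, the step-doubling error estimate and the default limits (h_min, max_steps) are all assumptions made for the example.

```python
import math


class StepsizeUnderflow(RuntimeError):
    """The step had to shrink below the smallest allowed size."""


class TooManySteps(RuntimeError):
    """The attempt budget ran out before the end point was reached."""


def rk4_step(f, t, y, h):
    """Advance y by one classical Runge-Kutta (RK4) step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate_adaptive(f, y0, t0, t1, tol=1e-8, h0=0.1,
                       h_min=1e-12, max_steps=10_000):
    """Integrate dy/dt = f(t, y) from t0 to t1 with step-doubling control.

    max_steps counts every attempt, accepted or rejected.
    """
    t, y, h = t0, y0, h0
    for _ in range(max_steps):
        last = h >= t1 - t          # would this step reach the end point?
        if last:
            h = t1 - t
        full = rk4_step(f, t, y, h)
        mid = rk4_step(f, t, y, h / 2)
        half = rk4_step(f, t + h / 2, mid, h / 2)
        err = abs(half - full)      # difference between the two estimates
        if err <= tol * max(1.0, abs(half)):
            y = half                # accept the more accurate result
            if last:
                return y
            t += h
            h *= 2.0                # try a larger step next time
        else:
            h /= 2.0                # reject and retry with a smaller step
            if h < h_min:
                raise StepsizeUnderflow(f"step {h:g} below h_min at t={t:g}")
    raise TooManySteps(f"end point not reached within {max_steps} attempts")


# Smooth test problem dy/dt = -y, whose exact solution at t = 1 is exp(-1).
result = integrate_adaptive(lambda t, y: -y, 1.0, 0.0, 1.0)
print(abs(result - math.exp(-1.0)) < 1e-6)
```

In a driver of this kind, an input that never satisfies the tolerance keeps halving the step until the underflow guard fires, while an input that only satisfies it with tiny steps exhausts the attempt budget instead. Both symptoms can therefore come from the same underlying numerical problem.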
scarrazza commented 6 years ago

Ok, so this is not related to the leak. Could you please send me by mail your runcard?

wilsonmr commented 6 years ago

Yeah sure, actually it's just the fit I asked you to run at CERN, but with ngen turned down. I'll send it over.

scarrazza commented 6 years ago

Just to cross check again, are you using the master of nnpdf?

wilsonmr commented 6 years ago

yes this was using the fixstringleak branch of apfel and master branch of nnpdf

scarrazza commented 6 years ago

Could you please run the Tabulation example inside apfel/examples and check if you get the same too many steps message?

scarrazza commented 6 years ago

I have tested the runcard and I am not able to reproduce this problem.

This sounds like some numerical issue (due to the compiler, the architecture or something else) on your cluster. A possible way to sort this out is to open APFEL, print the stored variables produced on your cluster and compare them to the output on another machine where this does not happen.

When you run this code on the cluster, does it crash on the cluster slaves, or does this happen on the master node too (where you install/set up the code)?

As the point above can be tricky/painful to perform, it would be nice if I could access your cluster. Do you think this is possible?

wilsonmr commented 6 years ago

I'll try the Tabulation example now. It's crashing on the cluster slaves. It won't let me run it on the master node, saying it can't allocate the memory.

wilsonmr commented 6 years ago

Is running Tabulation example as simple as this?

cd apfel/examples
./Tabulation

in which case I get the output

INFO: activate-gcc_linux-64.sh made the following environmental changes:
+CC=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-cc
+CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe
+_CONDA_PYTHON_SYSCONFIGDATA_NAME=_sysconfigdata_x86_64_conda_cos6_linux_gnu
+CPP=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-cpp
+CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2
+DEBUG_CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -pipe
+DEBUG_CPPFLAGS=-D_DEBUG -D_FORTIFY_SOURCE=2 -Og
+GCC_AR=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gcc-ar
+GCC=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gcc
+GCC_NM=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gcc-nm
+GCC_RANLIB=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gcc-ranlib
+HOST=x86_64-conda_cos6-linux-gnu
+LDFLAGS=-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now
INFO: activate-binutils_linux-64.sh made the following environmental changes:
+ADDR2LINE=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-addr2line
+AR=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-ar
+AS=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-as
+CXXFILT=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-c++filt
+ELFEDIT=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-elfedit
+GPROF=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gprof
+LD=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-ld
+LD_GOLD=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-ld.gold
+NM=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-nm
+OBJCOPY=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-objcopy
+OBJDUMP=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-objdump
+RANLIB=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-ranlib
+READELF=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-readelf
+SIZE=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-size
+STRINGS=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-strings
+STRIP=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-strip
INFO: activate-gxx_linux-64.sh made the following environmental changes:
+CXX=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-c++
+CXXFLAGS=-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe
+DEBUG_CXXFLAGS=-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -pipe
+GXX=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-g++
INFO: activate-gfortran_linux-64.sh made the following environmental changes:
+DEBUG_FFLAGS=-fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments -pipe
+DEBUG_FORTRANFLAGS=-fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments -pipe
+F77=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gfortran
+F95=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-f95
+FC=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gfortran
+FFLAGS=-fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe
+FORTRANFLAGS=-fopenmp -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe
+GFORTRAN=/exports/csce/eddie/ph/groups/rbm_ml/michael/myconda/envs/apfeltest/bin/x86_64-conda_cos6-linux-gnu-gfortran
At line 60 of file Tabulation.f (unit = 5, file = 'stdin')
Fortran runtime error: End of file

Error termination. Backtrace:
#0  0x2b5c4ad45649 in list_formatted_read_scalar
    at /opt/conda/conda-bld/compilers_linux-64_1520532893746/work/.build/src/gcc-7.2.0/libgfortran/io/list_read.c:2306
#1  0x2b5b9121cfc9 in ???
#2  0x2b5b9121cdc0 in ???
#3  0x2b5c4b48dc04 in ???
#4  0x2b5b9121cdf0 in ???

 Welcome to 
      _/_/_/    _/_/_/_/   _/_/_/_/   _/_/_/_/   _/
    _/    _/   _/    _/   _/         _/         _/
   _/_/_/_/   _/_/_/_/   _/_/_/     _/_/_/     _/
  _/    _/   _/         _/         _/         _/
 _/    _/   _/         _/         _/_/_/_/   _/_/_/_/
 _____v3.0.2 A PDF Evolution Library, arXiv:1310.1394
      Authors: V. Bertone, S. Carrazza, J. Rojo

 Report of the evolution parameters:

 QCD evolution
 Space-like evolution (PDFs)
 Unpolarized evolution
 Evolution scheme: VFNS at N2LO
 Solution of the DGLAP equation: 'exactalpha' with maximum 6 active flavours
 Solution of the coupling equations: 'exact' with maximum 6 active flavours
 Coupling reference value:
 - AlphaQCD(  1.4142 GeV) =  0.350000
 Pole heavy quark masses:
 - Mc =   1.4142 GeV
 - Mb =   4.5000 GeV
 - Mt = 175.0000 GeV
 The matching thresholds coincide with the physical masses
 muR / muF =  1.0000

 Allowed evolution range [   1.0000 :  10000.0000 ] GeV
 The internal subgrids will be locked
 Fast evolution enabled

 Initialization of the evolution completed in   4.298 s

 Enter initial and final scale in GeV^2
scarrazza commented 6 years ago

Yes, thanks. What is the available memory per slave? Can you set to 4Gb?

wilsonmr commented 6 years ago

I actually requested 8Gb for that particular job

Perhaps I'm wrong, because the language they use is slightly different, but I don't think there is a flat answer to that. Can you see this link? https://www.wiki.ed.ac.uk/display/ResearchServices/Memory+Specification If not, it says that the cluster comprises a variety of nodes, each with a different number of cores and amount of memory. Just from scanning the list, the two most common setups are 16 cores with 64Gb RAM and 16 cores with 128Gb RAM. Does that help at all?
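The node shapes quoted above translate into a per-core memory budget with simple arithmetic. The sketch below assumes, hypothetically, that memory is requested per core and that a node can host only as many such slots as fit in its RAM. The function names are invented for the example, and the scheduler's real accounting may differ.

```python
def memory_per_core_gb(total_ram_gb, cores):
    """Even share of a node's RAM per core, in GB."""
    return total_ram_gb / cores


def usable_slots(total_ram_gb, cores, request_gb_per_core):
    """Number of cores that can run at once with the given per-core request."""
    return min(cores, int(total_ram_gb // request_gb_per_core))


# The two most common node shapes mentioned above: 16 cores with 64 or 128 GB.
for ram_gb in (64, 128):
    share = memory_per_core_gb(ram_gb, 16)
    slots = usable_slots(ram_gb, 16, 8)
    print(f"{ram_gb} GB node: {share:.0f} GB per core, "
          f"{slots} slots at 8 GB per core")
```

Under these assumptions, a 4 GB per-core request fills every core of either node shape, while an 8 GB request only fits on half the cores of a 64 GB node.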

Zaharid commented 6 years ago

This looks like a genuine error. Could you try compiling apfel with export FFLAGS=$DEBUG_FFLAGS so we can see the apfel part of the backtrace?

Zaharid commented 6 years ago

I think the error with the integration earlier may be because you have too few iterations, leading to non-smooth nucleons. But there seem to be enough issues that this needs some more debugging.

wilsonmr commented 6 years ago

urm ok I think I did what you said? :

(apfeltest) [s1758208@login04(eddie) apfel]$ make
Making all in include
make[1]: Entering directory `/exports/eddie/scratch/s1758208/apfel/include'
Making all in APFEL
make[2]: Entering directory `/exports/eddie/scratch/s1758208/apfel/include/APFEL'
make  all-am
make[3]: Entering directory `/exports/eddie/scratch/s1758208/apfel/include/APFEL'
make[3]: Leaving directory `/exports/eddie/scratch/s1758208/apfel/include/APFEL'
make[2]: Leaving directory `/exports/eddie/scratch/s1758208/apfel/include/APFEL'
make[2]: Entering directory `/exports/eddie/scratch/s1758208/apfel/include'
make[2]: Nothing to be done for `all-am'.
make[2]: Leaving directory `/exports/eddie/scratch/s1758208/apfel/include'
make[1]: Leaving directory `/exports/eddie/scratch/s1758208/apfel/include'
Making all in ccwrap
make[1]: Entering directory `/exports/eddie/scratch/s1758208/apfel/ccwrap'
  F77      APFELfwevol.lo
APFELfwevol.f:30:14:

       fxpdf = xPDF(i,x)
              1
Error: Function 'xpdf' at (1) has no IMPLICIT type
APFELfwevol.f:38:16:

       fxpdfxq = xPDFxQ(i,x,Q)
                1
Error: Function 'xpdfxq' at (1) has no IMPLICIT type
APFELfwevol.f:46:15:

       fxpdfj = xPDFj(i,x)
               1
Error: Function 'xpdfj' at (1) has no IMPLICIT type
APFELfwevol.f:54:15:

       fdxpdf = dxPDF(i,x)
               1
Error: Function 'dxpdf' at (1) has no IMPLICIT type
APFELfwevol.f:61:16:

       fxgamma = xgamma(x)
                1
Error: Function 'xgamma' at (1) has no IMPLICIT type
APFELfwevol.f:68:17:

       fxgammaj = xgammaj(x)
                 1
Error: Function 'xgammaj' at (1) has no IMPLICIT type
APFELfwevol.f:75:17:

       fdxgamma = dxgamma(x)
                 1
Error: Function 'dxgamma' at (1) has no IMPLICIT type
APFELfwevol.f:95:17:

       fxlepton = xLepton(i,x)
                 1
Error: Function 'xlepton' at (1) has no IMPLICIT type
APFELfwevol.f:103:18:

       fxleptonj = xLeptonj(i,x)
                  1
Error: Function 'xleptonj' at (1) has no IMPLICIT type
APFELfwevol.f:154:18:

       falphaqcd = AlphaQCD(Q)
                  1
Error: Function 'alphaqcd' at (1) has no IMPLICIT type
APFELfwevol.f:161:18:

       falphaqed = AlphaQED(Q)
                  1
Error: Function 'alphaqed' at (1) has no IMPLICIT type
APFELfwevol.f:169:14:

       fnpdf = NPDF(i,N)
              1
Error: Function 'npdf' at (1) has no IMPLICIT type
APFELfwevol.f:177:16:

       fngamma = Ngamma(N)
                1
Error: Function 'ngamma' at (1) has no IMPLICIT type
APFELfwevol.f:185:14:

       flumi = LUMI(i,j,S)
              1
Error: Function 'lumi' at (1) has no IMPLICIT type
APFELfwevol.f:193:15:

       fxgrid = xGrid(alpha)
               1
Error: Function 'xgrid' at (1) has no IMPLICIT type
APFELfwevol.f:198:6:

       function fnintervals()
      1
Error: Function 'fnintervals' at (1) has no IMPLICIT type
APFELfwevol.f:268:24:

       fheavyquarkmass = HeavyQuarkMass(i,Q)
                        1
Error: Function 'heavyquarkmass' at (1) has no IMPLICIT type
APFELfwevol.f:276:22:

       fgetthreshold = GetThreshold(i)
                      1
Error: Function 'getthreshold' at (1) has no IMPLICIT type
APFELfwevol.f:284:29:

       fheavyquarkthreshold = HeavyQuarkThreshold(i)
                             1
Error: Function 'heavyquarkthreshold' at (1) has no IMPLICIT type
make[1]: *** [APFELfwevol.lo] Error 1
make[1]: Leaving directory `/exports/eddie/scratch/s1758208/apfel/ccwrap'
make: *** [all-recursive] Error 1
scarrazza commented 6 years ago

If Tabulation doesn't work, then this has nothing to do with smooth initial PDFs.

I still think there is something funny with the compiler/memory of these machines. I will prepare a custom version of tabulation where we print the most relevant variables, and then we compare the output.

Zaharid commented 6 years ago

@wilsonmr Maybe remove -fno-implicit.

wilsonmr commented 6 years ago

I removed the closest thing to that, -fimplicit-none. Do you want the full output of compiling? It's very long.

Zaharid commented 6 years ago

Ideally, we want the output of running.

scarrazza commented 6 years ago

Could you please modify Tabulation.f with the diff below, recompile and run:

diff --git a/examples/Tabulation.f b/examples/Tabulation.f
index 4d3416b..6af7006 100644
--- a/examples/Tabulation.f
+++ b/examples/Tabulation.f
@@ -57,7 +57,9 @@ c      call SetMaxFlavourAlpha(5)
 *     Evolve PDFs on the grids
 *
       write(6,*) "Enter initial and final scale in GeV^2"
-      read(5,*) Q02,Q2
+*     read(5,*) Q02,Q2
+      Q02 = 1d0
+      Q2 = 10d0
 *
       Q0 = dsqrt(Q02) - eps
       Q  = dsqrt(Q2)
wilsonmr commented 6 years ago

Hi, I did what you asked. I forgot to compile with debug flags, however I'm guessing there were no errors:

Welcome to 
      _/_/_/    _/_/_/_/   _/_/_/_/   _/_/_/_/   _/
    _/    _/   _/    _/   _/         _/         _/
   _/_/_/_/   _/_/_/_/   _/_/_/     _/_/_/     _/
  _/    _/   _/         _/         _/         _/
 _/    _/   _/         _/         _/_/_/_/   _/_/_/_/
 _____v3.0.2 A PDF Evolution Library, arXiv:1310.1394
      Authors: V. Bertone, S. Carrazza, J. Rojo

 Report of the evolution parameters:

 QCD evolution
 Space-like evolution (PDFs)
 Unpolarized evolution
 Evolution scheme: VFNS at N2LO
 Solution of the DGLAP equation: 'exactalpha' with maximum 6 active flavours
 Solution of the coupling equations: 'exact' with maximum 6 active flavours
 Coupling reference value:
 - AlphaQCD(  1.4142 GeV) =  0.350000
 Pole heavy quark masses:
 - Mc =   1.4142 GeV
 - Mb =   4.5000 GeV
 - Mt = 175.0000 GeV
 The matching thresholds coincide with the physical masses
 muR / muF =  1.0000

 Allowed evolution range [   1.0000 :  10000.0000 ] GeV
 The internal subgrids will be locked
 Fast evolution enabled

 Initialization of the evolution completed in   4.348 s

 Enter initial and final scale in GeV^2
 alpha_QCD(mu2F) =  0.24423490003380316     
 alpha_QED(mu2F) =   7.5400351612049015E-003

 Standard evolution:
    x      u-ubar      d-dbar    2(ubr+dbr)    c+cbar       gluon      photon     e^-+e^+    mu^-+mu^+   tau^-+tau^+
1.0E-05  1.8921E-03  1.2054E-03  1.2870E+01  3.1967E+00  5.2513E+01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-04  8.6754E-03  5.2500E-03  7.1656E+00  1.6049E+00  3.0403E+01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-03  4.0463E-02  2.3677E-02  3.7563E+00  7.1481E-01  1.5276E+01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-02  1.8350E-01  1.0492E-01  1.7845E+00  2.5205E-01  6.0821E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-01  5.8322E-01  2.9874E-01  4.4635E-01  3.5459E-02  1.2055E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
3.0E-01  4.8226E-01  1.8924E-01  5.4302E-02  3.0448E-03  1.6810E-01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
5.0E-01  2.0668E-01  5.7324E-02  4.5800E-03  2.3472E-04  2.0605E-02  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
7.0E-01  4.3973E-02  7.2552E-03  1.3288E-04  7.7072E-06  1.2059E-03  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
9.0E-01  1.1975E-03  6.5373E-05  1.0148E-07  1.3555E-08  5.4731E-06  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00

 Standard evolution using the xPDFall function:
    x      u-ubar      d-dbar    2(ubr+dbr)    c+cbar       gluon
1.0E-05  1.8921E-03  1.2054E-03  1.2870E+01  3.1967E+00  5.2513E+01
1.0E-04  8.6754E-03  5.2500E-03  7.1656E+00  1.6049E+00  3.0403E+01
1.0E-03  4.0463E-02  2.3677E-02  3.7563E+00  7.1481E-01  1.5276E+01
1.0E-02  1.8350E-01  1.0492E-01  1.7845E+00  2.5205E-01  6.0821E+00
1.0E-01  5.8322E-01  2.9874E-01  4.4635E-01  3.5459E-02  1.2055E+00
3.0E-01  4.8226E-01  1.8924E-01  5.4302E-02  3.0448E-03  1.6810E-01
5.0E-01  2.0668E-01  5.7324E-02  4.5800E-03  2.3472E-04  2.0605E-02
7.0E-01  4.3973E-02  7.2552E-03  1.3288E-04  7.7072E-06  1.2059E-03
9.0E-01  1.1975E-03  6.5373E-05  1.0148E-07  1.3555E-08  5.4731E-06

 PDFs have been cached

 Caching completed in  0.104 s

 Cached evolution:
    x      u-ubar      d-dbar    2(ubr+dbr)    c+cbar       gluon      photon     e^-+e^+    mu^-+mu^+   tau^-+tau^+
1.0E-05  1.8921E-03  1.2054E-03  1.2870E+01  3.1967E+00  5.2513E+01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-04  8.6754E-03  5.2500E-03  7.1656E+00  1.6049E+00  3.0403E+01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-03  4.0463E-02  2.3677E-02  3.7563E+00  7.1481E-01  1.5276E+01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-02  1.8350E-01  1.0492E-01  1.7845E+00  2.5205E-01  6.0821E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
1.0E-01  5.8322E-01  2.9874E-01  4.4635E-01  3.5459E-02  1.2055E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
3.0E-01  4.8226E-01  1.8924E-01  5.4302E-02  3.0448E-03  1.6810E-01  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
5.0E-01  2.0668E-01  5.7324E-02  4.5800E-03  2.3472E-04  2.0605E-02  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
7.0E-01  4.3973E-02  7.2552E-03  1.3289E-04  7.7072E-06  1.2059E-03  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
9.0E-01  1.1975E-03  6.5373E-05  1.0137E-07  1.3522E-08  5.4731E-06  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00

 Cached evolution using the xPDFxQall function:
    x      u-ubar      d-dbar    2(ubr+dbr)    c+cbar       gluon
1.0E-05  1.8921E-03  1.2054E-03  1.2870E+01  3.1967E+00  5.2513E+01
1.0E-04  8.6754E-03  5.2500E-03  7.1656E+00  1.6049E+00  3.0403E+01
1.0E-03  4.0463E-02  2.3677E-02  3.7563E+00  7.1481E-01  1.5276E+01
1.0E-02  1.8350E-01  1.0492E-01  1.7845E+00  2.5205E-01  6.0821E+00
1.0E-01  5.8322E-01  2.9874E-01  4.4635E-01  3.5459E-02  1.2055E+00
3.0E-01  4.8226E-01  1.8924E-01  5.4302E-02  3.0448E-03  1.6810E-01
5.0E-01  2.0668E-01  5.7324E-02  4.5800E-03  2.3472E-04  2.0605E-02
7.0E-01  4.3973E-02  7.2552E-03  1.3289E-04  7.7072E-06  1.2059E-03
9.0E-01  1.1975E-03  6.5373E-05  1.0137E-07  1.3522E-08  5.4731E-06
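One simple way to compare two of the tables above, or the same table produced on two machines as suggested earlier in the thread, is to compute relative differences cell by cell. The sketch below copies the x = 0.7 and x = 0.9 rows of the "Standard evolution" and "Cached evolution" tables from the output above. The function name and the dictionary layout are hypothetical choices for the example.

```python
COLUMNS = ["u-ubar", "d-dbar", "2(ubr+dbr)", "c+cbar", "gluon"]

# Rows copied from the Tabulation output above (first five columns only).
standard = {
    0.7: [4.3973e-02, 7.2552e-03, 1.3288e-04, 7.7072e-06, 1.2059e-03],
    0.9: [1.1975e-03, 6.5373e-05, 1.0148e-07, 1.3555e-08, 5.4731e-06],
}
cached = {
    0.7: [4.3973e-02, 7.2552e-03, 1.3289e-04, 7.7072e-06, 1.2059e-03],
    0.9: [1.1975e-03, 6.5373e-05, 1.0137e-07, 1.3522e-08, 5.4731e-06],
}


def relative_differences(reference, other):
    """Return {(x, column): |other - reference| / |reference|} for differing cells."""
    diffs = {}
    for x, ref_row in reference.items():
        for name, ref, val in zip(COLUMNS, ref_row, other[x]):
            if val != ref:
                diffs[(x, name)] = abs(val - ref) / abs(ref)
    return diffs


diffs = relative_differences(standard, cached)
for (x, name), rel in sorted(diffs.items()):
    print(f"x={x:<4} {name:<12} relative difference {rel:.2e}")

# Cell with the largest relative difference.
worst = max(diffs, key=diffs.get)
print("largest difference at", worst)
```

In the output above, the standard and cached tables differ only in three cells at large x, with relative differences from about 8e-5 up to about 2.4e-3. All rows with x up to 0.5 agree to the printed precision.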
scarrazza commented 6 years ago

Ok, let's do more tests. Could you please apply the changes below, then recompile and rerun nnfit (with my previous card)? In principle this should work, because it is almost identical to Tabulation:

diff --git a/nnpdfcpp/src/nnfit/src/apfelevol.cc b/nnpdfcpp/src/nnfit/src/apfelevol.cc
index e1a2cf5..bd617ee 100644
--- a/nnpdfcpp/src/nnfit/src/apfelevol.cc
+++ b/nnpdfcpp/src/nnfit/src/apfelevol.cc
@@ -38,6 +38,7 @@ APFELSingleton::APFELSingleton():

 void APFELSingleton::Initialize(NNPDFSettings const& set, PDFSet *const& pdf)
 {
+  /*
   // Check APFEL  
   bool check = APFEL::CheckAPFEL();
   if (check == false)
@@ -45,6 +46,7 @@ void APFELSingleton::Initialize(NNPDFSettings const& set, PDFSet *const& pdf)
       std::cout << Colour::FG_RED << "[CheckAPFEL] ERROR, test not succeeded!" << std::endl;
       std::exit(-1);
     }    
+  */

   // initialize attributes
   getInstance()->fPDF = pdf;
@@ -273,10 +275,11 @@ void APFELSingleton::Initialize(NNPDFSettings const& set, PDFSet *const& pdf)
   APFEL::SetQLimits(getInstance()->fQ0, getInstance()->fQmax + 1E-5); // Epsilon for limits

   APFEL::SetNumberOfGrids(1);
-  APFEL::SetExternalGrid(1, 195, 5, X1);
+  //APFEL::SetExternalGrid(1, 195, 5, X1);
+  APFEL::SetGridParameters(1, 50, 5, 1e-10);
   APFEL::LockGrids(true);
   APFEL::SetPDFSet("external");
-  APFEL::SetFastEvolution(false);
+  //APFEL::SetFastEvolution(false);
   APFEL::InitializeAPFEL();
wilsonmr commented 6 years ago

That seems to have outputted all of the expected files, would you like me to send you anything?

scarrazza commented 6 years ago

Good, please send the folder by mail. Could you please rerun with APFEL::SetFastEvolution(false); uncommented and, if it works, uncomment the CheckAPFEL block too? I would like to isolate the problem; I am pretty sure it is correlated with the external grid.

wilsonmr commented 6 years ago

ok sure

wilsonmr commented 6 years ago

Uncommenting APFEL::SetFastEvolution(false); appears to have introduced the error

scarrazza commented 6 years ago

Good, could you please revert to the original version and just comment the fast evolution?
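The isolation strategy being followed here is to change one setting at a time between the patched build that worked and the original code. A minimal sketch of that bookkeeping follows. The three toggle names are hypothetical labels for the changes in the apfelevol.cc diff above, and the code only enumerates candidate configurations; it does not run nnfit.

```python
# True means "as in the original apfelevol.cc", False means "as patched".
PATCHED = {
    "check_apfel_enabled": False,       # CheckAPFEL block commented out
    "external_grid": False,             # SetExternalGrid replaced by SetGridParameters
    "set_fast_evolution_false": False,  # SetFastEvolution(false) commented out
}
ORIGINAL = {name: True for name in PATCHED}


def flip_one(baseline):
    """Yield (name, config) pairs that differ from baseline in exactly one toggle."""
    for name in baseline:
        config = dict(baseline)
        config[name] = not baseline[name]
        yield name, config


# Runs that restore one original setting on top of the working patched build.
for name, config in flip_one(PATCHED):
    print("restore only", name, "->", config)

# Runs that keep a single patch on top of the original code, such as the
# request above to revert everything and only comment out the fast evolution call.
for name, config in flip_one(ORIGINAL):
    print("patch only  ", name, "->", config)
```

With three toggles this gives at most six single-change runs in addition to the two baselines, which is usually enough to tell whether one setting alone triggers the failure or whether a combination is needed.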

wilsonmr commented 6 years ago

This is really weird I'm getting sporadic seg faults

wilsonmr commented 6 years ago

ok I think it was just the cluster not behaving correctly/me requesting too much memory per core. They appear to be running now

scarrazza commented 6 years ago

Ok, when they are done please let me know if the fast evolution is the origin of the problem.