NNPDF / nnpdf

An open-source machine learning framework for global analyses of parton distributions.
https://docs.nnpdf.science/
GNU General Public License v3.0

NLO global fits with scale variations #391

Closed. enocera closed this issue 5 years ago.

enocera commented 5 years ago

As agreed in Amsterdam, we need to run NLO global fits with scale variations (nine fits in total). These fits are currently listed in the wiki. The complete set of runcards is also attached: NLO_global_scalevar.zip

@lucarottoli Could you please check that the data set and the cuts are those agreed in Amsterdam? @Zaharid @scarrazza Would it be cleaner to (a) wait for #387 and #244 to be merged into master and (b) rewrite the runcard in terms of BIGEXP before running these fits?

lucarottoli commented 5 years ago

@enocera the dataset seems to be fine, and the intersection of the cuts also seems fine; we can test it when #387 is merged. To make it easier to download and compare the theories with vp, it might be convenient to change the runcard names from

190302-ern-nlo-xF2xxxR2yyy_global.yml

to something like

190302-ern-nlo-xF2xxxR2yyy_-zzz-global.yml

where zzz is the corresponding theory id?

enocera commented 5 years ago

@lucarottoli I've updated the names of the runcards according to your suggestion. As a general remark, these names (as well as the previous ones) are variations of the standard format yymmdd-xxx-yyy (which SF seems to like). I'm personally in favour of more explicative names. That said, all these fits can now be run with master. I'm starting to run three of them.

lucarottoli commented 5 years ago

@enocera I am also in favour of these more explicative names. Since Stefano Forte does not handle many fits directly, he probably does not realise how difficult it becomes to manage tens of fits if they are identified only by a date (plus, the same person can run several fits on the same day). I would be happy to launch fits on hydra, but I still haven't heard from @scarrazza, who should first address #371; otherwise hydra is useless.

Zaharid commented 5 years ago

Also, I'd be slightly happier if fit names were valid variable identifiers in common programming languages, as well as valid unquoted filenames and URLs.


enocera commented 5 years ago

@Zaharid Sorry but I don't understand: what is making you unhappy? The length of the runcard name? The hyphen? Or what?

lucarottoli commented 5 years ago

@enocera I think he's happy with your choice of names and is just reminding us that special characters must be chosen carefully (dots and slashes in the name break things).

Zaharid commented 5 years ago

I guess I am saying that I'd be happier with underscores than I am with dashes.

enocera commented 5 years ago

@Zaharid @lucarottoli Thanks for the clarification. I've replaced dashes with underscores.
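The naming criteria discussed above (only letters, digits and underscores; no dots, slashes or dashes) can be checked mechanically. Below is a minimal Python sketch; the helper names are hypothetical and not part of nnpdf or validphys. It keeps the filename/URL check separate from the stricter identifier check, because most languages also forbid an identifier that starts with a digit, which a date-prefixed name like yymmdd_... does.

```python
import re

# Hypothetical checks, not part of nnpdf/validphys.
# Letters, digits and underscores only: safe unquoted in a shell, needs no
# percent-encoding in a URL path segment, and has no dots or slashes.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")

# Stricter: also a valid identifier in C-like languages and Python
# (no leading digit). Reserved keywords are not checked here.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_fit_name(name: str) -> bool:
    """True if `name` can be used as an unquoted filename and URL segment."""
    return SAFE_NAME.fullmatch(name) is not None


def is_identifier_fit_name(name: str) -> bool:
    """True if `name` is also a valid variable identifier."""
    return IDENTIFIER.fullmatch(name) is not None


def underscore_name(name: str) -> str:
    """Replace dashes with underscores, as agreed in the thread."""
    return name.replace("-", "_")
```

For example, a hypothetical name such as "190302-ern-nlo-global" fails `is_safe_fit_name` because of the dashes; after `underscore_name` it passes, but it still fails `is_identifier_fit_name` because it starts with a digit.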

lucarottoli commented 5 years ago

@enocera @Zaharid I have tried using conda on the Oxford cluster to see if this solves the problem in #371, so that I can help with running the fits. The installation was smooth, but I get errors like these when submitting the jobs (if I run nnfit locally it seems to work fine):

Job output begins
-----------------

/var/spool/slurmd//job1359520/slurm_script: line 14: /usr/local/shared/slurm/bin/srun: Argument list too long
---------------
Job output ends
/var/spool/slurmd//job1359520/slurm_script: line 27: /bin/date: Argument list too long
=========================================================
/var/spool/slurmd//job1359520/slurm_script: line 34: /bin/date: Argument list too long
PBS job: finished date =
Total run time : 0 Hours 0 Minutes 0 Seconds
=========================================================

@Zaharid do you know if this might be a conda-related issue? Submission worked fine with a non-conda installation, but I have also contacted the cluster maintainer to check whether something has changed; perhaps the fact that I am now using conda is only a red herring.
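"Argument list too long" is the message for E2BIG, which `execve` returns when the arguments plus the environment handed to a new program exceed the kernel limits. Since even `/bin/date`, run with no arguments, fails in the log above, an oversized environment is one plausible cause. This is an assumption, not something diagnosed in this thread; note that `sbatch` by default exports the submitting shell's environment to the job. A minimal Python sketch of a diagnostic, run from the same shell used for submission, which reports the environment size and the largest variables:

```python
import os

# On Linux, any single argument or environment string longer than
# MAX_ARG_STRLEN (32 pages) makes execve fail with E2BIG, whatever the total.
MAX_ARG_STRLEN = 32 * os.sysconf("SC_PAGE_SIZE")


def environment_report(env=None, top=5):
    """Return (total_bytes, largest) for an environment mapping.

    Each entry is counted as execve sees it: NAME=value plus a trailing NUL.
    `largest` holds the `top` biggest entries as (name, size) pairs.
    """
    env = os.environ if env is None else env
    sizes = {
        k: len(os.fsencode(k)) + len(os.fsencode(v)) + 2  # '=' and NUL
        for k, v in env.items()
    }
    total = sum(sizes.values())
    largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return total, largest


if __name__ == "__main__":
    total, largest = environment_report()
    print(f"environment size: {total} bytes")
    for name, size in largest:
        flag = "  <-- longer than MAX_ARG_STRLEN" if size > MAX_ARG_STRLEN else ""
        print(f"{name}: {size} bytes{flag}")
    # Combined limit for arguments + environment.
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
```

A variable such as `PATH` that grows every time a setup script is sourced would show up at the top of this report.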

enocera commented 5 years ago

@lucarottoli I've completed a conda installation on the Oxford cluster. I've also activated a conda environment with the latest version of the master code (my understanding is that you need this if you want to use the cut-intersection feature). I've added a line to my job script that sources the conda environment. Everything runs smoothly: replicas run and complete without issues (I've just checked this by launching a trial fit with one small experiment). I'm confident that if you follow the documentation step by step you'll succeed.

lucarottoli commented 5 years ago

@enocera do you have 5 minutes on Skype now? It might be easier.

enocera commented 5 years ago

@lucarottoli Not now, please. Can we chat later (9pm CET)?

lucarottoli commented 5 years ago

@enocera glad that conda works fine. I probably just need to adapt my scripts, but if you explain to me what you did it should be straightforward. Thanks; sure, 9pm CET (12pm here) works for me.

Zaharid commented 5 years ago

@lucarottoli I am afraid that I lack context, particularly about what is inside the script that is failing. It looks like a trivial syntax error (e.g. something that needs to be quoted somewhere). In any case, this seems unrelated to conda (though it may be related to environment variables that need to be set somewhere).

Indeed, @enocera is probably in a better position to help. It would be great if we had some up-to-date example submission scripts somewhere.

Like @enocera, I would recommend having a production conda environment that tracks the latest version of the master package (i.e. conda create -n nndeploy nnpdf) and using it for the fits (activating it in the relevant places with conda activate nndeploy). This will minimize (and hopefully remove) conflicts with anything else you are doing.
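As a starting point for the example submission scripts mentioned above, here is a minimal Python sketch that writes a SLURM array-job script activating the `nndeploy` environment. Only the environment name comes from this thread; the helper name, the conda install location (`~/miniconda3`), the job options and the nnfit argument order (replica number, then runcard) are assumptions to be checked against the local setup and the documentation. Sourcing `conda.sh` is what makes `conda activate` usable in a non-interactive batch shell.

```python
import shlex
from pathlib import Path


def build_job_script(runcard: str, n_replicas: int, env: str = "nndeploy",
                     conda_root: str = "~/miniconda3") -> str:
    """Return a SLURM array-job script that fits one replica per array task.

    Hypothetical helper: `conda_root`, the job name and the nnfit argument
    order are assumptions, not taken from the nnpdf documentation.
    """
    conda_sh = Path(conda_root).expanduser() / "etc" / "profile.d" / "conda.sh"
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=nnfit",
        f"#SBATCH --array=1-{n_replicas}",  # one array task per replica
        # Sourcing conda.sh makes `conda activate` work in a batch shell.
        f"source {shlex.quote(str(conda_sh))}",
        f"conda activate {shlex.quote(env)}",
        # Assumed call: nnfit <replica number> <runcard>.
        f"nnfit $SLURM_ARRAY_TASK_ID {shlex.quote(runcard)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect the output to a file and submit it with `sbatch`.
    print(build_job_script("my_runcard.yml", n_replicas=100), end="")
```

Using a job array keeps all replicas of one fit under a single submission, with `$SLURM_ARRAY_TASK_ID` supplying the replica number.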

lucarottoli commented 5 years ago

@Zaharid it's probably better if I talk to @enocera, because even after adding source-conda I get a segfault after APFEL initialization in nnfit, even when I run locally. Since he is able to run the fits, he should be able to tell me what I did wrong.

enocera commented 5 years ago

@lucarottoli I've added a new table to the wiki (https://www.wiki.ed.ac.uk/display/nnpdfwiki/NNPDF3.1+fits+with+scale+variations) with the runcards for the iteration of the scale variation fits. Could you please run the fits assigned to you? Thanks.

lucarottoli commented 5 years ago

@enocera will do, thanks