N3PDF / pycompressor

Compression code for PDF replicas.
https://n3pdf.github.io/pycompressor/
GNU General Public License v3.0

To DO #2

Closed Radonirinaunimi closed 4 years ago

Radonirinaunimi commented 4 years ago

The following would be more or less the steps (in increasing order) that I will be following in the python implementation of the compressor.

Radonirinaunimi commented 4 years ago

@scarrazza @scarlehoff, apologies for not providing updates over the past few days; I had to finish the corrections to the combined resummation paper.

At this point, the pycompressor code is fully operational (as briefly introduced during the last meeting). This means that one can now choose between GA and CMA as the minimization algorithm. The issue of duplicates has now somehow been solved. However, the CMA is no longer optimal when one wants a compressed set larger than 25% of the size of the prior replicas. For instance, extracting 200 replicas from a prior of 1000 takes far too much time. (At this point, I'm clueless about how to overcome this.)

Now, I have questions concerning the next steps:

  1. At which point should the Validphys Action be implemented? Before or after incorporating the GANs within the compressor?
  2. What would be the optimal way to include the GANs? Does it have to be implemented within the pycompressor itself? At some point, the idea was to include an --enhance flag in the pycompressor which will turn on the GANs.

Anyway, at the moment, I'm doing a deep cleaning of the GANs code and extending it to multiple flavours.

scarlehoff commented 4 years ago

> At which point should the Validphys Action be implemented? Before or after incorporating the GANs within the compressor?

It is up to you, in the sense that this is an implementation detail that should not change the logic. I would do it before the GANs because then you can use validphys to get replicas from the NNPDF fits and apply the GANs to them (instead of applying them to LHAPDF sets).

> What would be the optimal way to include the GANs? Does it have to be implemented within the pycompressor itself? At some point, the idea was to include an --enhance flag in the pycompressor which will turn on the GANs.

As a first approximation (I don't know if @scarrazza has any idea in mind) I'd say this is a good idea: an --enhance flag that turns on the GANs, creates extra replicas and then produces the compressed set. Once that is working, we can decide whether the --enhance flag should instead create replicas dynamically depending on the goodness of the fit or whatever. But to test that it is working, having an --enhance -> use GAN -> compress with these extra replicas workflow is ok.
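A minimal sketch of that first-approximation workflow, assuming a command-line entry point built with argparse. Only the `--enhance` flag name comes from the discussion; `generate_extra_replicas`, `compress`, the `--ncompressed` option and the stand-in replica lists are hypothetical placeholders, not the actual pycompressor API.

```python
import argparse


def generate_extra_replicas(replicas, n_extra):
    """Hypothetical stand-in for the GAN step: returns n_extra synthetic replicas."""
    return [f"synthetic_{i}" for i in range(n_extra)]


def compress(replicas, n_compressed):
    """Hypothetical stand-in for the compressor: keeps the first n_compressed replicas."""
    return replicas[:n_compressed]


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of an --enhance workflow")
    parser.add_argument("--ncompressed", type=int, default=3, help="Size of the compressed set")
    parser.add_argument("--enhance", action="store_true", help="Generate extra replicas before compressing")
    return parser


def run(argv=None):
    args = build_parser().parse_args(argv)
    replicas = [f"prior_{i}" for i in range(5)]  # stand-in for the prior set
    if args.enhance:
        # --enhance -> use GAN -> compress with these extra replicas
        replicas = replicas + generate_extra_replicas(replicas, n_extra=5)
    return replicas, compress(replicas, args.ncompressed)
```

Calling `run(["--enhance"])` enlarges the pool before compressing, while `run([])` compresses the prior alone, so the two paths can be compared directly while testing.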

btw, let me know if you want any help with the deployment of the docs, sometimes it is a bit tricky to get them to look fine. (also, take into account that there are no private docs in github, so if you deploy to github pages they will be public, not that it is a problem but just so you know!)

Radonirinaunimi commented 4 years ago

> It is up to you, in the sense that this is an implementation detail that should not change the logic. I would do it before the GANs because then you can use validphys to get replicas from the NNPDF fits and apply the GANs to them (instead of applying them to LHAPDF sets).

Could you point me to material that explains how a validphys action should be implemented? I have checked the documentation on the wiki and it only describes how to use validphys.

> btw, let me know if you want any help with the deployment of the docs, sometimes it is a bit tricky to get them to look fine. (also, take into account that there are no private docs in github, so if you deploy to github pages they will be public, not that it is a problem but just so you know!)

I wasn't really concerned about this but I'm glad you asked :sweat_smile:. You can probably spot the flaw in the deployment I did here: deploy_docs. The documentation webpage https://n3pdf.github.io/pycompressor/ is not rendered properly, although it renders fine on my machine. I do not know whether this is just because the repo is private.

scarlehoff commented 4 years ago

> Could you point me to material that explains how a validphys action should be implemented?

I don't think it exists

Rather than an action, what you would be creating is an App. I recommend you look at the one @scarrazza did for n3fit a long time ago; it is relatively clean: https://github.com/N3PDF/n3fit-experimentation/blob/app/pyfit/n3fit.py

Basically the idea is to create a pycompressor_app.py (or whatever) with a provider (N3FIT_PROVIDERS = ['fit']) referring to the compressor.

Then, in the file where the compressor provider is, you can do something akin to what we do in fit: https://github.com/N3PDF/n3fit-experimentation/blob/81b833ab0c531573ea454482ec13816d5a9c1082/pyfit/fit.py#L43 where the optimizer, experiments and t0set sets come from the runcard.

What you would need to do is to have a compressor provider which for instance just says:

def compressor_provider(fit):
    ...  # use the populated fit object here

Then your runcard can simply be

fit: NNPDF32_jcm_020428

So that when you run your app with the given runcard (pycompressor_app.py given_runcard.yml) validphys will automagically populate your fit argument with the fit object from validphys (where you will have all the replicas, experiments it was fitted with and so on).

This is the idea. Only god knows what you will find when you try to do it (the dependence graph of validphys is pretty much non trivial so you might find out that to define a fit you need to specify in the runcard the temperature of the water in the Mediterranean Sea). Let me know if you find any obstacles.

Radonirinaunimi commented 4 years ago

@scarlehoff Thanks a lot for this information :+1:

Radonirinaunimi commented 4 years ago

I have tried to read through the code mentioned above but it is hard for me to grasp the general picture...

> Then, in the file where the compressor provider is, you can do something akin to what we do in fit: https://github.com/N3PDF/n3fit-experimentation/blob/81b833ab0c531573ea454482ec13816d5a9c1082/pyfit/fit.py#L43 where the optimizer, experiments and t0set sets come from the runcard.

This part, I kind of understand. Basically, I will just have to write Python code that imports the modules from pycompressor, in which I have a routine compress that takes as arguments (for instance) the name of the replicas, the size of the compressed set, and the minimizer, which will be fetched from a runcard.yml.
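As a rough sketch of that idea, the runcard content can be validated and turned into the arguments of the compress routine. The runcard keys `pdf`, `compressed` and `minimizer` and the `CompressorSettings` container are assumptions made for illustration; the dictionary stands in for the parsed content of a runcard.yml.

```python
from dataclasses import dataclass

ALLOWED_MINIMIZERS = ("ga", "cma")  # the two algorithms mentioned earlier in the thread


@dataclass
class CompressorSettings:
    pdf_name: str
    n_compressed: int
    minimizer: str


def settings_from_runcard(runcard):
    """Validate a runcard mapping (e.g. the result of parsing runcard.yml)."""
    minimizer = str(runcard["minimizer"]).lower()
    if minimizer not in ALLOWED_MINIMIZERS:
        raise ValueError(f"Unknown minimizer '{minimizer}', expected one of {ALLOWED_MINIMIZERS}")
    n_compressed = int(runcard["compressed"])
    if n_compressed <= 0:
        raise ValueError("The compressed set must contain at least one replica")
    return CompressorSettings(str(runcard["pdf"]), n_compressed, minimizer)


# Stand-in for the parsed content of a runcard.yml (hypothetical keys)
example_runcard = {"pdf": "PN3_DIS_130519", "compressed": 100, "minimizer": "CMA"}
settings = settings_from_runcard(example_runcard)
print(settings)
```

Validating the minimizer name up front means a typo in the runcard fails immediately rather than after the replicas have been loaded.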

But this part,

> Basically the idea is to create a pycompressor_app.py (or whatever) with a provider (N3FIT_PROVIDERS = ['fit']) referring to the compressor.

and this sentence,

> So that when you run your app with the given runcard (pycompressor_app.py given_runcard.yml) validphys will automagically populate your fit argument with the fit object from validphys (where you will have all the replicas, experiments it was fitted with and so on).

I don't fully understand what they do. At the end of the day, all I will have to care about is the replicas from the fit (without worrying about experiments, etc.), right?

scarlehoff commented 4 years ago

> I don't fully understand what they do. At the end of the day, all I will have to care about is the replicas from the fit (without worrying about experiments, etc.), right?

Yes. What I meant is that validphys would populate the fit thing with all the information it has about the fit. You then can select only the pieces you need.
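To make the "populate the fit argument" idea concrete, here is a toy illustration of filling a function's arguments by name from a runcard. This is not how validphys or reportengine is implemented; `resolve_and_call`, `ToyFit` and the fake path are made up for the example, and the real fit object carries much more information.

```python
import inspect


def resolve_and_call(provider, runcard, loaders):
    """Call provider, building each argument from the runcard entry of the same name."""
    kwargs = {}
    for name in inspect.signature(provider).parameters:
        if name not in runcard:
            raise KeyError(f"The runcard does not define '{name}'")
        # The loader turns the plain runcard value into a richer object
        kwargs[name] = loaders[name](runcard[name])
    return provider(**kwargs)


class ToyFit:
    """Toy stand-in for a fit object; only the name and a fake path are kept."""

    def __init__(self, name):
        self.name = name
        self.path = f"/toy/results/{name}"


def compressor(fit):
    # Select only the piece that is needed from the populated object
    return fit.path


result = resolve_and_call(compressor, {"fit": "PN3_DIS_130519"}, {"fit": ToyFit})
print(result)
```

The point of the toy is only that the runcard key `fit` and the argument name `fit` are matched by name, so the provider receives an object rather than the raw string.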

Here is an MWE for a validphys app that prints out the path of the fit. You can already use this as a starting point: you can specify any fit and validphys will first check whether it is on your computer; if not, it will download it from the vp server and then tell you where it is (and the replicas will be in the same folder).

I guess there is also a key somewhere to get directly a reference to the replicas but I don't know it.

app.py

import pathlib
import warnings
import argparse

from validphys.app import App
from validphys.config import Environment, Config
from validphys.config import ConfigError
from reportengine.compat import yaml

myapp_FIXED_CONFIG = dict(actions_=["compressor"])
myapp_PROVIDERS = ["compressor"]
RUNCARD_COPY_FILENAME = "inputrc.yml"
INPUT_FOLDER = "input"

class myappEnvironment(Environment):
    """Container for information to be filled at run time"""

    def init_output(self):
        # check file exists, is a file, has extension.
        if not self.config_yml.exists():
            raise Exception("Invalid runcard. File not found.")
        else:
            if not self.config_yml.is_file():
                raise Exception("Invalid runcard. Must be a file.")

        # Create io folder
        self.output_path = pathlib.Path(self.output_path).absolute()
        self.output_path.mkdir(exist_ok=True)

        self.input_folder = self.output_path / INPUT_FOLDER
        self.input_folder.mkdir(exist_ok=True)

class myappConfig(Config):
    """Specialization for yaml parsing"""

    @classmethod
    def from_yaml(cls, o, *args, **kwargs):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore", yaml.error.MantissaNoDotYAML1_1Warning)
                file_content = yaml.safe_load(o, version="1.1")
        except yaml.error.YAMLError as e:
            raise ConfigError(f"Failed to parse yaml file: {e}") from e
        if not isinstance(file_content, dict):
            raise ConfigError(f"Expecting input runcard to be a mapping, not '{type(file_content)}'.")
        file_content.update(myapp_FIXED_CONFIG)
        return cls(file_content, *args, **kwargs)

class myappApp(App):
    """The class which parses and performs the fit"""

    environment_class = myappEnvironment
    config_class = myappConfig

    def __init__(self):
        super(myappApp, self).__init__(name="myapp", providers=myapp_PROVIDERS)

    @property
    def argparser(self):
        parser = super().argparser
        parser.add_argument("-o", "--output", help="Output folder", default=None)
        return parser

    def get_commandline_arguments(self, cmdline=None):
        args = super().get_commandline_arguments(cmdline)
        if args["output"] is None:
            args["output"] = pathlib.Path(args["config_yml"]).stem
        return args

    def run(self):
        self.environment.config_yml = pathlib.Path(self.args["config_yml"]).absolute()
        super().run()

def main():
    a = myappApp()
    a.main()

if __name__ == "__main__":
    main()

compressor.py

def compressor(fit):
    print(fit.path)

runcard.yml

fit: PN3_DIS_130519

Then you can run this with

python app.py runcard.yml
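Once the app prints the fit path, the replica folders could be located with a small helper like the one below. This is a sketch under an assumption about the layout: it supposes that each replica lives in a folder named `replica_N` somewhere below the fit path, which should be checked against an actual fit. The helper name `find_replica_folders` and the throwaway folder tree are invented for the example.

```python
import pathlib
import tempfile


def find_replica_folders(fit_path, pattern="replica_*"):
    """Return the folders below fit_path whose name matches pattern, sorted by replica number."""
    folders = [p for p in pathlib.Path(fit_path).rglob(pattern) if p.is_dir()]
    return sorted(folders, key=lambda p: int(p.name.rsplit("_", 1)[-1]))


# Build a throwaway folder tree to exercise the helper (not a real fit)
with tempfile.TemporaryDirectory() as tmp:
    for index in (1, 2, 10):
        (pathlib.Path(tmp) / "nnfit" / f"replica_{index}").mkdir(parents=True)
    names = [p.name for p in find_replica_folders(tmp)]

print(names)
```

Inside `compressor(fit)` one would then call `find_replica_folders(fit.path)`. Sorting on the numeric suffix keeps `replica_10` after `replica_2`, which a plain string sort would not do.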