HEPData / hepdata_lib

Library for getting your data into HEPData
https://hepdata-lib.readthedocs.io
MIT License

Integration with pyhf JSON workspaces #98

Open kratsg opened 5 years ago

kratsg commented 5 years ago

/cc @matthewfeickert @lukasheinrich -- we should probably file an issue here to investigate the possibility of getting HEPData to handle pyhf JSON specifications (and additionally teaching it to export a given specification to ROOT+XML as well, if needed).

I'm hoping to use this issue as a place to hold the discussion. For reference, we have a JSON schema that fully specifies the workspace, and we will shortly release a pyhf version on PyPI that contains v1.0.0 of this schema.

lukasheinrich commented 5 years ago

Yes, this has been a long-term (read: years) project, and I initially came up with some code that reads in ROOT workspaces and spits out HEPData:

https://github.com/lukasheinrich/hf2hd-demo

but we should absolutely revisit this (though arguably, just uploading the likelihood is sufficient if all the HEPData records can be fully generated from it).

kratsg commented 5 years ago

Just a quick note that pyhf has a very nice feature that produces summaries of the JSON schemas; see diana-hep/pyhf#443 for details. We currently provide a (beta) `pyhf inspect` command-line tool that pretty-prints a summary of the JSON specification in a human-readable format. It could (and probably should) emit a JSON version of the summary as well, to be consumed in an automated fashion. Is this something HEPData would be interested in using?
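To make the idea concrete, here is a hypothetical sketch (not pyhf's actual `inspect` output) of what a machine-readable summary could look like, assuming the pyhf workspace layout of channels holding samples that carry modifiers:

```python
import json

def summarize_workspace(spec):
    """Hypothetical sketch of a machine-readable workspace summary:
    channel names, sample names, and (modifier name, type) pairs.
    Assumes the pyhf workspace layout (a top-level list of channels,
    each holding samples that carry modifiers)."""
    channels = spec.get("channels", [])
    return {
        "channels": [c["name"] for c in channels],
        "samples": sorted({s["name"] for c in channels for s in c["samples"]}),
        "modifiers": sorted({(m["name"], m["type"])
                             for c in channels
                             for s in c["samples"]
                             for m in s.get("modifiers", [])}),
    }

# json.dumps gives exactly the kind of output an automated consumer could use
# (tuples serialize as JSON arrays):
spec = {"channels": [{"name": "SR",
                      "samples": [{"name": "signal", "data": [5.0],
                                   "modifiers": [{"name": "mu",
                                                  "type": "normfactor"}]}]}]}
print(json.dumps(summarize_workspace(spec)))
```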

clelange commented 5 years ago

Hi @kratsg - note that this tool is mainly meant for converting input into a format that can be ingested by HEPData. Once that is the case for pyhf workspaces (my understanding is that it currently is not), it'd be great if you added this to hepdata_lib. For discussion on what can be added to HEPData itself and how, you probably have to contact the HEPData developers/maintainers directly (preferably by email, I guess).

kratsg commented 5 years ago

Hi @clelange, I did not realize the two were somewhat separate. Should hepdata_lib effectively support something like `yaml.dump(json.load(open('workspace.json')))`? Really, that's most of the work, since the entire specification lives in a single JSON document.

So HEPData needs to support this first, before hepdata_lib can write a converter for it?
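The one-liner above can be spelled out as a minimal sketch of the proposed conversion. This assumes PyYAML is available and uses `workspace.json` only as a hypothetical file name; it is not the eventual hepdata_lib API:

```python
import json

# PyYAML (third-party) provides the YAML serialization.
import yaml

def workspace_to_hepdata_yaml(json_path):
    """Minimal sketch: read a pyhf JSON workspace and re-serialize it
    as YAML, the format used inside a HEPData submission archive."""
    with open(json_path) as f:
        spec = json.load(f)
    return yaml.safe_dump(spec, default_flow_style=False)
```

Since JSON is (nearly) a subset of YAML, the conversion itself is trivial; the real work would be mapping the workspace contents onto HEPData table structures.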

clelange commented 5 years ago

I think I wasn't reading carefully, sorry. If you contribute code similar to https://github.com/lukasheinrich/hf2hd-demo that converts the workspace.json into the YAML format that is understood by HEPData as part of the submission.tar.gz (which is effectively what hepdata_lib does for other formats such as ROOT histograms already), this is perfectly fine. Do I understand correctly that this is your plan? I'm not sure I understand why additional exports to root+xml are needed.

kratsg commented 5 years ago

> I'm not sure I understand why additional exports to root+xml are needed.

This is usually because ROOT+XML is what people already use (the HistFactory workspace), and I think in some cases these have already been uploaded for an analysis or two in the past (but I'm really not sure here). The fact that this functionality is possible means it could be useful to have the likelihood exported into different formats depending on what you want, but I don't know whether this is something HEPData wants to do or not.

> I think I wasn't reading carefully, sorry. If you contribute code similar to https://github.com/lukasheinrich/hf2hd-demo that converts the workspace.json into the YAML format that is understood by HEPData as part of the submission.tar.gz (which is effectively what hepdata_lib does for other formats such as ROOT histograms already), this is perfectly fine. Do I understand correctly that this is your plan?

yeah, that should be ~what we want :)

lukasheinrich commented 5 years ago

Just note that a conversion into HEPData YAML will always be lossy: the full likelihood will probably require uploading the full spec to HEPData (either as auxiliary material or as a native integration, as @GraemeWatt suggested). But a lossy projection can still be useful: the generated HEPData tables can be, e.g., the equivalent of the pre/post-fit plots we usually produce.
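As one concrete example of such a lossy projection, the observed counts per channel could be pulled out of the workspace for a HEPData table (a sketch assuming the pyhf workspace layout, where top-level `observations` carry the per-channel data counts):

```python
def observed_yields(spec):
    """Sketch: extract per-channel observed counts from a pyhf-style
    workspace spec -- the kind of lossy projection that could populate
    a HEPData table (e.g. the data points of a pre-fit plot)."""
    return {obs["name"]: obs["data"] for obs in spec.get("observations", [])}
```

Everything else in the workspace (modifiers, constraint terms) is dropped by this projection, which is why the full spec would still need to be uploaded alongside it.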