Hi Graeme!
I'm just in the process of preparing submissions for the reference data files in Rivet that don't have a HepData entry yet. I'm currently struggling to use `hepdata_lib` for cases with inhomogeneous error breakdowns across bins. For instance, I have a distribution with three bins where the first two bins have two error components 'A' and 'B' (but not 'C') and the third bin has error component 'C' (but not 'A' and 'B').

I know this is supported in principle, e.g. by just omitting the respective components in the dictionary. However, when using the library, it seems `hepdata_lib/helpers.py` raises a `ValueError`:

`ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.`
Is there a trick?
PS - just to be clear: of course I can "make it pass" by just setting the uncertainty to zero, but then all bins will have three uncertainty components, some of them zero, which is not the same as the bin not having the component in its breakdown to begin with. I think the problem is that the check for "non-zero uncertainties" only checks if there's at least one non-zero component and then adds all of them, regardless of their value. Can we make this more flexible?
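For illustration, this is roughly what the zero-padding workaround looks like. A minimal sketch only: the table, variable names and numbers are made up.

```python
from hepdata_lib import Submission, Table, Variable, Uncertainty

# Independent variable: three (made-up) bins.
x = Variable("Observable", is_independent=True, is_binned=True, units="GeV")
x.values = [(0, 1), (1, 2), (2, 3)]

# Dependent variable with one value per bin.
y = Variable("Cross section", is_independent=False, is_binned=False, units="pb")
y.values = [10.0, 8.0, 5.0]

# Workaround: every bin carries all three components, padding the missing ones with zero.
for label, values in [("A", [0.5, 0.4, 0.0]),   # third bin has no 'A'
                      ("B", [0.2, 0.3, 0.0]),   # third bin has no 'B'
                      ("C", [0.0, 0.0, 1.0])]:  # first two bins have no 'C'
    unc = Uncertainty(label, is_symmetric=True)
    unc.values = values
    y.add_uncertainty(unc)

table = Table("Example table")
table.add_variable(x)
table.add_variable(y)

submission = Submission()
submission.add_table(table)
submission.create_files("output")
```

Ideally the library would instead drop a component from a bin's error breakdown entirely when it isn't given for that bin.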
On a different note: we have a few cases with a discrete (string) axis where a subset of the edges is technically a floating-point range. The library throws an error, e.g.:

`error - independent_variable 'value' must not be a string range (use 'low' and 'high' to represent a range): '1.225-1.300' in 'independent_variables[0].values[6].value' (expected: {'type': 'number or string (not a range)'})`
Of course I agree that a discrete axis where all bins are of the form `float - float` should just be a continuous axis, and it's great that the validator enforces this. However, there are also a number of examples on HepData where we have a mix of these kinds of bins with genuine discrete bins, and we might want to allow this kind of axis in general, no?
One simple example I'm just looking at has two bins, `[ "7 - 8", "13" ]`, corresponding to LHC centre-of-mass energies. One could get around the error by splitting this table into separate tables with a continuous `[7.0, 8.0]` bin or a discrete `[ "13" ]` bin, respectively, but then the two measurement points would not end up in the same plot without additional post-processing, which seems a shame. 🤔
On second thought, I suspect this requirement comes from cases where a differential distribution is prepended/appended with a single bin corresponding to the average, which probably shouldn't be allowed. Maybe it's best to leave the validator as it is and I'll work around these cases (there are only 5 of them, so it should be manageable).
This error comes from the `hepdata-validator` package rather than `hepdata_lib`. It was a common encoding mistake for uploaders to specify a bin as a single `value` with the bin limits separated by a hyphen, rather than giving separate `low` and `high` values (HEPData/hepdata-validator#33), so we implemented a check to catch it. I think `hepdata_lib` does not support mixed bins such as `{low: 7, high: 8}` and `value: 13`, although this is allowed in the HEPData YAML format. You could use `{low: 13, high: 13}` (unless a zero-width bin causes problems?) or use a separator other than `-` for the discrete bin `"7 - 8"`, e.g. `"7 to 8"` or `"7 & 8"`.
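For reference, a small sketch (using the centre-of-mass energies from your example) of how such a mixed axis could be written by hand, e.g. with PyYAML, since `hepdata_lib` won't produce it:

```python
import yaml  # PyYAML

# One independent variable mixing a binned entry (low/high) with a discrete entry (value),
# which the HEPData YAML format allows even though hepdata_lib does not generate it.
independent_variable = {
    "header": {"name": "sqrt(s)", "units": "TeV"},
    "values": [
        {"low": 7, "high": 8},  # continuous bin
        {"value": 13},          # discrete point
    ],
}

print(yaml.safe_dump({"independent_variables": [independent_variable]}, sort_keys=False))
```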
Well, there were only 5 cases where I encountered this issue, so I've just replaced the dash with a "to" or "&", depending on the context. It's sufficiently rare that this is probably good enough for now.
Good news, though: I've now managed to create submission tarballs that make the validator happy for all of the Rivet reference files that don't have a HepData entry yet. There's a total of 780 tarballs. What's the best way to submit them? I hope I don't have to upload them through the browser one by one? 😉
PS - I have a guest account for the IPPP cluster if it would be helpful for me to upload them there somewhere?
Great work! You should log into hepdata.net and click "Request Coordinator Privileges" on your Dashboard, then enter "Rivet" as the Experiment/Group. You can then click the "Submit" button to initiate a submission with an INSPIRE ID and specify an Uploader and Reviewer (maybe just yourself in both roles, unless you want a check from someone else). This will create an empty record that allows you to upload, then the record can be reviewed (there's a shortcut "Approve All Tables") and finalised from your Dashboard.
In terms of automation, we haven't yet encountered a need for bulk uploads like this, so unfortunately there's no easy way to finalise 780 records. The upload stage could be done from the command line (or from Python) using the `hepdata-cli` tool (see Example 9), but it requires an invitation cookie specific to each record. The record creation, reviewing and finalisation can only be done from the web interface. It might be possible to (semi-)automate these steps using something like Selenium, but I think that each record should undergo a basic visual check by a human before it is finalised. I suggest that you perform the create/upload/review/finalise workflow manually for a few records until you see what is involved, then you can decide whether it is worthwhile to look into writing scripts to (semi-)automate the procedure.
I've approved your Coordinator request. I realised that we already have a module for bulk imports that was written to import records from hepdata.net to a developer's local instance. Previously, we had a similar module for bulk migration of records from the old HepData site to the new hepdata.net site. The `importer` module bypasses the web interface of the normal submission system, so it would be a more efficient way of importing a large number of tarballs. If you could copy the tarballs to a web-accessible location and provide a list of INSPIRE IDs in a format similar to https://www.hepdata.net/search/ids?inspire_ids=true, I'll look into making the necessary changes to the `importer` module.
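For the list itself, something along these lines should do, assuming (as I believe is the case) that the search/ids endpoint simply returns a plain JSON array of integer IDs; the file name and ID numbers below are hypothetical:

```python
import json
import urllib.request

# Inspect the reference format (assumed to be a plain JSON array of integer IDs).
url = "https://www.hepdata.net/search/ids?inspire_ids=true"
with urllib.request.urlopen(url) as response:
    example_ids = json.load(response)
print(type(example_ids), example_ids[:5])

# Write your own list of INSPIRE IDs (hypothetical numbers) in the same format.
rivet_inspire_ids = [1234567, 2345678, 3456789]
with open("rivet_inspire_ids.json", "w") as handle:
    json.dump(rivet_inspire_ids, handle)
```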
I've opened a new issue HEPData/hepdata#811, so please continue the discussion there, as it no longer relates to `hepdata_lib`.
Great - thank you!! 🙏
For cases where an analyser already has data in the YODA format for use with Rivet, it would be useful if `hepdata_lib` could read YODA files for conversion to the HEPData YAML format. It would be preferable if YODA were an optional rather than a mandatory dependency. The question of converting YODA to HEPData YAML has been a long-standing issue (HEPData/hepdata-converter#10), but it would be better handled by `hepdata_lib` than by the `hepdata-converter`.

Cc: @20DM
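As a starting point, here's a rough sketch of how the optional dependence could be handled. The `yoda.read` call and the bin accessor names used below (`xMin`, `xMax`, `height`) are assumptions about the YODA Python interface, not a checked implementation:

```python
from hepdata_lib import Table, Variable

def table_from_yoda_histo(yoda_file, ao_path, table_name):
    """Build a hepdata_lib Table from a 1D histogram in a YODA file.

    YODA is imported lazily so that hepdata_lib itself does not require it;
    the bin accessors below are placeholders for whatever the installed
    YODA version actually provides.
    """
    try:
        import yoda  # optional dependency
    except ImportError as err:
        raise RuntimeError("Reading .yoda files requires the 'yoda' package") from err

    histo = yoda.read(yoda_file)[ao_path]

    x = Variable("x", is_independent=True, is_binned=True)
    x.values = [(b.xMin(), b.xMax()) for b in histo.bins()]

    y = Variable("y", is_independent=False, is_binned=False)
    y.values = [b.height() for b in histo.bins()]

    table = Table(table_name)
    table.add_variable(x)
    table.add_variable(y)
    return table
```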