Open GraemeWatt opened 3 months ago
Thanks for this Graeme. I put the list of inspire IDs here and you'll find the 780 tarballs in the same directory. They all have a name of the form `ins123456.tar.gz`.
@20DM : thanks, that's great! I'll look into modifying the `importer` module soon.
I picked a random submission (`ins2705058.tar.gz`) and uploaded it to my Sandbox. A few (optional) comments for your consideration:
1. The submission includes http://rivet.hepforge.org/analyses#BESIII_2023_I2705058 as an additional resource. This is not strictly necessary (see submission docs) since the link will automatically be added after the record is finalised, via the nightly harvesting of the `analyses.json` file. Moreover, the automatic link added will be http://rivet.hepforge.org/analyses/BESIII_2023_I2705058 with a `/` instead of a `#`. So if you want to keep the Rivet analysis in the `submission.yaml` file, better to use a link with a `/` instead of a `#`, or just remove it completely.
2. The `comment` has a weird markup that is not rendered by HEPData. It looks like you are taking this from the journal abstract given by the INSPIRE record (JSON). The INSPIRE JSON also provides the arXiv abstract (second item of `abstracts`) that uses LaTeX markup and can be rendered by HEPData. HEPData uses the arXiv abstract from INSPIRE if possible (code). Since HEPData already stores the paper abstract (although it is only displayed if there is no `comment`), I don't think you need to duplicate it in the `comment`. So I would just use the additional information "NUMERICAL VALUES HAVE BEEN DIGITISED FROM THE PAPER." as the `comment` or omit the `comment` completely if there is nothing to add. (Another possibility is to use the `Description` from the Rivet `.info` file as the `comment`, but in this case it contains `Beam energy must be specified as analysis option "ENERGY" when rivet-merging samples.` which is not relevant to the HEPData record.)
Thanks for the feedback, Graeme!
Re 1: Ah good point, yes it should be `/`. I found the version with the `#` in an existing HepData yaml somewhere but do not recall which one it was now. I'll remove it then altogether, seeing as you run the nightly anyway.
Re 2: Ouff, yeah that doesn't look great. Apologies for that! I didn't realise there were occasionally two abstracts - it looks like most of the Inspire IDs I've got on my list only have one in fact. I've tweaked the logic now to take the arXiv one if it's available and fall back to the Inspire one otherwise. I was already falling back to the description from the Rivet info file in the few cases (~5) where no abstract is available from Inspire. I think we definitely want to add some kind of caveat sentence to highlight that the values are digitised from the paper (or come from Rivet or whatever - happy to tweak the wording!) in order to make it clear that they weren't provided directly by the experiment. However, I wouldn't want that single sentence to suppress the abstract, which I find useful to have personally, so perhaps the duplication of the abstract in the comment is acceptable? In any case, I've replaced the tarballs with new versions using the arXiv abstract where available.
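For illustration, the fallback order described in Re 2 could look roughly like the sketch below. It is not the actual script: `choose_comment` is a hypothetical helper, and it assumes each entry of the INSPIRE `abstracts` list is a dict with `source` and `value` keys, with the arXiv entry marked by `source == "arXiv"`.

```python
CAVEAT = "NUMERICAL VALUES HAVE BEEN DIGITISED FROM THE PAPER."


def choose_comment(abstracts, rivet_description=None, caveat=CAVEAT):
    """Build a comment: arXiv abstract, else any abstract, else Rivet text.

    `abstracts` is assumed to be a list of dicts with "source" and "value"
    keys (an assumption about the INSPIRE JSON layout, not a verified API).
    """
    def first_value(predicate):
        # Return the first non-empty abstract text matching the predicate.
        for entry in abstracts or []:
            if predicate(entry) and entry.get("value"):
                return entry["value"]
        return None

    text = (
        first_value(lambda e: (e.get("source") or "").lower() == "arxiv")
        or first_value(lambda e: True)
        or rivet_description
    )
    # Always keep the caveat so readers know the values were digitised.
    return "{0} {1}".format(text, caveat) if text else caveat
```

With this ordering the caveat sentence is appended rather than replacing the abstract, so the abstract stays visible in the comment.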
Re 3: So I actually started doing this at first, but then quickly realised that it would require rewriting several hundred of the routines: many of them currently "abuse" the x- and y-axis integers in the identifier to group distributions - but not necessarily in the intended way. For instance, there are cases where one would need to turn existing `y` groups into separate `d` instances because the independent axis was actually different between them, and then the `x` groups would need to be turned into `y` groups, since HepData doesn't really have non-unit `x` identifiers. The `if`-`else` branching in my script got a little out of hand very quickly and since I didn't really fancy re-writing and re-validating hundreds of routines, I figured it'd be easier to leave the routines as they are, and to just relabel them on-the-fly using the custom Rivet identifier instead, which comes in very handy here. Hope that's OK?
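To make the `d`/`x`/`y` discussion concrete, here is a small sketch that parses identifiers of the form `d01-x01-y01` and flags those with a non-unit `x` index. It only illustrates the identifier structure and assumes that naming pattern; it is not the relabelling logic of the actual script, and `parse_identifier` and `non_unit_x` are hypothetical helper names.

```python
import re

# Assumed identifier layout: d<table>-x<x index>-y<y index>, e.g. "d01-x01-y01".
IDENTIFIER = re.compile(r"^d(\d+)-x(\d+)-y(\d+)$")


def parse_identifier(name):
    """Split an identifier such as 'd01-x02-y03' into integers (d, x, y)."""
    match = IDENTIFIER.match(name)
    if match is None:
        raise ValueError("unexpected identifier: {0!r}".format(name))
    d, x, y = (int(group) for group in match.groups())
    return d, x, y


def non_unit_x(names):
    """Return the identifiers whose x index differs from 1."""
    return [name for name in names if parse_identifier(name)[1] != 1]
```

A list such as `non_unit_x(["d01-x01-y01", "d01-x02-y01"])` then picks out the entries that would need special handling before upload.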
Hi Graeme, just to ping this - is there anything I can help with?
Thanks for making the changes to the tarballs. I haven't started looking at this yet, since I didn't see that it was particularly urgent, but I'll try to look into it within the next couple of months.
The `importer` module (CLI) was written to import records from hepdata.net to a developer's local instance. It uses a list of INSPIRE IDs given at https://www.hepdata.net/search/ids?inspire_ids=true and it downloads files using a URL pattern `url = "{0}/download/submission/ins{1}/original".format(base_url, inspire_id)` where `base_url = 'https://hepdata.net'`.
The `importer` module should be extended to get the list of INSPIRE IDs and the download files from an alternate location, for example, a simple web directory with the INSPIRE IDs contained in the name of the files. It should also be possible to create records with any user assigned as the Coordinator (rather than just `admin_user_id = 1`). The ability to import only a subset of the complete list of INSPIRE IDs would be useful.
These changes should be carefully tested locally and on the QA system before importing to the production instance. Such an extension would be a quicker way of importing the 780 records obtained from Rivet than using the normal submission web interface.
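As a rough sketch of the proposed extension (not the existing `importer` code), the snippet below derives INSPIRE IDs from file names of the form `ins123456.tar.gz`, optionally restricts them to a subset, and builds download URLs for an alternate web directory. All function names, the `coordinator_id` parameter, and the `https://example.org/tarballs/` location are hypothetical; only the hepdata.net URL pattern is taken from the description above.

```python
import re

# File names in the alternate location are assumed to look like ins123456.tar.gz.
TARBALL = re.compile(r"^ins(\d+)\.tar\.gz$")


def inspire_ids_from_filenames(filenames):
    """Extract sorted, de-duplicated INSPIRE IDs from tarball file names."""
    ids = set()
    for name in filenames:
        match = TARBALL.match(name)
        if match:  # silently skip anything that is not a tarball
            ids.add(int(match.group(1)))
    return sorted(ids)


def download_url(inspire_id, base_url="https://hepdata.net", alternate=False):
    """Build a download URL for hepdata.net or for a plain web directory."""
    if alternate:
        return "{0}/ins{1}.tar.gz".format(base_url.rstrip("/"), inspire_id)
    return "{0}/download/submission/ins{1}/original".format(base_url, inspire_id)


def plan_import(filenames, base_url, only=None, coordinator_id=1):
    """List what would be imported, optionally limited to a subset of IDs."""
    ids = inspire_ids_from_filenames(filenames)
    if only is not None:
        wanted = set(only)
        ids = [inspire_id for inspire_id in ids if inspire_id in wanted]
    return [
        {
            "inspire_id": inspire_id,
            "url": download_url(inspire_id, base_url, alternate=True),
            "coordinator_id": coordinator_id,
        }
        for inspire_id in ids
    ]
```

Keeping the planning step separate from the download itself would make it easy to dry-run a subset locally or on the QA system before touching the production instance.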
See also discussion with @20DM in HEPData/hepdata_lib#229.
A list of the Rivet analyses can be seen at https://gitlab.com/hepcedar/rivet/-/issues/485 .