Open t-wissmann opened 3 years ago
Not at the moment, but we have an internal API that is currently used for statistics; the idea is to open it at some point in the future and expand its usage. Do you need to list/access only published papers, or also papers that are not yet published?
For many tasks, having access to published papers (and their volumes) would already help a lot. For some other tasks, we also need access to not-yet published papers (namely for tools during the publication process). But it would already help to have API access to data that is already publicly available anyway (via html).
I think the API we need at LMCS can easily be accomplished. For example, the following API functions would help us a lot. Note that all the examples below are read-only, because they only provide the data that is currently present in HTML pages.
Querying the list of all volumes, i.e. the information on the volumes page, in JSON format. This means that querying
https://lmcs.episciences.org/rest/list-volumes?regular=true&empty=false
returns a JSON list of all regular non-empty volumes, together with their volume IDs.
Similarly,
https://lmcs.episciences.org/rest/list-papers?status=16&volume=591
should return a list of paper ids that were published (status=16) in the specified volume 591. If the user is logged in and has the right permissions, it should also be possible to query the list of papers with other status IDs, e.g. status=4 (accepted), which would be the information already present on the manage articles page.
Paper metadata:
https://lmcs.episciences.org/rest/papers-info?id=PAPER_ID
should print the metadata (title, arXiv URL, DOI, authors, volume ID, secondary volume IDs, submission date, publication date, ...) of the given paper that is listed in the HTML of the paper page https://lmcs.episciences.org/PAPER_ID. Of course, this request should only succeed if the paper is published.
Paper administration data:
https://lmcs.episciences.org/rest/administratepaper?id=PAPER_ID
This should be the same as the 'paper metadata' query above, but it should also work for unpublished papers, provided that the logged-in user has the required administration rights, just as is already the case with the administratepaper pages.
What do you think about these kinds of queries? If you prefer, I can create separate issues for these four (and possibly further) examples :-)
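To make the four proposed queries concrete, here is a minimal Python sketch that only builds their URLs. The `/rest/...` endpoints are the ones suggested in this comment, not an existing API, and the helper name `rest_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://lmcs.episciences.org"


def rest_url(endpoint: str, **params) -> str:
    """Build the URL of one of the *proposed* read-only queries.

    Assumption: these /rest/ endpoints are a suggestion from this thread;
    nothing here confirms that the platform implements them.
    """
    return f"{BASE}/rest/{endpoint}?{urlencode(params)}"


# The four example queries from the proposal.
list_volumes = rest_url("list-volumes", regular="true", empty="false")
list_papers = rest_url("list-papers", status=16, volume=591)
paper_info = rest_url("papers-info", id="PAPER_ID")
paper_admin = rest_url("administratepaper", id="PAPER_ID")
```

`PAPER_ID` is left as the same placeholder used in the examples above; a real script would substitute a numeric paper ID.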
What if we use the same URLs with a different header to trigger a JSON content?
e.g. :
curl -H "Accept: application/json" "https://lmcs.episciences.org/browse/regularissues"
@rtournoy I think this is also fine!
This would be perfect for us!
About [1] "Querying the list of all volumes", can you please try the live examples:
curl -H "Accept: application/json" "https://epijinfo.episciences.org/browse/regularissues"
curl -H "Accept: application/json" "https://epijinfo.episciences.org/browse/volumes"
curl -H "Accept: application/json" "https://epijinfo.episciences.org/browse/section"
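For scripts, the same content negotiation can be done from Python's standard library. This is a minimal sketch that mirrors the curl calls above; the helper names `build_json_request` and `fetch_json` are hypothetical, and the shape of the returned JSON is not assumed.

```python
import json
import urllib.request

# Live example endpoints quoted in the curl commands above.
BROWSE_URLS = [
    "https://epijinfo.episciences.org/browse/regularissues",
    "https://epijinfo.episciences.org/browse/volumes",
    "https://epijinfo.episciences.org/browse/section",
]


def build_json_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for JSON instead of HTML."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(url: str, timeout: float = 30.0):
    """Send the request and decode the body as JSON (performs network I/O)."""
    with urllib.request.urlopen(build_json_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_json(BROWSE_URLS[0])` should then correspond to the first curl command, assuming the server still answers the `Accept: application/json` header as described here.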
Thanks a lot! The output looks great! I'm only wondering whether including the list of all papers for each volume might cause too much load on the server.
About [2] we have added: Volumes (only for published articles):
curl -H "Accept: application/json" "https://epijinfo.episciences.org/volume/view/id/3"
and Sections (only for published articles):
curl -H "Accept: application/json" "https://epijinfo.episciences.org/section/view/id/3"
To get the volume and all articles, with all statuses, you can use:
curl -H "Accept: application/json" "https://rvcode.episciences.org/volume/all/?id=3"
e.g.:
curl -H "Accept: application/json" "https://epijinfo.episciences.org/volume/all/id/3"
Authentication and a matching role are required.
About [3]: We have a new public export format: e.g.: https://epijinfo.episciences.org/54/json
Thanks! This looks very nice! I'm looking forward to it in production :)
Hey! At the end of our discussion on Zoom a few weeks back, I promised I'd get back to you with a couple of points where we still have to manually parse HTML in our automation. It took me a while, but I've finally gotten around to looking at what one of our tools (LMCSBot) does, and here's what I found:
We use administratepaper/list to get a list of articles of certain statuses. This works fairly well, except that we then also have to parse the HTML that is embedded inside this JSON to get the raw information out. Perhaps this endpoint could change to just return pure JSON, or maybe a different one can be made for this purpose?
[...] administratepaper/list, but is found in the <paper_id>/json interface that was added in January. This means that, in some cases, we need to issue two requests per article. Would it be possible to also expose these two fields in the administratepaper/list interface, or its full JSON successor?
Happy to discuss any of these further.
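To illustrate the extra parsing step described above, here is a minimal Python sketch that strips HTML fragments out of JSON string values. The payload shape (`rows`, `id`, `title`) is purely hypothetical and is not the actual administratepaper/list format; only the general technique is shown.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Return the plain text of an HTML fragment with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


def plain_rows(payload: str) -> list:
    """Decode the JSON payload and strip HTML from every string field."""
    rows = json.loads(payload)["rows"]  # "rows" is an assumed key
    return [
        {key: strip_html(val) if isinstance(val, str) else val
         for key, val in row.items()}
        for row in rows
    ]


# Hypothetical sample: a JSON document whose string values contain HTML.
SAMPLE = json.dumps(
    {"rows": [{"id": 1, "title": '<a href="/1">A <b>sample</b> title</a>'}]}
)
```

An endpoint that returned plain JSON, as requested above, would make this step unnecessary.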
Is there some kind of public API that can be used to query the database? I would be interested, for example, in querying the list of volumes (with IDs and names). For lmcs.episciences.org, I wrote a script that checks that the order of the papers in a given volume is correct and that the DOI reported by episciences matches the DOI in the PDF.
So far, this script extracts all the required information from the HTML page reported by the webserver, but of course it would be preferable to directly obtain the list of papers in a given volume or the list of volumes in a more direct format (e.g. xml or json). Is this possible at the moment? Or do other kinds of more direct/REST-like queries to the episciences platform exist?
Edit: it is not necessary that the API is publicly callable; it would already help us if authenticated users could make such queries.