CCSDForge / episciences

An overlay journal platform
https://www.episciences.org/
GNU General Public License v3.0

Public API via XML, JSON, ... #37

Open t-wissmann opened 3 years ago

t-wissmann commented 3 years ago

Is there some kind of public API that can be used to query the database? I would be interested for example to query the list of volumes (with IDs and names). For lmcs.episciences.org, I wrote a script that checks that the order of the papers in a given volume is correct and that the DOI reported by episciences matches the DOI in the PDF.
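The DOI comparison mentioned above can be sketched roughly as follows. This is only an illustration, not the actual LMCS script: `normalize_doi` and `find_doi_mismatches` are hypothetical names, and the input is assumed to be pairs of DOI strings that were already extracted from the platform and from the PDF.

```python
from typing import Dict, List, Tuple

# Resolver prefixes that may precede a bare DOI (assumed input variants).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and a resolver prefix, then lowercase (DOIs are case-insensitive)."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip().lower()


def find_doi_mismatches(papers: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the paper IDs whose reported DOI differs from the DOI found in the PDF.

    `papers` maps a paper ID to (DOI reported by the platform, DOI from the PDF).
    """
    return [
        paper_id
        for paper_id, (reported, in_pdf) in papers.items()
        if normalize_doi(reported) != normalize_doi(in_pdf)
    ]
```

Normalizing before comparing avoids false alarms when one side carries a `https://doi.org/` prefix or differs only in letter case.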

So far, this script extracts all the required information from the HTML page reported by the webserver, but of course it would be preferable to directly obtain the list of papers in a given volume or the list of volumes in a more direct format (e.g. xml or json). Is this possible at the moment? Or do other kinds of more direct/REST-like queries to the episciences platform exist?

Edit: it is not necessary that the api is publicly callable; it would already help us if authenticated users could ask such queries.

rtournoy commented 3 years ago

Not at the moment, but we have an internal API that is currently used for statistics. The idea is to open it at some point in the future and expand its usage. Do you need to list/access only published papers, or also papers that are not yet published?

t-wissmann commented 3 years ago

For many tasks, having access to published papers (and their volumes) would already help a lot. For some other tasks, we also need access to not-yet published papers (namely for tools during the publication process). But it would already help to have API access to data that is already publicly available anyway (via html).

t-wissmann commented 2 years ago

I think the API we need at LMCS can easily be accomplished. For example, the following API functions would help us a lot. Note that all the examples below are read-only, because they only provide data that is currently present in HTML pages.

  1. Querying the list of all volumes, i.e. the information on the volumes page, in json format. This means that when querying

    https://lmcs.episciences.org/rest/list-volumes?regular=true&empty=false

    then it returns a json list of all regular non-empty volumes, together with their volume ids.

  2. Similarly,

    https://lmcs.episciences.org/rest/list-papers?status=16&volume=591

    should return a list of paper ids that were published (status=16) in the specified volume 591. If the user is logged in and has the right permissions, it should also be possible to query the list of papers with other status IDs, e.g. status=4 (accepted), which would be the information already present on the manage articles page.

  3. Paper metadata:

    https://lmcs.episciences.org/rest/papers-info?id=PAPER_ID

    should print the metadata (title, arxiv url, doi, authors, volume id, secondary volume ids, submission date, publication date,...) of the given paper that is listed in the HTML of the paper page https://lmcs.episciences.org/PAPER_ID. Of course, this request should only succeed if the paper is published.

  4. Paper administration data:

    https://lmcs.episciences.org/rest/administratepaper?id=PAPER_ID

    This should be the same as the above 'paper metadata', but it should work for unpublished papers and of course under the assumption that the logged-in user has the required administration rights -- just like it is already the case with the administratepaper pages.
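From a client's point of view, the proposed queries could be built as in the sketch below. The `/rest/...` paths and their parameters are only the ones proposed in this comment and are not existing endpoints; `proposed_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Proposed prefix from this comment; not an existing endpoint.
BASE = "https://lmcs.episciences.org/rest"


def proposed_url(endpoint: str, **params: object) -> str:
    """Build a URL for one of the proposed read-only queries."""
    # Render booleans as lowercase "true"/"false", as in the examples above.
    cleaned = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    query = urlencode(cleaned)
    return f"{BASE}/{endpoint}?{query}" if query else f"{BASE}/{endpoint}"


print(proposed_url("list-volumes", regular=True, empty=False))
print(proposed_url("list-papers", status=16, volume=591))
```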

What do you think about those kinds of queries? If you prefer, I can create separate issues for these four (and possibly further) examples :-)

rtournoy commented 2 years ago

What if we use the same URLs with a different header to trigger a JSON content? e.g. : curl -H "Accept: application/json" "https://lmcs.episciences.org/browse/regularissues"
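For script authors, the same content negotiation could look like this in Python. This is a minimal sketch using only the standard library; `json_request` and `fetch_json` are hypothetical helper names, and whether a given URL answers with JSON depends on the server.

```python
import json
from urllib.request import Request, urlopen


def json_request(url: str) -> Request:
    """Build a GET request that asks the server for JSON instead of HTML."""
    return Request(url, headers={"Accept": "application/json"})


def fetch_json(url: str, timeout: float = 10.0):
    """Send the request and decode the body; raises if the body is not valid JSON."""
    with urlopen(json_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it possible to inspect or test the headers without contacting the server.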

a3nm commented 2 years ago

@rtournoy I think this is also fine!

t-wissmann commented 2 years ago

What if we use the same URLs with a different header to trigger a JSON content? e.g. : curl -H "Accept: application/json" "https://lmcs.episciences.org/browse/regularissues"

This would be perfect for us!

rtournoy commented 2 years ago

About [1] "Querying the list of all volumes", can you please try the live examples:

curl -H "Accept: application/json" "https://epijinfo.episciences.org/browse/regularissues"

curl -H "Accept: application/json" "https://epijinfo.episciences.org/browse/volumes"

curl -H "Accept: application/json" "https://epijinfo.episciences.org/browse/section"

t-wissmann commented 2 years ago

Thanks a lot! the output looks great! I'm only wondering whether the inclusion of the list of all papers for each volume might cause too much load on the server.

rtournoy commented 2 years ago

About [2] we have added: Volumes (only for published articles):

curl -H "Accept: application/json" "https://epijinfo.episciences.org/volume/view/id/3"

and Sections (only for published articles):

curl -H "Accept: application/json" "https://epijinfo.episciences.org/section/view/id/3"

To get the volume and all articles, with all statuses, you can use:

curl -H "Accept: application/json" "https://rvcode.episciences.org/volume/all/?id=3"

e.g.:

curl -H "Accept: application/json" "https://epijinfo.episciences.org/volume/all/id/3"

Authentication and a matching role are required.

rtournoy commented 2 years ago

About [3]: We have a new public export format: e.g.: https://epijinfo.episciences.org/54/json
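A URL of this form can be derived from the journal name and the paper ID. The sketch below assumes that the `/<id>/json` pattern from the example also holds for other papers and journals; `export_url` is a hypothetical helper, and nothing is assumed about the fields of the returned document.

```python
def export_url(journal: str, paper_id: int, fmt: str = "json") -> str:
    """Return the export URL for a paper, e.g. https://epijinfo.episciences.org/54/json."""
    if paper_id <= 0:
        raise ValueError("paper_id must be a positive integer")
    return f"https://{journal}.episciences.org/{paper_id}/{fmt}"
```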

t-wissmann commented 2 years ago

thanks! this looks very nice! I'm looking forward to it in production :)

rtournoy commented 2 years ago

OK, you can try it online with v1.0.23. To be continued...

TobiasKappe commented 1 year ago

Hey! At the end of our discussion on Zoom a few weeks back I promised I'd get back to you with a couple of points where we still have to manually parse HTML in our automation. It took me a while, but I've finally gotten around to looking at what one of our tools (LMCSBot) does, and here's what I found:

Happy to discuss any of these further.