HeuristNetwork / heurist

Core development repository. GitHub: Vsn 6 (2020 - ), Vsn 5 (2018 - 2020), Vsn 4 (2014-2017). Sourceforge: Vsn 3 (2009-2013), Vsn 1 & 2 (2005-2009)
http://HeuristNetwork.org
GNU General Public License v3.0

RESTful API #144

Open michaelgfalk opened 2 years ago

michaelgfalk commented 2 years ago

The big questions

Status Quo

Retrieving data

Currently Heurist relies mostly on URL parameters to retrieve records and other data from Heurist databases. A typical call to the public API looks something like this:

    heurist server     version   data param   database param
           |              |           |              |
|---------------------/-------/--------------&---------------|
https://heuristref.net/heurist/?recID=12735&db=falk_playspace

My preferred solution would be to implement something like the following:

    heurist server     version    database   datatype  id
           |              |          |           |      | 
|---------------------/-------/--------------/------/-----|
https://heuristref.net/heurist/falk_playspace/record/12735
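
To make the proposal concrete, here is a minimal sketch (in Python, purely for illustration) of how a path-style URL could be translated back to the current parameter-style call. Only the `record` to `recID` mapping comes from the two example URLs above; the function name `rest_to_legacy` and the lookup table are hypothetical, and a real implementation would live in the server's own routing layer.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical mapping from the path "datatype" segment to the legacy ID parameter.
# Only "record" -> "recID" is taken from the example URLs in this issue.
LEGACY_ID_PARAM = {"record": "recID"}


def rest_to_legacy(url: str) -> str:
    """Rewrite .../heurist/<database>/<datatype>/<id> as .../heurist/?<param>=<id>&db=<database>."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 4:
        raise ValueError(f"expected /heurist/<database>/<datatype>/<id>, got {parts.path!r}")
    base, database, datatype, item_id = segments
    if datatype not in LEGACY_ID_PARAM:
        raise ValueError(f"unknown datatype {datatype!r}")
    # Dicts keep insertion order, so the ID parameter comes first, then db.
    query = urlencode({LEGACY_ID_PARAM[datatype]: item_id, "db": database})
    return f"{parts.scheme}://{parts.netloc}/{base}/?{query}"


print(rest_to_legacy("https://heuristref.net/heurist/falk_playspace/record/12735"))
```

A translation layer like this would let the new routes coexist with the existing URL parameters while the API is being discussed.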

User interface

On the backend, Heurist is implemented as a single page application and does not provide routing to different menus in the interface.

How important is routing on the backend? To my mind it is unimportant. Users interact with the backend interface as a software application, the way they would use Google Docs or a desktop application. In these contexts, I don't think routing is very important.

CMS websites

On websites in the Heurist CMS, the website and the current page are both indicated by URL parameters. Heurist public sites are also implemented as single page applications, and do not provide full routing—though it is possible to obtain a persistent URL to individual pages on each site.

Here I think that a better form of routing can and should be implemented. Users should be provided a subdomain for each site on each Heurist server. This may not be possible for some servers, where there are institutional restrictions on subdomains, but generally I think it could be implemented by creating a wildcard domain for each Heurist server and then parsing the subdomain elements in a standard form, e.g.

        database name  heurist server
              |              |
|---------------------.--------------/
https://falk_playspace.heuristref.net/

Apache would direct all requests with a subdomain to a PHP script that would parse the subdomain and retrieve the website for e.g. db == falk_playspace.

Ideally this would be combined with standard page routing as in other CMSs:

        database name  heurist server     slug
              |              |              |
|---------------------.--------------/------------/
https://falk_playspace.heuristref.net/project-team/
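
As a rough sketch of the parsing step (in Python for illustration; the proposal above assumes a PHP script behind Apache), the subdomain and slug could be extracted as follows. The names `SiteRoute` and `parse_site_url` are hypothetical. One caveat: under the usual hostname rules, underscores are not permitted in hostnames, so database names such as `falk_playspace` might need to be mapped to a hyphenated form in practice; this sketch does not attempt that.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class SiteRoute(NamedTuple):
    database: str          # taken from the subdomain
    slug: Optional[str]    # page slug, or None for the site's home page


def parse_site_url(url: str, server_host: str) -> Optional[SiteRoute]:
    """Split <database>.<server_host>/<slug>/ into its parts; None if there is no subdomain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    suffix = "." + server_host.lower()
    if not host.endswith(suffix):
        return None  # the bare server name, or an unrelated host
    database = host[: -len(suffix)]
    if not database or "." in database:
        return None  # only a single subdomain level is handled in this sketch
    slug = parts.path.strip("/") or None
    return SiteRoute(database, slug)


print(parse_site_url("https://falk_playspace.heuristref.net/project-team/", "heuristref.net"))
```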

michaelgfalk commented 2 years ago

Heurist already uses some Symfony components (I think), so these pages could be relevant:

wiztigers commented 2 years ago

Hi @michaelgfalk, (and sorry I just saw this discussion)

I totally agree with the need for the Heurist API to be more RESTful. To my knowledge, the two main drawbacks of the current implementation are: 1) too much reliance on URL params, and 2) lack of proper authentication. These prevent effective usage from CLI/scripts, yet this seems essential to me, for my future dev needs as well as for the future of Heurist (i.e. extensions and third-party devs).

I kind of agree with you that routing in the client is not very important right now; but I feel that's more a consequence of the way the Heurist GUI is currently implemented than something that isn't desirable per se. I think that, on the contrary, it is desirable. Not everybody wants to clickety-click each time they open Heurist, but more than that, sharing specific record URIs, showing off custom templates or vocabularies, and in general working as a team (at least remotely) kind of requires this kind of permalink. For example, currently you can share custom searches via URLs. This is a must, very useful, and the kind of feature I'd like to see more of.

And yeah, Heurist website routes should be explicit. URL params are useful, but too much reliance on them should (IMO) be avoided. Semantic URLs are the way to go.

Proposed actions

If you don't mind, I'll share a proposal of what could be the next version of Heurist API. Probably a standalone project, which would just be a Swagger page, so we° can discuss stuff in practice, see the caveats, think about what can be made available and what shouldn't be, and so on. Then we° can say, for example, "we° discuss it until this date, so we° have the majority of questions answered, and then we° begin implementing it".

For various reasons, I should be able to do this next month (before the start of November 2022, I hope; no guarantee).

° by "we" I mean "everyone who wants to participate". As its API is/will be a core component of Heurist, we really should have these discussions in public, with as many brains participating as possible. We're talking about writing a specification, here. So, kudos to you for creating this issue instead of relying on emails. We don't have a structured "dev community" yet but hey, we gotta start somewhere :smile_cat:

wiztigers commented 2 years ago

FWIW, just to be sure we're talking about the same thing, here is the original source of REST, Roy Fielding's thesis, with the 5 mandatory constraints (and a sixth optional one) of every REST interface.

And we already talked about the Richardson Maturity Model, so here is Martin Fowler's breakdown. Let's aim for level 3, baby ! (and we'll probably fall somewhere between levels 2 and 3 ... :cat2: )

wiztigers commented 2 years ago

Reminds me : I took some additional notes in this issue: huma-num/heurist#4.

yangli0516 commented 1 year ago

Hi @michaelgfalk, thanks for inviting me to this thread.

From the discussion, the following are the features we would like to see in the Heurist API:

wiztigers commented 1 year ago

Hi, I've begun specification of a future RESTful API for Heurist, which is available here. It's unfinished, for two reasons :

As you'll see, some RFCs (Requests For Comments) are spread throughout the documentation; those are specific questions I'd like to have feedback about. The main issues I have now are about:

  1. user management which would not be on a "per database" basis;
  2. true internationalization of the Heurist GUI, as well as of databases (e.g. 1 field, multiple labels for each language, one value for each language).

I'll advance on this subject until the end of the year, during which I should be able to answer any request / participate in any debate. And I'd like to begin implementation in January. The duration of this implementation phase could vary wildly depending on the answers I get to current and future RFCs, so I cannot give any deadline now.

@michaelgfalk, thanks in advance for any additional visibility you give to this initiative.

@yangli0516, to answer your specific requests :

wiztigers commented 1 year ago

PS. This is the link to the specification (and hopefully future documentation), and you can read it in SwaggerUI format if you prefer.
I don't, but SwaggerUI might be a more standard way of providing a simple interface to test the API once it is implemented (@see RFC-SANDBOX for discussion and details).

ijohnson222 commented 1 year ago

  1. user management which would not be on a "per database" basis ;

If you have the idea of changing user management for Heurist, it would make MAJOR work for us. Heurist is designed for SIMPLICITY to make it as stable and maintainable long-term as possible. Part of that is that all databases are entirely independent and can be shuffled around, moved, cloned, renamed and deleted as much as you like. Any centralised system of authorisation imposes a whole new level of complexity.

If you plan something which sits on top of this and links users to their profiles in one or more databases, then fine: we don't need to change anything, and you will simply need to write connectors and external identity management, and manage tracking and synchronisation.

  2. true internationalization of Heurist GUI, as well as of databases (eg. 1 field, multiple labels for each language, one value for each language).

We actually have such a system built into the database structure, but not the way you conceive it. There is a table which can provide multiple translations of any record type def, field def, vocabulary, term, or value. We simply haven't implemented the code to use it. We've also added, within the last couple of months, a method for delivering multilingual texts according to context.

I'll advance on this subject until the end of the year (https://gitlab.huma-num.fr/heurist/api/-/milestones/1#tab-issues), during which I should be able to answer any request / participate in any debate.

And I'd like to begin implementation in January. The duration of this implementation phase could vary wildly depending on the answers I get to current and future RFCs, so I cannot give any deadline now.

@michaelgfalk, thanks in advance for any additional visibility you give to this initiative.

@yangli0516, to answer your specific requests:

* "read all terms from vocabularies" this is in my mind too, and will probably be specified this week; although it's linked to the problem of database internationalization I just talked about.

* The important thing here is to read the concept IDs and their relationships (vocabs, hierarchies, referencing between vocabularies), and then to read a set of labels which can be applied according to language. The data is already available as XML output, but for the whole database not a specific vocab.

* "get record type information with field definitions" is already specified ; although I'd like to write a few use cases for it to be crystal clear, feel free to discuss this

This is already available, as for vocabs, but for the whole database, not per record type.

* "read non-public records through authentication methods" I want this too, but there are many questions around authentication I'd like the "Heurist dev community" to discuss

Authentication is also something we are working on for the BnF

* "parsed temporal object" not sure what you're talking about. Personally I want that to be returned by the API in ISO formats, and not the human-readable but not really usable thingy we have now.

hmm, sorry, there is no such ISO format AFAIK that would handle the date specifications we permit. We are working on revising the internal 'thingy' (with which I have never been remotely happy, I didn't design it ...) to an XML representation following what draft standards are currently available. Also part of the BnF project.

      Cloning an avatar not requiring sustenance or sleep might work better ;-}

I reiterate the comment about simplicity. Unix-style permissions (individuals, groups, RWX) have served us well for more than 1.5 decades and keep things simple and uncoupled across databases. I have to be convinced there is really a pressing use case when there is so much else of pressing utility.
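
For readers less familiar with the model being referred to, here is a minimal, generic sketch of a Unix-style owner/group/others check. It is not Heurist's actual implementation: the class names, the nine-character mode string and `is_allowed` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    user_id: int
    group_ids: FrozenSet[int]


@dataclass(frozen=True)
class Owned:
    """Any protected item (hypothetical: a record, a vocabulary, a field...)."""
    owner_id: int
    group_id: int
    mode: str  # nine characters: owner, group, others, e.g. "rw-r-----"


def is_allowed(user: User, item: Owned, action: str) -> bool:
    """Return True if `user` may perform `action` ('r', 'w' or 'x') on `item`."""
    if action not in ("r", "w", "x") or len(item.mode) != 9:
        raise ValueError("action must be r, w or x, and mode nine characters long")
    if user.user_id == item.owner_id:
        offset = 0   # owner triplet
    elif item.group_id in user.group_ids:
        offset = 3   # group triplet
    else:
        offset = 6   # everyone else
    return item.mode[offset + "rwx".index(action)] == action


vocab = Owned(owner_id=3, group_id=10, mode="rw-r-----")
print(is_allowed(User(26, frozenset({10})), vocab, "w"))
```

The check needs no state beyond the item itself and the user's group memberships, which is what keeps databases uncoupled from one another.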


wiztigers commented 1 year ago

Hi @ijohnson222 , hope you're well, thanks for your answers

  1. (users & authentication) Of course. Keeping the self-contained nature of databases while, in the future, being able to say "in this specific Heurist server instance, user 3 of database db1 is the same person as user 26 of database db2" would be the best of both worlds, IMHO. We could implement a "real" authentication system (i.e. token-based and/or based on an identity provider, like OAuth2, HumanId and so on) without straying from your vision, which suits me perfectly.
    So, one more time thanks, because just this answer is a huge help for me.
    My stance is to be, as you said, as simple as possible during implementation. Things will be complicated enough. However, I'd like to design the REST API in a way that allows us to provide users with new features in the future without breaking the REST interface.
    On the topic of ABAC, I think we share the same objective of being as flexible and as maintenance-free as we can. But what I meant with my badly formed "I'd kill for ABAC" pun was that we could generalize UNIX-style permissions to more parts of Heurist than currently implemented.
    To give you examples:
    • some of "my" local Heurist users are a tad worried that you can compartmentalize records as much as you want, but you cannot do this for vocabs, which remain free pickings even for students who don't know what they are doing. I'm unsure why vocabs don't have the same permission system as records.
    • another thing they'd like is to be able to define which workgroup can read/edit each field in a given datatype. The current mechanism of hidden/not hidden is not sufficient for some teams. In my mind, each base field could have an ownership similar to what currently exists for records. And each setting. And each vocabs|fields|types group. And so on. Thus, instead of just users and groups, which is perfectly fine, you have the application of users and groups to everything in Heurist. And one way of seeing this is as a big matrix of "who has access to what": it is in that sense that I talked about ABAC.
  2. (i18n) I'm looking forward to seeing your point of view.
    Just in case, please keep in mind the following points (which are just examples):
    • Differentiating languages is not sufficient, you should use locales at the very least (language+country); you're Australian, you surely know the differences between the different flavours of English, and even a snobby French guy like me wouldn't pretend that Canadian French is the same as my French in every way
    • Just use UTF-8 everywhere! And apply it to everything in Heurist, including sorting. During import, I cry each time my "État" field comes at the end of field lists. Just forget ISO.
    • Please keep in mind that languages are very diverse, and the only sure way of solving this problem is not to assume anything. For example, you see how datatypes have a name, plus a plural name? If not already done, maybe you can just ask @osmakov how Russians form plurals. As I understand it, the way plurals work in Heurist doesn't even fully support Russian-like languages. And don't even get me started on Asia ...
    • Everything could and should be internationalized: GUI labels, field values, pictures, position of elements in the GUI, etc, etc, etc.
      In other words, I18N is a vast subject, the kind that should not be discussed alone. It's even more the case if you factor in A11Y (accessibility), to support screen readers for example. Thus, I sincerely hope you're in touch with people (maybe at BnF) who have really localized more than one piece of professional software, in multiple languages each time, all the while having the legal accessibility constraints of multiple countries to sugarcoat things... And if not, may I offer my help? Because this was exactly my life a few years ago, y'know, when I wore suits and crap ...

ijohnson222 commented 1 year ago

(users & authentication) Of course. Keeping the self-contained nature of databases while, in the future, being able to say "in this specific Heurist server instance, user 3 of database db1 is the same person as user 26 of database db2" would be the best of both worlds, IMHO. We could implement a "real" authentication system (i.e. token-based and/or based on an identity provider, like OAuth2, HumanId and so on) without straying from your vision, which suits me perfectly.

Single sign-on is in the works for the BnF project

They pretty much do. The only thing a basic user can do is add a new term, and this is because we considered that there will nearly always be a need to do this. They cannot edit vocabularies. The owner can easily check what new terms have been added and merge them. It would be easy to change this if it was decided this is not the best approach, it does not require anything more complicated than the current system.

Already pretty much done, should be migrated at end of month. Fields can be marked as for owner only, for any logged-in user, public, or individually hidden per record (designed for data which is in preparation).

(i18n) I'm looking forward to see your point of view. Just in case, please keep in mind the following points (which are just examples):

That's fine, I didn't allow for localisation, one minor change to the translation table is required.
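
To illustrate what locale-aware lookup could mean here, the sketch below falls back from a full locale (language+country) to the bare language and then to a default label. It assumes translations are keyed by tags such as `fr` or `fr-CA`; the function name `pick_label` and that key format are hypothetical and say nothing about the actual layout of the translation table.

```python
from typing import Dict


def pick_label(translations: Dict[str, str], locale: str, default: str) -> str:
    """Return the best label for `locale`, trying e.g. fr-CA, then fr, then `default`."""
    # Normalise keys so that "fr_CA", "fr-ca" and "fr-CA" are treated alike.
    lookup = {key.replace("_", "-").lower(): label for key, label in translations.items()}
    full = locale.replace("_", "-").lower()
    language = full.split("-")[0]
    for candidate in (full, language):
        if candidate in lookup:
            return lookup[candidate]
    return default


labels = {"fr": "État", "fr-CA": "Province", "en": "State"}
print(pick_label(labels, "fr-CA", default="State"))
print(pick_label(labels, "fr-FR", default="State"))
```

With a fallback chain like this, a database would only need country-specific entries where the wording really differs.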

We do. The problem isn't UTF-8, it's the collation order. It can be changed per database, the problem is, as I remember, if we change it for French it stuffs up Greek ... So the solution may well be to add a collation setting.
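
A small Python illustration of the difference between encoding and collation: the strings below are all valid UTF-8, yet a plain code-point sort still puts "État" last. The accent-folding key is only a crude stand-in for a real collation (it is an assumption of this sketch, not how Heurist sorts), and it is language-neutral, so it does not address the French versus Greek conflict mentioned above.

```python
import unicodedata


def accent_folded_key(text: str):
    """Sort key that ignores accents and case; the original text breaks ties."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


fields = ["Zone", "État", "Date"]
print(sorted(fields))                         # ['Date', 'Zone', 'État']
print(sorted(fields, key=accent_folded_key))  # ['Date', 'État', 'Zone']
```

Proper per-language ordering would need a collation chosen per database or per locale, which is what a collation setting would provide.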

I was well aware of this. It doesn't even work in English. The S pluralisation is simply a convenience, it is editable. In any case, we don't use this value AFAIK.

I think you will have to do the roadbuilding! With very limited resources, full internationalisation must necessarily be placed on a long list of things we would like, and prioritised. I am planning a big prioritisation session in January when Artem will be in Sydney (assuming all goes well) but to be honest I can see an awful lot of stuff which is of much greater priority which will bring immediate substantial gains for less effort. It would be great if we could get some help with some of these, particularly scripting.

This is a partial list of our roadmap in no particular order: much easier custom report writing, easier website customisation / better documentation, easier constructed titles, active help prompts, improved date recording (format, interface, rendering and searching), easier map construction, improved timelines, manually reordered vocabularies - oops, just did that, full PID redirection, significant documentation improvements, better database examples for website, template database structures, template filtering, better database purging scripts, code refactoring and removal of duplicates/unused, improved interaction logging, focus groups on interface, interface revisions, icons to replace complex menus, improved installation scripts, improved thematic mapping (prototype in process), scaling for big datasets, optimisation, WFS and WMS, map projections, migrate Relationship mapper function, additional remote lookup functions, auto-deposit into repository (in progress), searching/linking/uploading files to Nakala and other repositories (in progress), better git use and continuous integration, testing routines, single sign-on (in progress), cross-database record access, master-satellite databases with synchronisation, mint DOIs, auto-build database from CSV, TEI and other XSLT translation, fuzzy searches, improved network graphs (in progress), single sign-on (in progress), image optimisation, improve image management (in progress), improve/analysis of analytics and logging.

That's only a starter list. And that doesn't include all the small incremental improvements to make the interface more usable, and routine bug fixing. And we have only 2.2 developers, who also need to generate their salaries by doing database conversions and websites ... You might reasonably conclude that it is hopeless, if it was not that it has always been this way! The land of DH is infinite ...


wiztigers commented 1 year ago

Very interesting. Briefly cuz I don't have much time:

Thanks a lot for the roadmap.
There are many of these items that are of special interest to the Alsatian / French public, but hell could be in the details of how we envision things. We really have stuff to talk about together, to maybe prioritize what we agree about (which I'm sure will be the majority). As soon as REST API is done, I'll probably steal items in your todo list nyahaha 😼

wiztigers commented 1 year ago

IMO lots of groundwork (API, testing, specifications) is necessary ...

I might add that I'm talking about specifications, not exactly (user) documentation. The difference is that the former explains how things should work, not how they currently work-well-kind-of-if-we-retro-engineer-things-but-we-don't-really-know-in-the-end. As an "outsider dev", I need to know what is a bug, what is a feature, and what in my plans aligns with the core team's future aims. And for this, Ian, I personally need your vision on features, and at best Artem's vision on architecture, so these general visions can be put on paper (by me, if you don't have time) and pave the way to future features and fixes.

That's why even just this roadmap is invaluable for me.
If you could add maybe 1-10 lines of explanation to each (because some of these I don't understand), it would be a more solid base to begin:

  1. prioritization (once we understand why a feature would exist, each one of us can know if we want it or not)
  2. specification (a design doc is important to explain how the feature ought to work, and the implication)
  3. planning (once we precisely know what must be done, we can count how much time it would cost)

ijohnson222 commented 1 year ago

It is implemented (h6-alpha on Huma-Num, expect it in standard version by end of month)

[image]

[image]

I asked for a checkbox with a label under the field; this was Artem's idea, which is excellent but a bit too discreet (it is too easy to miss that you have hidden something), so I have asked him to simply enable this for all public fields and show the hidden fields in a more obvious way (changing the background colour). I've also asked to put the eye icon on all fields (why not?) and to change per-record visibility to a per-record visibility checkbox, in which case it would also show a checkbox like this:

[image]

There will be a proper roadmap published in February (?), but in the meantime these are the things we've done in the last couple of years:

https://docs.google.com/document/d/1YAAcpsQFa1S6t367xEWiemzAD9B4v9IGrI46h6TDnD0/edit#heading=h.xd330usfq44r


