civicrm / org.civicrm.api4

CiviCRM api version 4
Other
8 stars 19 forks source link

Design Principles: add "Business/Process orientated"? #14

Open ErichBSchulz opened 7 years ago

ErichBSchulz commented 7 years ago

Sorry this is another hairy subjective one. But posting this get some guidance on approach and philosophy.

The current code base is essentially a CRUD wrapper around the BAOs. Is this what we see as most critical in V4? if not what is the shape of the business operations that the API needs to support and what is the pattern?

background

"Avoid designing a REST interface that mirrors or depends on the internal structure of the data that it exposes. REST is about more than implementing simple CRUD (Create, Retrieve, Update, Delete) operations over separate tables in a relational database. The purpose of REST is to map business entities and the operations that an application can perform on these entities to the physical implementation of these entities, but a client should not be exposed to these physical details." "Avoid introducing dependencies between the web API to the structure, type, or location of the underlying data sources. For example, if your data is located in a relational database, the web API does not need to expose each table as a collection of resources. Think of the web API as an abstraction of the database, and if necessary introduce a mapping layer between the database and the web API. In this way, if the design or implementation of the database changes (for example, you move from a relational database containing a collection of normalized tables to a denormalized NoSQL storage system such as a document database) client applications are insulated from these changes." ref

(edit) some concrete examples:

(/edit)

ErichBSchulz commented 7 years ago

This is a relevant note from @nganivet:

to reinforce that IMO the API should without a shred of doubt be business-focused (rather than a database-access layer). As such, it's entities should be business entities, not database tables, and I could not subscribe more to footnote #3 in the Google Doc (see above) on REST interface design.

Very concrete use case today: a large customer of mine has non D/WP/J! website, and a separate CiviCRM instance. He wants to implement event registration on his website - trivial to do via APIs since all the events are free. His question: if we use the API to create event registration in the CRM, is all the business logic applied? For example, if using the (Participant, create) call and the event is over capacity, is the person automatically put on the waiting list? Or will the API call result in the event being over-booked?

We've said for years that there is too much business logic in the forms layer and that this was a huge mistake. As we're moving to Angular/client-side interfaces for CiviCRM, we've also said that these interfaces should only use the API to remotely interact with CiviCRM. So the question is ... are we going to do the same mistake: code all business logic in Angular and use the API as a remote-database layer? Or are we going to do things right, have Angular call a business-level API and have all business logic done in core (BAO layer or equivalent)?

If business-oriented, it means we have to a) get rid of a number of 'intermediary' entities that serve only as join tables or implementation details and b) add 'business' action verbs, that are entity-specific, on top of the CRUD actions

eileenmcnaughton commented 7 years ago

So, in general I think the business logic for creating or updating any entity should sit in the BAO layer & the api should be a thin wrapper around that.

Currently there is still too much in the form layer. We have a slow laborious process for addressing that - which basically means tackling one bit of code at a time, moving logic from the form layer to the BAO layer & adding unit tests.

I don't think you will find any silver bullet to make that process less painful - although perhaps it might be possible at some point to replace a substantial part of the form code with a call to the Order.create api.

My personal belief is that if you try to replace the create /update / delete model in this iteration of the api you will increase the scope beyond what is achievable.

Our api does support other business actions and the maturing of these actions may eventually replace the CRUD api - for example contribution supports the heavily used repeattransaction & completetransaction entities - but I wouldn't try to bite off all of that straight off the bat.The process for adding these ad hoc entitiy actions is basically to come up with a proposal, to distribute & discuss it, respond to feedback & if there is a mandate to implement and document it.

totten commented 7 years ago

IMHO, the question is too abstract. It's more fruitful to start from example problems in APIv3; then describe examples that would be better in a new iteration; and then argue implementation.

I don't think you will find any silver bullet to make that process less painful - although perhaps it might be possible at some point to replace a substantial part of the form code with a call to the Order.create api.

+1 for this

ErichBSchulz commented 7 years ago

thanks folk! Just to clarify I'm seeking guidance on a potential patch to the readme

@eileenmcnaughton this comment helped me align my chakras "Currently there is still too much in the form layer. We have a slow laborious process for addressing that - which basically means tackling one bit of code at a time, moving logic from the form layer to the BAO layer & adding unit tests." - I'm going to do a PR on the dev guide to capture that.

@totten - I'll try to clarify - maybe we need both "plumbing and porcelain" API...

It seems to me that "plumbing" CRUD stuff is OK to leave in the BAO - but the API should really cover porcelain. (edit) should really expose the porcelain within the BAO (/edit) The concern with plumbing CRUD commands is they are dangerous and can leave the database in an ambiguous or invalid state (eg hanging contriubtions)...

I guess this raises another issue... if the API can mess with the database then somehow "this is an unsafe (because it is "plumbing") operation" is a fact that should be apparent to coders using the API??

eileenmcnaughton commented 7 years ago

I think the way in which the api covers the porcelain is by calling the BAO create & delete methods. For example Contribution.create api does a bunch of other actions. By comparison we do not expose any cache tables through the api.

There are some tables where exposing the create action could be arguable. In general my position on this is that we know those actions are being used by someone somewhere (possibly just the test suite) or they would not exist. I feel it is time-consuming & low return to decide which ones we might reject bringing over as create or delete options - however, I would support 'marking' them in some way.

For many of them a new action will remove the need to use them over time - but I think trying to separate them out at this stage will hugely increase the project scope

nganivet commented 7 years ago

I think we should not expose any of the 'join' and 'implementation' tables to the outside. Examples of such tables exposed through API v3 today are:

ActivityContact: this is a 'join' table used to implement a many-many relationship. We should get all information on contacts related to an activity when using a get on the Activity, just as we get email addresses by querying the Contact entity. API v4 should therefore be able to deal with many-many relationships in a standard way, and we should forgo all other similar join tables.

EntityTag: this is an 'implementation' table, and refers to the way we decided to implement tags. This could change in the future for performance or other reasons. Tagging an entity is a business action performed on the entity, so we should have a 'tag' action on all entities that are tag-able, and return the list of tags when doing a get on such Entity.

GroupContact: same as above, if not even worse as exposing this 'implementation' table leads to inconsistencies: I can query members of a group through this 'Entity', except if this is a Smart group as I then have to go through the Contact entity. And it most probably will not work either with future group types (ie. crazy antipodean QIF thing) or caching mechanisms (ie. NoSQL-based?) we might invent. We get such inconsistencies as a result of using the API for the plumbing, rather than for the porcelain.

EntityFinancialTrxn: We are really calling for trouble by exposing such low-level entity through the API. It should not even be available in core, except to a very limited set of BAOs.

MailingEventConfirm, MailingEventSubscribe, MailingEventUnsubscribe, MailingEventResubscribe: again these are implementation tables. We should only have one entity: MailingEvent that eventually has different attributes based on the type of event.

This is just a sample, there are many other join/implementation tables exposed. By getting rid of these in favor of actions on corresponding business entities we will not only greatly simplify the API and it's use by third parties, but also ensure consistency and our ability to evolve CiviCRM without impacting the APIs - so in essence keep our contracts valid even though the underlying technology evolves (see this video for a great depiction).

eileenmcnaughton commented 7 years ago

I guess my main caution here is that analysing entity by entity what to do massively increases the scope of updating the api & of adopting the api.

It's tricky. If you were to resolve the issue in a way that meant api has less functionality or a lot of code analysis is required to adopt api v4 then it will be hard to get any adoption.

I think you need to look for non-contentious areas to start on & find a way to make progress. I agree with discussing & planning - but I worry that we have already spent much more time discussing the case that Nicolas just made (over quite a while) than it would take to get api v4 shipped - so I would suggest you start with things everyone agrees on, spend some time with the code & see what issues you really encounter & really seem to matter once you get started.

nganivet commented 7 years ago

Understood, but I would hate that people start implementing APIs for all database tables as in API v3 and that we again have to live with/support all this legacy - this is after all the reason we're now moving to API v4! It seems a little bit of analyzing before people start coding will save us a lot of time further down the road. I suggest we start with simple functional domains, like groups/tags/mailing events/... and specify how these would be implemented in 'business terms', decoupled from the database. This seems not too far from your own thinking, but not quite sure ...

eileenmcnaughton commented 7 years ago

Well - the things that are causing us to move to api v4 are not the entities you mention - they might be a trap for young players at times but it is the complex api that have coded us into a corner and create large amounts of ongoing maintenance work. As we rely on the query object for the key contact, contribution & participant api calls we have a bizarre set of things that do & don't work. Our default return properties for these include things that are not performant - especially on the contribution.get api and we have this incredible fragility due to all the per field hard-coding in the query object.

I have spent around 30 hours over the last 2 months on tweaking the contribution api / search to improve performance (not finished) but the pain is all because I'm trying to deal with unpredictable code - much of it locked in by the v3 api, but other parts by the various search forms.

So, I don't mind adding additional 'business api calls' - I think these are by their nature highly individualised & ad hoc - i.e. there is a project to come up with a business approach to groups, one for tags, one for mailing events. I think these should be individual and should not block each other, or us getting rid of our major code pain point.

ErichBSchulz commented 7 years ago

Heya @eileenmcnaughton - I'll post more substantive thoughts later after I've fully digested the above - but just before I get dragged out for dinner - I'm just checking you had a quick look at BOVTAC vs MABA - I'm thinking the project interdependence maybe at the root of some stress we can shed by disentangling the two (just a theory) - the project interdependance didn't really seem to fit in the API4 issue queue so I split it out.

xurizaemon commented 7 years ago

I've been following this conversation (I thought!).

This would make a lot more sense to me if BOVTAC was unpacked with more than the linked insight comment in chat.civicrm.org ... Can either of you fill in the blanks to help me catch up? I'm sure I'd see the point with a couple examples.

"Stop supporting really really old stuff - like ... because ..., and ... because ..."

(Forgive me if this should be clear, but it's not to me)

ErichBSchulz commented 7 years ago

ha! @xurizaemon I'm now a bit confused too following on from @eileenmcnaughton's latest comments. I thought she was saying "We need to BOVTAC in order to move on" but tonight she said "dont break our api contract... we will lose a bunch of our customer base" - so I'm confused too. But I'd like (if possible) this thread to focus on the question "should API4 focus on entity CRUD, or should it attempt to support business events and actions. I think the BOVTAC vs MABA discussion is really critical but I'd like to address that question on the discussion whiteboard if that is OK?

@eileenmcnaughton many of the problems you discuss seem to be symptomatic of "failure to separate concerns" and "failure to make side-effects explicit" #19 and #18 seem relevant for performance? I'm going to make "no occult side-effects" a separate issue (ie clear distinction between porcelain and plumbing calls)!

@eileenmcnaughton I really like this too "I would suggest you start with things everyone agrees on, spend some time with the code & see what issues you really encounter & really seem to matter once you get started."

I appreciate the concerns re scope... basic CRUD seems like is should be easy to implement - the challenge is stopping the porting of rubbish from V3 right? (yet more cases "white listing safe CRUD operations" and "identifying high priority porcailain API calls)

nganivet commented 7 years ago

@eileen - thanks for your detailed explanation, much clearer now.

So maybe the various searches should be the first thing we focus on. We could start with easier ones, such as Find Contributions or Find Memberships, and then tackle the Advanced search screen when we feel ready. Having these searches use the API rather than the query objects would be a significant proof point for API v4, and does not seem overly complicated (from a very remote point of view ...)

eileenmcnaughton commented 7 years ago

@xurizaemon BOVTAC - is the idea that we can break our api v3 contract if we give people warning - ie. tell people now that we are end-of-lifing the existing v3 api contract & replacing it in a new version of CiviCRM with one that works mostly the same but a bit different in places. That would mean we have to version increment CiviCRM to 5.0 and would transfer the burden of work from us to the organisations & extension maintainers. To be clear I think this is more work - but I can see the appeal of dumping the work on other people, even if I think it would be deadly for our product.

There are a bunch of things that CAN be done within api v3 - e.g there is nothing about api v3 that precludes adding caching - I think it would be more work to do it in api v3 than it would be to develop it in api v4, but it is possible in api v3 - especially for the standardised api calls (like address.get).

@nganivet "start with easier ones, such as Find Contributions or Find Memberships" - maybe I'm not sure - they are currently re-used from advanced search but perhaps separating them off is an option and would make it easier. I hadn't considered that before.

totten commented 7 years ago

So, I don't mind adding additional 'business api calls' - I think these are by their nature highly individualised & ad hoc - i.e. there is a project to come up with a business approach to groups, one for tags, one for mailing events. I think these should be individual and should not block each other, or us getting rid of our major code pain point.

+1 for this. Most of the items in Nicolas's list of examples (ActivityContact, GroupContact, EntityFinancialTrxn, MailingEventConfirm) raise very different issues for me. I think we can find common ground on a lot of those -- but the discussions about each are pretty however.

However, I think Nicolas keeps pointing this stuff out because he perceives some trend/theme around relations, I agree that there are some trends, but I'd focus on a different list of entities for discussing the trend: ActivityContact, EntityTag, Phone, Email, and Address are in similar positions. Each one is normally considered part of some parent entity (Activity or Contact), so you might expect to access them through the general Activity or Contact API. However, each one is really a relation -- relations are different from fields,. Each one has valid use-cases for exposing a robust set of functionality for those relations (i.e. filtering/sorting/adding/replacing/deleting, individually and/or collectively).

Activity / ActivityContact is probably a decent case-study.

Up until Civi ~v4.3, the activity model stored contact relations in these places:

At that time, APIv3 provided a singular Activity API with special options designed for the source/assignee/target (e.g. return.assignee_contact_id). These were specially designed for Activity API, but they had multiple criticisms:

In ~v4.3, someone (Lobo/Yashi) consolidated the Activity relations to just civicrm_activty_contact.contact_id. And then someone else (Capo) realized there was a good cost-benefit to adding an API for the ActivityContact relation -- a small amount of boilerplate code yielded a more thorough / discoverable / predictable API. They didn't have to dig through weird code-paths of Activity API/BAO/utils. Moreover, with chaining, you can still embed the relations within the Activity API. Who wants to say "no" to an easy-win/low-hanging fruit?

Of course, this doesn't mean ActivityContact is awesome. Using record_type_id instead of a machine-name seems pretty wonky. To handle a full range of common scenarios, you need to understand chaining/replace/relations in APIv3.

The case-study of activity schema is interesting because it's a real transition -- we can test some theories about what's easy/hard to support during change. How would this transition have worked differently if APIv3 had exclusively used (a) adhoc-businessy-pseudorelational fields vs (b) dummy passthru relational-entities?

In fact, we have concrete experience of doing both those -- e.g. Activity actually did have adhoc pseudorelational fields, and ActivityType actually is a shim for a different table/data-model. My observation: Those two are about equally difficult to implement.

The real drivers of difficulty are not about adhoc-fields or relational-entities. The real issues are:

However, my fear is that "some kind of starting point [for relations]" will be misinterpreted as "dummy-passthu for everything". AFAIK, nobody has ever wanted or advocated "dummy-passthru for everything." I think folks are pretty open-minded about what exactly that start-point looks like, but talking about that in theoretical terms feels like overcommunicating. It's easier to talk about that using concrete examples of API calls.