Remove hardcoded id column in favor of reading from schema

venkatd commented 1 month ago

Hi, I had an issue where I tried using this library for a column that has a composite non-standard primary key.

https://github.com/cheerfulstoic/ecto_watch/blob/dbe17a5a62ae5c6726a0fe3824b271dac80cf0e8/lib/ecto_watch/watcher_server.ex#L83

Above you are hardcoding to just a column named id.

I would suggest using ModelName.__schema__(:primary_key) to get the list of columns to return.

Currently the API has an id as part of the tuple as so: {:inserted, MySchemaName, id, extra}

I think it might be enough to reduce it to 3 tuples: {:inserted, MySchemaName, data}.

That data column would contain all of the information needed including an id.

Thoughts?

cheerfulstoic commented 1 month ago

Yes! I thought about not hard-coding the ID, but then forgot about it as I was trying to release a first version. I'll look at adding this in the morning!

As for three-element tuples instead of four, I actually thought about that quite a bit when creating ecto_watch in the first place. I don't 100% remember my reasoning (I should write it down in the README, perhaps, if I can think it through). But I know I wanted to emphasize that the primary key would always be returned. I guess also I want to emphasize that the fourth element should be used less often (like an opts optional argument, for example) because you should avoid returning extra data whenever possible because of the limitations of pg_notify. But I'm still happy to have the discussion 🤷

venkatd commented 1 month ago

@cheerfulstoic

I think there are a few issues with forcing row.id

Some tables may not have primary keys and only unique indexes
Some tables may have composite primary keys (I am currently dealing with this)
Sometimes the column name is not called id

What I would propose is:

Having a 3-element tuple like {:inserted, state.schema_mod_or_label, data} where data is a subset of the fields returned
Default data to select just the primary keys using state.schema_mod.__schema__(:primary_key)
extra_columns can add to the list of columns to return

I agree that we don't want to return a bunch of extra data each time due to the payload limits.

I tried generalizing here and it seems like most of the code does need to care about a primary key: https://github.com/stanfordtax/ecto_watch/blob/generalize-schema-fields/lib/ecto_watch/watcher_server.ex

It was a bit rushed (I'm up against a deadline!) but may give you a rough idea of what I had in mind :)

Once I have some breathing room happy to issue a PR if this is interesting to you. Otherwise that branch above meets my current needs and can use that as inspiration.

There are some additional considerations we would have to deal with such as how to generate the pg_notify channel name and the PubSub name. For now I just sort alphabetically and join with : to generate the name.

cheerfulstoic commented 1 month ago

Yeah, I 100% agree that we shouldn't be limited to a hard-coded id. Sorry, I don't think I made myself very clear 😅

The only thing I was uncertain about was if there should be a three element tuple (with type/schema_mod_or_label/primary_key_value/extra_data) vs a four element tuple (with type/schema_mod_or_label/all_data)

I've already made a draft PR to address using the schema's primary key instead of just id (tests are intermittently broken and I'm trying to figure out why). We should address the tuple thing separately because that would be a breaking change 👍

venkatd commented 1 month ago

Great!

I do feel like a 3 element tuple is simpler to understand. Because what if we have non standard keys or composite primary keys, what would the id column be then? What would be returned?

In the fork, I have a single property in the payload that contains the relevant subset: https://github.com/stanfordtax/ecto_watch/blob/846e7c46f2d57f6467d505419272b29c025d0349/lib/ecto_watch/watcher_sql.ex#L31

One thing I've settled on is having 3 distinct column lists:

The trigger columns - (when to trigger a pg notification - you already have this)
The return columns (what to return in the notification payload - this defaults to the primary key)
The subscribe columns (how to partition the subscription if we want to subscribe to a specific topic - also defaults to the primary key as before)

All 3 of these are independently configurable and it defaults to something sensible when omitted. It doesn't care what is and isn't a primary key.

The way I support multiple columns is I construct the unique_label in the format: label_or_mod:field1:field2 where the fields are sorted by the column names alphabetically.

In our our feature:

I am triggering a key or value changes in the form_fields table (think of it like a key-value table of form fields)
I am returning the key, value, and document_id in the payload
I partition by document id - meaning I want to be able to listen to events by document_id and relays those to the frontend

It still allows for the simple case as you already have: I want to be notified if anything changes for a row and just tell me the id that changed.

P.S.

I've been making a lot of changes to a fork of the library (move fast and break things while I am prototyping a feature for our product). I don't plan to maintain the fork except for our own internal use but would be happy to discuss getting any potential improvements into the main ecto_watch! If it's helpful to you I could create some issues based on what I believe may be improvements and you can decide whether it makes sense to incorporate based on your design goals.

I'm on the road for ~2 weeks but happy to jump on a call when I'm back if you want to bounce ideas.

venkatd commented 1 month ago

I updated the tests to give you an idea of the API I currently have: https://github.com/stanfordtax/ecto_watch/blob/846e7c46f2d57f6467d505419272b29c025d0349/test/ecto_watch_test.exs#L486

cheerfulstoic commented 1 month ago

Of course fine to have a fork to experiment with 😉

I've just released 0.5.3 of ecto_watch with support for primary keys other than id.

You mentioned listening to only events by document_id, which sounds similar to this issue, honestly:

https://github.com/cheerfulstoic/ecto_watch/issues/6

If you have the same issue then probably it's something to prioritize!

In that issue there are also some somewhat fundamental questions about the API, so I'm going to go off with both the question about tuples and an idea about subscribing to values on things other than the primary key and see what I come up with 👍

venkatd commented 1 month ago

Great!

The latest on my branch supports arbitrary columns if you are interested in taking a look. For the record, here is how the public API looks at the moment:

      {EctoWatch,
       repo: Codex.Repo,
       pub_sub: Codex.PubSub,
       watchers: [
         {Codex.Folders.UserInput, :inserted,
          label: :user_input_inserted,
          return_columns: [:folder_id, :key],
          subscribe_columns: [:folder_id]},
         {Codex.Folders.UserInput, :updated,
          label: :user_input_updated,
          trigger_columns: [:key, :value],
          return_columns: [:folder_id, :key],
          subscribe_columns: [:folder_id]},
         {Codex.Folders.UserInput, :deleted,
          label: :user_input_deleted,
          return_columns: [:folder_id, :key],
          subscribe_columns: [:folder_id]}
       ]},

Then when I subscribe:

      EctoWatch.subscribe(:user_input_inserted, :inserted, %{folder_id: folder_id})
      EctoWatch.subscribe(:user_input_updated, :updated, %{folder_id: folder_id})
      EctoWatch.subscribe(:user_input_deleted, :deleted, %{folder_id: folder_id})

What I tried to do is removing all restrictions on primary keys etc.

You can control:

trigger_columns Which columns will trigger the notification (could possibly configure an SQL condition to further refine)
return_columns What columns to include in the payload
subscribe_columns What columns can be used to partition/broadcast (could probably make this more flexible)

I think separating into those 3 allows full control on the 3 different stages of watching for db changes.

cheerfulstoic commented 1 month ago

I'm writing up something currently which has all of the different things that I've been thinking about (because I feel like I'm trying to balance a number of different design criteria). But I'm thinking about a subscribable_columns option, so I'm glad we're on the same page there. Hopefully I can have this up for discussion shortly.

Curious about return_columns, is that not the same as extra_columns, or are you wanting to have the option of not including the primary key in the payload?

Also, trigger_columns exists and it controls the REFERENCING definition of CREATE TRIGGER, so it already is in the SQL, though maybe you mean controlling the WHEN bit?

cheerfulstoic commented 1 month ago

Ok, I've started a couple of discussions. I don't know if they're complete, but all of the details have just been circling in my head and I need to get them out and have a discussion 😅

Here they are:

https://github.com/cheerfulstoic/ecto_watch/discussions/7

https://github.com/cheerfulstoic/ecto_watch/discussions/8

cheerfulstoic commented 1 month ago

I just released the three-element tuple change today as version 0.6.0 of ecto_watch. It's maybe not perfect given some of the discussion, but I think if there's still interest in further change could you create a new issue since this one has sprung into a couple of branches

cheerfulstoic / ecto_watch

Remove hardcoded id column in favor of reading from schema #3