Reconsider "level" of the Task/TaskMut interface

djmitche commented 1 year ago

The problem that brought me [to GothenburgBitFactory/taskwarrior#3029] is, Taskwarrior mostly doesn't call functions like set_description or add_annotation -- it interfaces directly with the key/value pairs that make up a task, via set_value. But it does call set_status since that updates the working set. So the issue was that Taskwarrior would call

t.set_value("modified", "12345")
t.set_status(Status::Pending)

and the second call would not "see" that the modified timestamp had already been set, and would do so again. On import, this causes a problem by setting the "modified" timestamp of all tasks to the time the import occurred.

Overall, this is a major "impedance mismatch" between TC and TW: TW wants to handle tasks as key/value maps, while TC was really designed for API consumers to call methods that embody more knowledge about the meaning of particular keys. These are more than just convenience wrappers, though: they also have this set-the-modified-timestamp thing, as well as automatically updating the working set. And (referring to GothenburgBitFactory/taskwarrior#3028) it might be good to cache parent/child relationships as well, to avoid having to load most of the pending tasks one-by-one. I've been mulling this over for a while but haven't come up with a concrete way to do better.

Originally posted by @djmitche in https://github.com/GothenburgBitFactory/taskwarrior/issues/3029#issuecomment-1414569579

djmitche commented 1 year ago

cc @ryneeverett continuing discussion from GothenburgBitFactory/taskwarrior#3029.

ryneeverett commented 1 year ago

A better understanding of why we're dissatisfied with the current interface would be helpful for developing and evaluating solutions. When a refactor doesn't solve a concrete problem or enable a concrete use case, it can be difficult to tell whether it is an actual improvement. In the case of GothenburgBitFactory/taskwarrior#3029, I would suggest that we were uncomfortable with that solution because it was not encapsulated in TaskMut but rather dependent upon the consumer manually setting the "modified' attribute before calling a method which automatically calls TaskMut::update_modified().

(In fact, I can't tell if GothenburgBitFactory/taskwarrior#3029 reliably fixes that. Looking at TDB2::add, it appears that the "modified" attribute has to be before "status" in the task.all() iteration or it will get automatically set by TaskMut::set_status() -> TaskMut::set_string() -> TaskMut::update_modified(). Is there any guarantee of that attribute iteration order? I'm not able to follow the Task construction from CmdImport::importSingleTask all the way to taskchampion. I think it goes through TCTask but I don't see exactly where/how a json object of task attributes constructs a Task task (obj);, so I don't know what kind of object we're dealing with here.)

Does the "impedence mismatch" manifest itself anywhere other than task imports? Can we envision any use case for a lower-level interface other than the import command?

Of the four "dynamic defaults" listed in CmdImport (uuid, modified, entry, and end), it seems that "modified" is the only one that is dynamically set within the Task/TaskMut implementations and the rest are implemented in Replica::new_task. This suggests that Task/TaskMut are the natural low-level interface and we just need to extract the high-level features (automatic TaskMut::update_modified() and conditional TaskMut::replica.add_to_working_set(self.uuid)). Some different ideas in no particular order:

CmdImport::importSingleTask could propagate an argument to construct tasks that already have updated_modified=true. (E.g.: Context::getContext ().tdb2.add (task, default_attrs=false);)
TaskMut could have a field storing a HashMap of modified attributes and wait to "commit" them all to the replica at once. Then we wouldn't need to worry about changing the "modified" attribute multiple times and it would only get automatically set if it isn't already in the HashMap when committing. On commit it could conditionally update the working set. It seems like this is essentially the way TaskMut is supposed to be used anyway since it can't mutate the "modified" attribute more than once. But I'm not sure how easy it would be to implement automatic commit or how much friction requiring a manual TaskMut::commit() would be.
A high-level TaskMut::import(&mut self, attrmap: HashMap) could set updated_modified=true, conditionally update the working set, and then iterate over the attrmap, calling self.set_value().
A high-level TaskMut::set(&mut self, attrmap: HashMap) could be the exclusive public interface. If "modified" is not in attrmap it gets a dynamic default. Then it conditionally updates the working set, and iterates over the attrmap, calling self.set_value(). (In fact, couldn't we have a Task::set create the TaskMut and do the whole operation in that method? That question probably reveals that I don't really understand the rationale behind the Task/TaskMut distinction.)

djmitche commented 1 year ago

Does the "impedence mismatch" manifest itself anywhere other than task imports? Can we envision any use case for a lower-level interface other than the import command?

It does, and maybe import is not the best angle to view it from. Summarizing the "mismatch" as concisely as I can:

TW treats tasks as key/value maps, and coordinates updates to the keys for a task to accomplish higher-level things like task N start or task N mod +foo -bar wait:2d.
TC stores tasks as key/value maps internally, but presents them as things with meaningful attributes like "entry" or "status" or "description". It has methods for each of those, with typed inputs. And, it does some of the coordination based on those attributes, such as updating "modified" when a task is modified, or moving a task into the working set when its status becomes "pending", or updating the dependency graph when a dependency is changed.

The idea was that users of TC as a library would not need to remember the particulars of how "entry" is formatted or that "modified" has to be updated -- they could just call simple functions like t.start() or t.add_annotation(..) and the right key/values would be updated. But, that's very much not how TW is built.

It seems like we should have a "high level" API (with t.start() etc.) and a "low level" API (that TW can use, and that doesn't do any automatic coordination).

I like the idea of batching updates, too, especially in the low-level API. That might allow us to get rid of the Task/TaskMut distinction, somewhat like you've suggested, by only requiring an exclusive reference to a Replica for Task::commit. There are a few spots where Taskwarrior modifies tasks in memory and then gives an "are you sure" prompt to the user; a separate Task::commit operation might support that better, too (my memory is a little vague on this point, sorry).

The rationale for Task/TaskMut is this: we want to be able to hold onto several Tasks at the same time for purposes of reading data from the. However, you need an exlusive reference (&mut) to a Replica in order to make changes to tasks -- that's literally mutating the replica, so it makes sense. So if there were no TaskMut and every Task carried a mutable reference to a Replica, then you could only have one Task at a time, and couldn't do anything else with the Replica while it existed.

djmitche commented 3 months ago

I've gotten a start to this now, in #413. I'll leave some notes here that I will probably return to and update later.

Notes

General Goal

Tasks can be read and queried, with tasks represented as a simple wrapper around a HashMap.
Changes to tasks are represented as Operation instances, collected in an Operations type that is a thin wrapper around Vec<Operation>. New operations can be created in a variety of ways such as Replica::new_task, Task::set_property, or Task::delete, all of which take &mut Operations. (Note that this is, in essence, #373)
Operations are "committed" to the task DB by passing the Operations to a Replica method.
All of this blends nicely with the current undo support: the latest batch of undo operations are represented as Operations, and committing that undo involves reversing the operations and committing the result, with an optimization of "popping" matching operations from the DB's operations (as happens now).

Complications

taskdb::apply, which is basically the "commit" operation above, currently takes a SyncOp and creates an Operation from it. An Operation contains the information in a SyncOp but also undo-related information. Maybe it is more precise to capture this undo-related information at the time that the change is committed, but that would mean throwing out any undo-related information provided during the accumulation of the Operations.
Replica will need to keep its working set up to date based on the operations it applies.
Ideally the existing higher-level Task API would also stick around -- or something like it -- for users that want to operate at that higher level. For example, an Android app for adding new tasks on voice command probably doesn't want to be bothered with how the modified property is encoded. It just wants to call something like replica.add_task() and then task.set_description(), etc. and have everything work out.

Use by Taskwarrior

For read-only purposes, TW's Task mirrors the first bullet point above: they are little more than std::map <std::string, std::string> data. So, they could easily be adjusted to contain a tc::Task representing the same thing.

The existing TDB2::modify method takes a Task, compares it to the one in the DB, and makes the necessary changes such that the DB matches the given Task. If Task carries an Operations recording the changes made to it, then TDB2::modify would just be committing those changes.

All of the other operations are more immediate: deleting, creating, etc. Those can be accomplished by creating the necessary operations and immediately committing them.

djmitche commented 3 months ago

A bit more progress now. I think we can do this without breaking changes to the public API, and I'd like to get that merged first (it will be split over a few PRs).

Once that's done, I think we can actually do a much better job with the "high-level" API, too. Since nothing really uses those, I think this is a good time to break them.

djmitche commented 3 months ago

The sequence of PRs I've got in https://github.com/GothenburgBitFactory/taskchampion/compare/main...djmitche:taskchampion:issue372-operations ends up creating a new BasicTask that just exposes getters and setters, with the setters generating a sequence of Operations that can later be committed. The existing functionality (Task, TaskMut and Replica functions like delete_task) are all rewritten in terms of this new type, without breaking API changes (I thope!).

I'm not sure what to do next. I kind of dislike the name BasicTask and would like to just have one Task type. But, I'd also like to have a clear distinction between low-level and high-level functionality in the generated docs. A few ideas I've had:

Keep BasicTask but modify the existing high-level TaskMut methods to generate operations, e.g., task.set_status(Status::Completed, &mut ops) and move them to the Task type. This would be a breaking API change, but keeps the high-level interface while allowing multiple operations to be batched into a single transaction. However, it still has two task types (BasicTask and Task)
Rename BasicTask to Task, and implement the high-level methods in a trait like HighLevelTask - basically an "extension trait". This would separate those methods nicely in the docs, but is otherwise kind of a hack.
Just merge BasicTask and Task and TaskMut and include some preliminary text in the docstring to describe the low- and high-level interfaces.

I'm open to opinions or other ideas!

ryneeverett commented 3 months ago

Another idea, though I doubt it is better:

Make Task a wrapper around BasicTask by replacing the uuid and taskmap properties with a single basictask property and implementing the higher level interface entirely by using the lower level interface. If you wanted to expose the low-level interface also you could define the property as pub basictask.

Edit: Reflecting on this a bit more, I think I can make a case that this option has many of the benefits of the other options without the drawbacks:

Like the first option, Task ends up defined in terms of operations, but vicariously rather than a parallel implementation.
Like the second option, we get DRYness and unity, but through composition rather than "kind of a hack".
Like the third option, the low-level interface is available on the same type as the high-level interface, but rather than depending on docstrings this would be communicated by accessing methods on the basictask property.

djmitche commented 2 months ago

That makes a lot of sense - thanks! I appreciate your careful thought about this.

djmitche commented 2 months ago

Current status: this is largely finished in taskchampion, and I'm working on building the FFI interface using https://cxx.rs in Taskwarrior. That's already led to a few relatively minor local changes in TC, and may generate a few more, Once I've got all of that working I'll put the changes up for review.

GothenburgBitFactory / taskchampion