Unified Primitive Container Types

blakejohnson commented 1 year ago

Comments due on this by EOD today.

blakejohnson commented 1 year ago

I guess maybe what my disconnect is that this is treating both the sampler and estimator as something you would want to potentially swap between, but in practice I don't think there is a use case for that. They're too unique in their usage to facilitate that in a generic way.

Ian's example of qiskit-experiments is already a case where we'd like to accept user workloads that consist of either sampler or estimator calls. With this proposal, we at least have some very basic assumptions of how to execute those workloads. The basic promises are:

primitives have a run method
primitives operate on Tasks.
primitives return Sequence[TaskResult].

The proposal here seems to provide nice functionality to facilitate dispatch on the specific types of TaskResult in order to write generic code.

ihincks commented 1 year ago

To me it feels like extra class heirarchy and inheritance

Note that this RFC doesn't actually call for any new classes or inheritance, it just calls for 3 new methods on the BasePrimitive, which already exists and is inherited from. I've heard the argument before, I forget from whom, that BasePrimitive doesn't actually exist to be a common structural abstraction, but rather to avoid code duplication of common features. I'm not sure I totally understand the distinction being made in that perspective, but in any case, I imagine it could be argued that result_type(cls, task: In) is doing the same thing.

ihincks commented 1 year ago

my disconnect is that this is treating both the sampler and estimator as something you would want to potentially swap between, but in practice I don't think there is a use case for that

In case it helps, any particular algorithm or experiment certainly can't swap between the two of them without, if even possible at all, a re-implementation. However, application frameworks built on top of the primitives may wish to reason about the primitives more abstractly without explicitly enumerating them, or talking about their inputs as union types of various signatures.

mtreinish commented 1 year ago

To me it feels like extra class heirarchy and inheritance

Note that this RFC doesn't actually call for any new classes or inheritance, it just calls for 3 new methods on the BasePrimitive, which already exists and is inherited from. I've heard the argument before, I forget from whom, that BasePrimitive doesn't actually exist to be a common structural abstraction, but rather to avoid code duplication of common features. I'm not sure I totally understand the distinction being made in that perspective, but in any case, I imagine it could be argued that result_type(cls, task: In) is doing the same thing.

It does implicitly mean new classes, because of #51 going to a new v2 version and also the proposed BasePrimitive here also would need to be done in a new v2 base class. The changes proposed here are incompatible with the existing BasePrimitive class and would cause issues with the current primitives interface. That being said from my perspective the existing BasePrimitive class doesn't actually provide any value, for the same reasons I outlined above. The methods that exist on it could easily have been handled as standalone validation functions, and IMO that would have been a cleaner interface.

jakelishman commented 1 year ago

TaskResult doesn't actually contain anything concrete in its interface after unwrapping the first abstract level - "DataBundle, metadata and any helper methods" have no concrete items or usable behaviour defined - so at the level of BasePrimitive, that's an opaque object. Similarly, Task is nearly completely opaque (and requiring that there's a circuits field typed QuantumCircuit feels like it could cause problems for interface expansions, like accepting OQ3 / a handle to a manually saved compiled circuit). Both give the impression of applying structure, but on those fronts, this RFC is only splitting one opaque object into three for TaskResult; the semantics of the inner objects remain opaque for abstract use. The result_type and _output_fields methods give a way to programmatically find the names of defined fields, but cannot specify what those fields actually mean and how they can be used, so any user of them must already be dispatching to something that is subclass aware. It's not clear to me that they have an abstract use, so I'd be concerned about including them in an interface.

The abstract part that Experiments could actually use from this interface (if I understand correctly) is that they want to do something like:

def run_experiment(primitive: BasePrimitive[TaskT, ResultT], tasks: Iterable[TaskT]) -> list[ResultT]:
    return primitive.run(tasks)

If the goal of this RFC is purely to ensure that the innermost "call" operation just has the same name between each primitive, that's ok by me, though personally I'd err on the side of being less constraining. It's not clear to me that this run method will certainly have exactly the same semantics for any new primitives that might be defined; could there be a future primitive that wants "reduction" semantics on its inputs rather than "broadcast", that might mean that only TaskResult and not Sequence[TaskResult] is appropriate as a return?

Given that Experiments will already be needing to dispatch on "is input Sampler or Estimator?", is it still worth having this abstraction to enable:

if is_sampler(primitive):
    tasks = generate_sampler_tasks(inputs)
else:
    tasks = generate_estimator_tasks(inputs)
# Typing is the abstract `Primitive.run`
results = primitive.run(tasks)
if is_sampler(primitive):
    # Required for strong typing, because the above call was abstract.
    results = typing.cast(results, SamplerResults)
    process_sampler_results(results)
else:
    results = typing.cast(results, EstimatorResults)
    process_estimator_results(results)

when the alternative would be:

if is_sampler(primitive):
    tasks = generate_sampler_tasks(inputs)
    # Now constrained to `Sampler.run`.
    results = primitive.run(tasks)
    process_sampler_results(results)
else:
    tasks = generate_estimator_tasks(inputs)
    # Constrained to `Estimator.run`
    results = primitive.run(tasks)
    process_estimator_results(results)

My feeling is that in this example, the former is not (sufficiently) better than the latter to motivate inclusion in a base interface.

mtreinish commented 1 year ago

my disconnect is that this is treating both the sampler and estimator as something you would want to potentially swap between, but in practice I don't think there is a use case for that

In case it helps, any particular algorithm or experiment certainly can't swap between the two of them without, if even possible at all, a re-implementation. However, application frameworks built on top of the primitives may wish to reason about the primitives more abstractly without explicitly enumerating them, or talking about their inputs as union types of various signatures.

This is where I'm struggling with the value here TBH, because I'm not sure there is a use case where you would not want to use a union type. Lets use a concrete example. If you look at what exists in qiskit-algorithms in the abstract (not for specifics) there are two VQE implementations SamplingVQE and VQE which take either a sampler or estimator respectively. You could argue that one could wrap both of those in an OmniVQE that would take either a sampler or estimator and then internally dispatch to an implementation based on the input primitive (although there are enough caveats between the two that it's not necessarily practical). But it would never be safe to use BasePrimitive as an input type, only the union of types as the input type because it only works with BaseSampler or BaseEstimator and they're the only valid choices for an input. If someone were to locally create a FourierTransform primitive class that subclassed BasePrimitive and defined run using Task and TaskResult it would not be a valid input to OmniVQE. I believe this pattern would be true for anything that would potential want either sampler or estimator.

I'm struggling to think of a case where someone would create something that could handle any BasePrimitive subclass where the only definition is that it has a run() method which take Task as an input and a sequence of TaskResult as an output with the interface being so generic. I feel like it's a safer and cleaner interface to just declare a common interface around Task and TaskResult on each concrete primitive interface definition and document that as a requirement for any potential future new primitives instead of trying to enforce it via subclassing.

blakejohnson commented 1 year ago

Based upon the feedback here, this RFC will be updated to focus on new primitive-centric container types instead of introducing a run(...) method to BasePrimitive.

ihincks commented 1 year ago

Thanks @mtreinish and @jakelishman for your thoughtful comments. @jakelishman, I think you are exactly right to focus specifically on the mechanics of dispatching. While it might intuitively feel like, unifying to a common BasePrimitive.run would help with this problem, when you actually sit down and try it, it's not so obvious. Given that it also constrains what we can do in the future, it would be best not to make this unification until there's a concrete need for it.

blakejohnson commented 1 year ago

Further discussion may continue in issues, PRs, and Slack.

Qiskit / RFCs

Unified Primitive Container Types #53