felipeangelimvieira / prophetverse

A multiverse of Prophet models for timeseries
https://felipeangelimvieira.github.io/prophetverse/
Apache License 2.0
23 stars 2 forks source link

[ENH] `sktime` indexing in `all_estimators` and search function #64

Open fkiraly opened 2 weeks ago

fkiraly commented 2 weeks ago

Efforts to discuss how best to add indexing for prophetverse estimators in sktime.

Here is a draft PR as a basis for discussion: https://github.com/sktime/sktime/pull/6614

I am trying to think of the best pattern for the user and for directly indexed packages.

Currently the PR uses a delegation pattern, which leads to a "copy" of the estimator in sktime. This is unsatisfactory imo as the copy is basically just a database entry.

What we could do is replace the copy by a direct import from prophetverse if the package is installed, avoiding the duplication of classes. However, at current state that would not work because:

I also worry about the impact of having different classes depending on whether prophetverse is installed. Though I cannot see a problem because the constructor of the sktime class will failif prophetverse is not present.

Thoughts, @felipeangelimvieira?

felipeangelimvieira commented 2 weeks ago

Franz, so adding the missing tags and using ReStructureText in docstrings would make it possible to direct import it? If yes I can work on it this week. What do you think about it?

fkiraly commented 2 weeks ago

Franz, so adding the missing tags and using ReStructureText in docstrings would make it possible to direct import it?

Yes, but there needs to be a placeholder if the prophetverse is not present in the environment, since the retrieval utility is using the class and its tags as a database record.

So it could look like this:

if _check_soft_dependencies("prophetverse", severity="none"):
    from prophetverse.sktime import Prophet as ProphetverseUnivariate
else:
    class ProphetverseUnivariate(...

Does this feel like the right pattern?

felipeangelimvieira commented 2 weeks ago

I'm also thinking about the dependency management. Would Prophetverse be added as a soft dependency? 'm imagining the case where there's an upgrade in Prophetverse so that the signature of init is changed. The "way of working" would be:

  1. Release a new minor/major of Prophetverse
  2. Open an issue in sktime
  3. Create an PR that updates Prophetverse dependency and the placeholder content

Is that correct?

fkiraly commented 2 weeks ago

I'm also thinking about the dependency management. Would Prophetverse be added as a soft dependency?

So am I, it's difficult to find a simple and convenient pattern here - the crux is to think of something that scales well and is convenient for both user and developer.

dependency handling

Right now, the soft dependencies are imported in the CI mainly for testing.

We are already hitting limits of what GHA jobs allow due to number of estimators, if all are tested simultaneously, so without modification the pattern wlil not scale.

Since prophetverse has its own CI, I was thinking about not adding it as an sktime soft dependency, but adding it to the index only. Then, testing that docstring and tags are identical between the index placeholder and the actual class, but in prophetverse. This could be an extra "sync" CI job that certifies for the integration.

The alternative option is to continue adding soft dependencies to sktime and run the tests in sktime. As said, that does not scale well...

We are already thinking about testing estimators individually as the "ideal end state", see https://github.com/sktime/sktime/issues/5719, that's closer to them each being in another package individually, than to them being all in a monorepo.

change and deprecations workflow

What you say does work, as long as changes are downwards compatible.

What makes me manifestly unhappy is that you have to sync updates in two places. Surely there is a better way here, although I cannot think of a better one without one that assumes maintaining a separate register, like pypi.

Of course we constantly handle changes in 3rd party packages, but the workflow is obviously very sktime centric.

Perhaps for the short-term, that's the right point of comparison - becaues in 3rd party package, it is also the reverse dependency that leads with changes.

We may remove step 2 perhaps, that makes it simpler?

fkiraly commented 2 weeks ago

adding link to the general "patterns" discussion: https://github.com/sktime/sktime/issues/6639

fkiraly commented 1 week ago

@felipeangelimvieira, what is your preference in relation to the release in the next days? In a scenario where the adaptations in prophetverse need more time: We could release with the two-layered adapter now, and move to the import pattern later on. Or, we could delay for another month?

felipeangelimvieira commented 1 week ago

Hey Franz, I wasn't able to work on the package those last days... but I plan to dedicate some time this weekend. So, to get everything ready for the integration, we need to:

  1. Format docstrings as in sktime (ReStructuredText)
  2. Add authors, python_dependencies and maintainers tags
  3. Create a PR to sktime to add the docstring placeholders to sklearn

Is that right?

fkiraly commented 1 week ago

Yes. I have already done 1, 2, 3 though in sktime - you can just copy the docstring, and tags from here:

https://github.com/sktime/sktime/pull/6614

Also, I think you may not even need to reformat the docstring, since the placeholder in sktime is used for rendering.

fkiraly commented 1 week ago

Yes, I just checked:

You still may want to check that the docs are similar or the same. Of course in sktime the estimator description should say sth like "xyz from prophetverse" for credit.

fkiraly commented 1 week ago

Question: which pattern should we go with?

  1. using delegator. Always the same class, see https://github.com/sktime/sktime/pull/6614 current state

  2. using switch: if _check_soft_dependencies(prophetverse): from prophetverse import X as Y

felipeangelimvieira commented 1 week ago

I see that 1. do not allow the user to call the public methods of Prophetverse (predict_all_sites, predict_samples, etc). They of course can be called by using the inner estimator at the cost of user experience.

However, I don't believe this is a significant issue. Users of sktime may not be specifically interested in using those custom methods. Additionally, documenting them in sktime might confuse users into thinking these methods are part of the sktime API.

fkiraly commented 1 week ago

Hm, I see. I could try to write a delegator class that ensures all public methods are delegated. That is a bit unfortunate, as it would lead to desyncing of interfaces.

I have been trying around with different patterns, and 2. should work, however there is a major obstacle: the name. sktime assumes a unique namespace of estimators, and there would be two Prophet-s then.

Is there an easy way to mimick a class but replace the name? import X as Y will not work, because Y.__name__ will still be "X".

felipeangelimvieira commented 1 week ago

Those extra public methods are specially useful for more advanced usage, such as Marketing Mix Modelling, analyzing parameter distributions... These examples may be outside the scope of sktime, so maybe it is ok not to have them?

felipeangelimvieira commented 1 week ago

If the name is currently a problem for 2., it may not be a problem in a near future. Currently we have Prophet, ProphetGamma and ProphetNegBinomial in different classes. I was thinking about refactoring and at least unifying the classes with continuous likelihood distributions, by adding an extra hyperparameter "likelihood". This new class could be called Prophetverse

felipeangelimvieira commented 1 week ago

Maybe it would be good to wait for that refactoring, so that you don't have those breaking changes in sktime, and also the solution in 2. is possible?

fkiraly commented 1 week ago

These examples may be outside the scope of sktime, so maybe it is ok not to have them?

Sure it's ok, it's just a pity not to have them and to have the two classes being different. That way, the extra features will get less exposure.

Maybe it would be good to wait for that refactoring, so that you don't have those breaking changes in sktime, and also the solution in 2. is possible?

Sure. If you settle on a "final" name, we could do 1. (with the estimator in sktime getting the final name now) and later switch the delegator for a placeholder. That should break no user code, if parameter names, defaults, etc, stay the same.

Can also go with the "wait" option ofc.

felipeangelimvieira commented 1 week ago

Sure. If you settle on a "final" name, we could do 1. (with the estimator in sktime getting the final name now) and later switch the delegator for a placeholder. That should break no user code, if parameter names, defaults, etc, stay the same.

That works for me. We can go with Prophetverse as the class name. Does that work for you too?

felipeangelimvieira commented 1 week ago

@fkiraly Just merged the Prophetverse class to master. Next I'll be working on using RestructuredText for docstrings, it'll help keeping them in sync.

fkiraly commented 1 week ago

We can go with Prophetverse as the class name. Does that work for you too?

Works. You can pick anything that does not result in name space clashes and is not in violation of the code of conduct.

fkiraly commented 1 week ago

I've converged on a decorator design, would be keen to hear your opinion.

Unfortunately, there is a snag: the current _check_estimator_deps cannot be used as it imports the 2nd party package as a means of checking, leading to circular imports. Instead, find_spec needs to be used, which is being worked on in this PR https://github.com/sktime/sktime/pull/6355.

So, once #6355 is merged, we would have a simple pattern with a decorator that simply uses _check_estimator_deps based on the placeholder record!

The prophetverse specific PR is here: https://github.com/sktime/sktime/pull/6614 This does the following:

felipeangelimvieira commented 1 week ago

Seems like a really nice solution! Planning to deploy Prophetverse 0.3.0 this week after linting and adapting the docstrings to RestructuredText format.

fkiraly commented 2 days ago

With 0.3.0 released: have you checked on your side whether the sktime Prophetverse index placeholder results in the desired user experience, i.e., the class imported is identical with the prophetverse class?

I have checked and it should work, just checking in with you that it is what you also expect.

PS: I fixed some typos in the sktime placeholder docstring.