Open fkiraly opened 1 year ago
Thanks @fkiraly for the kind words and enthusiasm! The compliments are best directed at @olivercliff who did the software dev for this project.
I personally don't have the time or python expertise to contribute much to software expansion efforts, but @olivercliff may be able to weigh in on this point. It's possible @anniegbryant may be able to help somewhat but will leave to her…
Ultimately would be great to have a student or keen software dev join the team—e.g., could be a good Google Summer of Code project. Will keep you posted…
Hi @fkiraly, glad to hear you like it! In fact, I designed the code with future integration of the sktime/sklearn framework in mind, which is probably why certain parts of it feel familiar (and hopefully the integration would not be too much of a hassle).
Your two main points, imo, would not only allow integration with sklearn/sktime, but also significantly improve the readability and usability of the standalone package. My thoughts after having a quick look at the code you referenced:
sklearn-base classes might be the more difficult aspect to implement, as it looks like it requires pyspi to handle data differently - is that correct? Many methods store certain results directly in the data object in order to extract statistics from these results later on; otherwise the computation time blows out significantly. I imagine there is a simpler way to achieve this using the sklearn framework but I have not come across it yet.
BasePairwiseTransformerPanel sounds achievable in a shorter period of time. Moreover, the arguments cover all cases that the methods in pyspi require (e.g., multivariate or bivariate) and extend in useful directions (e.g., handles NaN or not).
I am unfortunately quite short on time these days and don't work directly on the codebase anymore, so I think the idea of a GSoC project, as @benfulcher suggests, is a great way forward.
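To make the caching concern above concrete, here is a minimal sketch in plain Python of a data object that stores an expensive intermediate result so that several statistics can reuse it. The names `CachedData`, `pearson` and `regression_slope` are hypothetical and are not taken from pyspi; the covariance stands in for whatever intermediate a real method would share.

```python
from statistics import mean


class CachedData:
    """Hypothetical data container that memoises pairwise covariances."""

    def __init__(self, series):
        # series: list of equal-length sequences, one per process
        self.series = [[float(v) for v in s] for s in series]
        self._cache = {}
        self.n_computations = 0  # how many times the expensive step actually ran

    def covariance(self, i, j):
        """Sample covariance of series i and j, computed at most once per pair."""
        key = (min(i, j), max(i, j))  # covariance is symmetric
        if key not in self._cache:
            self.n_computations += 1
            x, y = self.series[i], self.series[j]
            mx, my = mean(x), mean(y)
            total = sum((a - mx) * (b - my) for a, b in zip(x, y))
            self._cache[key] = total / (len(x) - 1)
        return self._cache[key]


def pearson(data, i, j):
    """Pearson correlation, built from cached covariances."""
    denom = (data.covariance(i, i) * data.covariance(j, j)) ** 0.5
    return data.covariance(i, j) / denom


def regression_slope(data, i, j):
    """Least-squares slope of series j regressed on series i."""
    return data.covariance(i, j) / data.covariance(i, i)


data = CachedData([[1, 2, 3, 4], [2, 4, 6, 8]])
r = pearson(data, 0, 1)
slope = regression_slope(data, 0, 1)  # reuses the covariances computed for r
```

In this sketch the second statistic triggers no new covariance computation. Whether the cache lives on the data object, as here, or on the estimator is a design choice that an sklearn-style refactor would have to settle.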
Hey @fkiraly, @benfulcher, @olivercliff!
Has there been any progress on the Google Summer of Code? I might be interested in doing the sklearn integration, but I didn't find the project in the sktime projects list.
@bruAristimunha, apologies, I did not see this post!
Yes, we have been selected for GSoC 2024, and this would have been an excellent topic!
Unfortunately, the application deadline was April 2.
We could still work on this though? We have a great (unpaid) mentoring programme! https://github.com/sktime/mentoring/tree/main
Or perhaps @benfulcher has an academic internship available?
@benfulcher, @olivercliff, apologies, I missed the more recent discussion in my inbox.
Let us know if further collaboration here is of interest, we are going to kick off our summer workstreams in May.
Hi @fkiraly,
Unfortunately, doing unpaid work this way is not very interesting for me, but I appreciate the answer. It would be a "hard" project, with a lot of code, and a lot of time commitment.
Maybe next year if sktime is selected.
@bruAristimunha, we did get selected for 2024, but getting paid would have required an application by April 2. Sorry that I did not see this.
How about an alternative idea then, @benfulcher: you (or someone from your team) could present pyspi in one of the sktime meet-ups; these are on Fridays at 4pm UTC at the moment. There is one free slot on April 26, and most of June is also available.
The aim would be to present pyspi and a potential integration project. I'm sure many members of the community and adjacent listeners would find this interesting, and someone might take that up.
Ok sounds good thanks for the invite—would be happy to present pyspi. @jmoo2880 has done a bunch of work on it recently, getting it into a nice format (e.g., now pip installable). Trouble is that 4pm UTC seems to be 2am Sydney time, so it's not going to work at that timing.
@anniegbryant, @benfulcher, I would like to congratulate you on this nice package, I really like the concept and it is quite nicely designed! There are also a lot of useful methods collected! Nice.
Now imo the next "big" question is integrability with the wider modelling ecosystem, e.g., can I use the pairwise time series metrics as components in sktime or sklearn? By "I", of course, I mean the wider user ecosystem.
Currently, I think there are a few blockers, but would you be interested to resolve them together?
Two main points imo from the codebase review:
sklearn interoperable interfaces expect a few things, such as conventions for the __init__ signature and the availability of get_params and set_params. You can get this for free by inheriting from scikit-base base classes, though of course that's not the only way to satisfy the interface requirements.
sktime has related classes which you could adopt or adapt, e.g., the BasePairwiseTransformerPanel. Options could involve writing an adapter in sktime, or using the class in pyspi; the latter would give you testing for free by using check_estimator. Or, writing your own base class template based on scikit-base that marries the current interface definition with sklearn and sktime expectations.
Side points but synergistic points:
pycatch22).
sktime or skbase here.
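As an illustration of the first main point above, here is a minimal plain-Python sketch of the parameter convention that sklearn-style interfaces rely on: `__init__` stores its arguments unchanged under the same names, and `get_params` / `set_params` are derived from the `__init__` signature. `ParamsMixin` and `LaggedCorrelation` are hypothetical names used only for this sketch; this is not how scikit-base or sklearn implement it, and in practice inheriting from their base classes would provide this behaviour.

```python
import inspect


class ParamsMixin:
    """Hypothetical mixin sketching the sklearn-style parameter convention."""

    @classmethod
    def _param_names(cls):
        # Parameters are exactly the explicit arguments of __init__.
        sig = inspect.signature(cls.__init__)
        return [
            name
            for name, p in sig.parameters.items()
            if name != "self"
            and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
        ]

    def get_params(self):
        """Return the constructor parameters as a dict."""
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        """Update constructor parameters in place and return self."""
        valid = self._param_names()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(
                    f"Invalid parameter {name!r} for {type(self).__name__}"
                )
            setattr(self, name, value)
        return self

    def clone(self):
        """Build a fresh, independent object with the same parameters."""
        return type(self)(**self.get_params())


class LaggedCorrelation(ParamsMixin):
    """Hypothetical pairwise statistic; not an actual pyspi class."""

    def __init__(self, lag=0, absolute=False):
        # Store arguments unmodified, under the same attribute names.
        self.lag = lag
        self.absolute = absolute


est = LaggedCorrelation(lag=2)
params = est.get_params()
copy = est.clone().set_params(absolute=True)
```

Keeping `__init__` free of logic is what makes cloning and parameter search possible, since an object can always be rebuilt from `get_params()` alone.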