@aa25desh I've taken a look at the overall structure of the code and have some comments. I can see a lot of work has gone into this, particularly into core algorithms (which I have not, however, reviewed in any detail). Be great if you can look at this when you have some time.
cc @vollmersj @mloning
[x] Please change the name of this repository to TimeSeriesClassification.jl,
or something similar without MLJ prefix, which we are now
reserving for packages providing core functionality.
[ ] Where does the right hand side for this test come from? If this is
the output of some alternative implementation (e.g., sk-learn), then
please state this clearly in a comment. Otherwise, explain why you
know the right hand side must be the correct output (Independent of
your implementation). In any case, the test is not robust because you are
comparing floats using ==. Instead, please use approx or ≈.
[ ] I think a lack of unit tests here is still a serious issue. What
about a unit test for these functions?
[ ] You should not have MLJBase as a dependency unless you
absolutely need it. I see that you use accuracy. I'm guessing you only need it for a
test? The idea is that the light-weight package MLJModelInterface
should suffice. You will still need MLJBase as a dependency for
testing, i.e., listed under [extras] and [targets]. Read
this
carefully. If you're not sure how to include dependencies for
testing, look at the examples at MLJModels or elsewhere.
[ ] Perhaps review the inclusion of MultivariateStats and Distributions
as dependencies, as these are pretty hefty. Note that you can use
the UnivariateFinite constructor without Distributions or
MLJBase.
[ ] Before registering your package, you will need to give every package
in [deps] that is not part of the standard library and explicit
[compa] entry. Then you can accelerate merging of patch and
minor releases.
[ ] At a minimum, documentation needs to include a description of each
model provided (Could be a table), ideally including an explanation
of all hyper parameters. This could be as simple as a reproduction
of the doc strings. I would put this directly in the readme. It is
difficult to find this information quickly from the other
"documentation" that you provide. If input data is matrix rather
than tabular, say whether your observations correspond to
rows or columns. Probably worth stating
explicitly what input is allowed, since a lot of MLJ models use
tabular data.
[ ] You shouldn't need to export the methods predict, predict_mean,
fitted_params, predict_mode, as you are overloading methods
already defined in MLJModelInterface which are already exported by
MLJ or MLJBase. The methods accuracy, fit!, and machine are
also already exported by MLJ/MLJBase. At the moment, the user's
work-flow would begin using MLJ; using MLJTime; .... After your package is
registered, the work-flow will be using MLJ; @load TimeSeriesForestClassifier ... or similar.
[ ] predict_new looks like a private method; it is not part of the MLJ
API; I don't think you need to export it.
[ ] ditto RandomForestClassifierFit.
[ ] I suggest exporting your model types here (for example,
TimeSeriesForestClassifier). (Does the example on the read me
actually work without this export?)
[ ] We need model metadata here. Here's a suggestion for the first
classifier:
I'm assuming here that your inputs are matrices of abstract floats. If
you change your requirements for the input type, then you'll need to
modify the input_trait accordingly.
@aa25desh I've taken a look at the overall structure of the code and have some comments. I can see a lot of work has gone into this, particularly into core algorithms (which I have not, however, reviewed in any detail). Be great if you can look at this when you have some time.
cc @vollmersj @mloning
MLJ
prefix, which we are now reserving for packages providing core functionality.https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/test/runtests.jl#L10
[ ] Where does the right hand side for this test come from? If this is the output of some alternative implementation (e.g., sk-learn), then please state this clearly in a comment. Otherwise, explain why you know the right hand side must be the correct output (Independent of your implementation). In any case, the test is not robust because you are comparing floats using
==
. Instead, please useapprox
or≈
.[ ] I think a lack of unit tests here is still a serious issue. What about a unit test for these functions?
_discrete_fourier_transform
,transform
(at this line_shorten_bags
,select_sort
InvFeaturesGen
(version on this lineapply_kernel
apply_kernels
https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/Project.toml#L6
[ ] You should not have MLJBase as a dependency unless you absolutely need it. I see that you use
accuracy
. I'm guessing you only need it for a test? The idea is that the light-weight package MLJModelInterface should suffice. You will still need MLJBase as a dependency for testing, i.e., listed under [extras] and [targets]. Read this carefully. If you're not sure how to include dependencies for testing, look at the examples at MLJModels or elsewhere.[ ] Perhaps review the inclusion of MultivariateStats and Distributions as dependencies, as these are pretty hefty. Note that you can use the
UnivariateFinite
constructor withoutDistributions
orMLJBase
.[ ] Before registering your package, you will need to give every package in [deps] that is not part of the standard library and explicit [compa] entry. Then you can accelerate merging of patch and minor releases.
https://github.com/alan-turing-institute/MLJTime.jl
https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/MLJTime.jl#L12
[ ] You shouldn't need to export the methods
predict
,predict_mean
,fitted_params
,predict_mode
, as you are overloading methods already defined in MLJModelInterface which are already exported by MLJ or MLJBase. The methodsaccuracy
,fit!
, andmachine
are also already exported by MLJ/MLJBase. At the moment, the user's work-flow would beginusing MLJ; using MLJTime; ...
. After your package is registered, the work-flow will beusing MLJ; @load TimeSeriesForestClassifier ...
or similar.[ ]
predict_new
looks like a private method; it is not part of the MLJ API; I don't think you need to export it.[ ] ditto
RandomForestClassifierFit
.[ ] I suggest exporting your model types here (for example,
TimeSeriesForestClassifier
). (Does the example on the read me actually work without this export?)[ ] We need model metadata here. Here's a suggestion for the first classifier:
I'm assuming here that your inputs are matrices of abstract floats. If you change your requirements for the input type, then you'll need to modify the
input_trait
accordingly.https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/interface.jl#L93