Closed by theogf 4 years ago
Hi @theogf,
In reply to the paper review: https://github.com/JuliaCon/proceedings-review/issues/51
Thanks for your nice comments and suggestions.
[ ] Regarding `fit!` and `transform!`: the type hierarchy looks like this:

```julia
MachineLearner <: TSLearner <: Transformer
Filter <: Transformer
Pipeline <: Transformer
```

`fit!` in a machine learner trains the parameters for regression or classification, so it requires both input and output. `fit!` in a filter, on the other hand, processes only the input because a filter does not learn any mapping between input and output; it is typically a straightforward computation of some statistics, such as normalization stats, a range, or PCA/ICA parameters for space embedding.

`transform!` in both the machine learner and the filter applies these computed or learned parameters to the new dataset. In many types of filter, `fit!` merely checks for errors or initializes some variables, and all the important operations happen during `transform!`. In a machine learner, the most critical operations happen during `fit!`, which learns the parameters of the input->output mapping, while `transform!` is just the application of those parameters to new data.
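Sketched as a minimal, hypothetical example (the names `ZNormalizer` and `MeanLearner` are illustrative, not TSML types), the contract above looks like this:

```julia
using Statistics

abstract type Transformer end

# A filter: fit! uses only the input to compute some statistics.
mutable struct ZNormalizer <: Transformer
    mean::Float64
    std::Float64
    ZNormalizer() = new(0.0, 1.0)
end

function fit!(f::ZNormalizer, input::Vector{Float64})
    f.mean = mean(input)   # fit! only gathers normalization stats
    f.std = std(input)
    return f
end

# transform! applies the computed stats to new data.
transform!(f::ZNormalizer, input::Vector{Float64}) =
    (input .- f.mean) ./ f.std

# A machine learner: fit! needs both input and output to learn a mapping.
mutable struct MeanLearner <: Transformer
    prediction::Float64
    MeanLearner() = new(0.0)
end

function fit!(m::MeanLearner, input::Vector{Float64}, output::Vector{Float64})
    m.prediction = mean(output)   # trivially "learn" the input->output mapping
    return m
end

transform!(m::MeanLearner, input::Vector{Float64}) =
    fill(m.prediction, length(input))
```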
[ ] Regarding individual results: No Free Lunch Theorem. The examples demonstrate that you can trivially parallelize model selection and parameter optimization using the TSML pipeline, thanks to the parallelism support built into Julia. The same parallelism may require much longer and more complex code if implemented in other languages.
[ ] Regarding filter names
[ ] Regarding `Dict` for the collection of parameters in a type/struct
[ ] Regarding inline documentation
[ ] Regarding the option to drop missing values
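The parallel model selection mentioned above can be sketched with Julia's `Distributed` standard library; the candidate names and the scoring function below are stand-ins, not TSML's actual API:

```julia
using Distributed
addprocs(2)   # spawn two worker processes

# Stand-in for fitting and cross-validating one candidate model;
# a real pipeline would call fit!/transform! here.
@everywhere function evaluate(modelname::String)
    score = length(modelname) * 1.0
    return (modelname, score)
end

candidates = ["RandomForest", "PrunedTree", "Adaboost"]
results = pmap(evaluate, candidates)   # each candidate scored on its own worker
best = argmax(last, results)           # keep the highest-scoring model
println(best)
```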
@theogf, @christianpeel, @matbesancon: The just-released TSML 2.3.9 now has inline documentation for the types and the most important functions. It includes inline examples too.
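Inline documentation of the kind described above is done with Julia docstrings, which is what makes `?SomeType` work in the REPL; a hypothetical example (`MovingAverager` is an illustrative name, not an actual TSML type):

```julia
# A triple-quoted string placed immediately before a definition becomes
# its docstring, shown by `?MovingAverager` in the REPL.
"""
    MovingAverager(; width = 3)

Smooth a numeric vector with a moving average over a window of `width` points.
"""
struct MovingAverager
    width::Int
    MovingAverager(; width = 3) = new(width)
end
```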
If it's okay, I am going to gather all my comments for the review in https://github.com/JuliaCon/proceedings-review/issues/51 here:
First, it's a very nice package which is for sure extremely useful. I really like the pipeline composition style and how easy and flexible it is to construct. The output part is also well made and gives a clear understanding of the results. But I still have a few comments about the paper:
[x] `fit!` and `transform!` do not always make sense to me; I think it would be good to clarify what they really mean, and perhaps list the differences between the different types of "modules", i.e. whether it is a reader, a filter, a classifier, etc. It would also help people wanting to create new filters. You also never explain what the output/input of each module in the pipeline should be.

More on the API side/documentation:
[x] I also share @christianpeel's opinion on the naming of the filters; for an external person it is not so clear what they do at first sight, but as in #89 this is a minor comment.
[x] Connected to the last point, is there a reason you always pass a `Dict` to your modules? When using autocomplete tools, in Juno for instance, it is always more practical to see what arguments are possible, especially when one is not familiar with all the packages.
[x] Similarly, it would be nice to have inline documentation: for example, calling `? DataValizer` would give a direct explanation of the filter and the possible arguments.

PS: Is there any reason why you don't give the option to simply drop the missing values?
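The `Dict`-passing pattern discussed above can be contrasted with keyword arguments in a hypothetical sketch (`GenericFilter` and `KeywordFilter` are illustrative names, not TSML types):

```julia
# Dict-based arguments: flexible, but an autocomplete tool cannot show
# which options are valid for this particular module.
struct GenericFilter
    args::Dict{Symbol,Any}
end
GenericFilter(; kwargs...) = GenericFilter(Dict{Symbol,Any}(kwargs))

# Keyword-based arguments: the valid options and their defaults are
# visible directly in the method signature (and to autocomplete tools).
struct KeywordFilter
    dateinterval::String
    strategy::Symbol
end
KeywordFilter(; dateinterval = "1 hour", strategy = :median) =
    KeywordFilter(dateinterval, strategy)
```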