benfulcher / hctsa

Highly comparative time-series analysis
https://time-series-features.gitbook.io/hctsa-manual/

Ignoring operations belonging to certain toolboxes #23

Open SirSteel opened 6 years ago

SirSteel commented 6 years ago

Hello,

This could be considered either a feature request or a request for advice on how to hack it.

The gist of the problem is that I do not have all the toolboxes available. Calling functions that then produce errors seems to add unnecessary overhead to my computation time, which I would like to minimize.

What I would like is the option to exclude operations that belong to certain toolboxes, so as to avoid even calling functions I know will fail.

The TS_ops.txt does not seem to contain information on which toolbox something belongs to.

So, if anyone has any ideas on how I could do this to speed up my computations, I would be very happy.

And thank you for a nice software suite! :)

PS: I have many more questions, and I have a feeling this is more of a bug-reporting place, so where would be a more appropriate place to ask them?

benfulcher commented 6 years ago

Hi @SirSteel! Thanks for your interest in the package :) I've just added a list of toolbox dependencies that I generated a couple of years ago, in this commit: c6e6aa42082f3fe9d7023b225fa856143f25bbaf

I don't think the dependencies will have changed much since then, but treat this as a starting point and feel free to modify it. In the past I have filtered features using keywords assigned according to their toolbox dependencies (e.g., as specified in INP_ops.txt). A simpler option is to filter INP_ops.txt based on the master operations (listed in INP_mops.txt) that match functions listed in ToolboxDependencies.txt, i.e., generate new files INP_ops_toolboxFiltered.txt and INP_mops_toolboxFiltered.txt that specify the filtered lists. You can then run, e.g., TS_init with these as the input files specifying the list of operations.
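The filtering idea can be sketched in a few lines. This is a rough Python illustration, not part of hctsa: it assumes each master-operation line has the form `FunctionCall(...)  Label`, that each operation line starts with `Label` or `Label.output`, and that the set of excluded function names has already been collected by hand from ToolboxDependencies.txt (whose exact layout is not described in this thread). The function names `function_name` and `filter_operations`, and the `NL_Hypothetical` entries, are made up for the example.

```python
def function_name(code):
    """Return the function part of a call string, e.g. "F(y,1)" -> "F"."""
    return code.split("(", 1)[0].strip()


def filter_operations(mops_lines, ops_lines, excluded_functions):
    """Drop master operations that call an excluded function,
    together with every operation that refers to them.

    Assumed line formats (not confirmed by the thread):
      master operation: "<call string>  <label>"
      operation:        "<label>[.<output>]  <name>  <keywords>"
    """
    excluded = set(excluded_functions)
    kept_mops, dropped_labels = [], set()
    for line in mops_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            kept_mops.append(line)  # keep blank lines and comments untouched
            continue
        code, label = " ".join(fields[:-1]), fields[-1]
        if function_name(code) in excluded:
            dropped_labels.add(label)
        else:
            kept_mops.append(line)

    kept_ops = []
    for line in ops_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            kept_ops.append(line)
            continue
        master_label = fields[0].split(".", 1)[0]
        if master_label not in dropped_labels:
            kept_ops.append(line)
    return kept_mops, kept_ops


# Hypothetical example lines; real files would be read with open(...).
mops = [
    "CO_AutoCorr(y,1,'Fourier')\tCO_AutoCorr_1_Fourier",
    "NL_Hypothetical(y,2)\tNL_Hypothetical_2",
]
ops = [
    "CO_AutoCorr_1_Fourier\tAC_1\tcorrelation",
    "NL_Hypothetical_2.out1\tNL_hyp_out1\tnonlinear",
    "NL_Hypothetical_2.out2\tNL_hyp_out2\tnonlinear",
]
kept_mops, kept_ops = filter_operations(mops, ops, {"NL_Hypothetical"})
```

Writing `kept_mops` and `kept_ops` to INP_mops_toolboxFiltered.txt and INP_ops_toolboxFiltered.txt would then give the two filtered input files mentioned above.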

Note also that this filtering may be overly harsh (e.g., only a single output of a function may rely on a particular toolbox, while the others compute fine). If you really want to get every number you can, you could instead run the full set on your dataset, list any features that consistently produce missing data, and use that list as the basis for filtering to a reduced set for future calculations.
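The second approach amounts to scanning a results table for features that fail on every time series. The sketch below is only an illustration in Python: it assumes the quality information has been exported as a list of rows (one per time series, one column per operation) in which 0 means a good value and any other code means the value is missing or errored. That encoding, the function name `consistently_failing`, and the `min_fail_fraction` parameter are assumptions made for the example.

```python
def consistently_failing(quality, op_names, min_fail_fraction=1.0):
    """Return names of operations whose fraction of bad values
    across all time series is at least min_fail_fraction.

    quality:  list of rows (one per time series); each row holds one
              quality code per operation, with 0 assumed to mean "good".
    op_names: operation names, in the same order as the columns.
    """
    n_series = len(quality)
    if n_series == 0:
        return []
    failing = []
    for j, name in enumerate(op_names):
        n_bad = sum(1 for row in quality if row[j] != 0)
        if n_bad / n_series >= min_fail_fraction:
            failing.append(name)
    return failing


# Hypothetical quality codes for three time series and three operations.
quality = [
    [0, 1, 0],
    [0, 1, 2],
    [0, 1, 0],
]
names = ["op_a", "op_b", "op_c"]
always_bad = consistently_failing(quality, names)
```

Here only `op_b` fails for every time series, so it alone would be excluded; `op_c` fails once and is kept unless `min_fail_fraction` is lowered.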

You kind of need the Statistics/Machine Learning toolbox, but you could get away with not having the others. Let me know how you get on.

Happy to help with any other questions -- please send any non-issue-like questions to my email :)

Best,

Ben

SirSteel commented 6 years ago

Hello,

Thank you very much for your very quick reply. Sorry that it took me so long to respond; I was swamped with work.

I am using your software suite for a project during my PhD, and so far I am still familiarizing myself with its operations. I am glad I can raise any issues I might have with you by email.

Thank you! :)

Best regards, Luka Zeleznik
