Safe-DS / Library

A user-friendly library for Data Science in Python.
https://library.safeds.com
MIT License

TimeSeries Class in Safe-DS std-Lib #481

Closed Gerhardsa0 closed 8 months ago

Gerhardsa0 commented 12 months ago

Is your feature request related to a problem?

Right now, Safe-DS cannot handle time series as time series. In addition, a tagged table cannot use the same column as both a feature column and the target column, although in time series it is common for the target column to also be a feature column.

Desired solution

Add a timeseries class to the safe-ds std-lib.

Gerhardsa0 commented 11 months ago

After working on this issue for a while, I am quite unsure whether a separate time series class is really needed or good style for the API. A time series shares many characteristics with a tagged table. They differ in that a time series has a date_column, a window size, and a forecast horizon, and allows the target_column to also be a feature column. But the forecast horizon is not really part of a time series; it belongs more to the "window settings".

In my newest commit on the branch, I wrote a window generator that would also work on a tagged table or a plain table. This is a huge advantage, as I don't store the windows in the time series itself. Instead, I store references to the pandas Series in lists, so I can later load them into tensors faster. The functions are _create_all_windows_for_column and _create_all_labels_for_target_column in the TimeSeries class. Both use generators, which is faster and simpler in code (https://stackoverflow.com/questions/3487802/which-is-generally-faster-a-yield-or-an-append#comment3745740_3487844). Another big advantage is that I don't need an extra column type, because the references are held in separate lists and not in the table.
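To illustrate the generator idea, here is a minimal sketch, not the code from the branch. The names iter_windows and iter_labels are hypothetical, and the meaning of the forecast horizon (the label lies forecast_horizon steps after the end of the window) is an assumption. It uses plain Python sequences so it runs without extra dependencies; with a pandas Series one would slice positionally instead.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def _sample_count(n_values: int, window_size: int, forecast_horizon: int) -> int:
    """Number of (window, label) pairs that fit into n_values rows."""
    if window_size < 1 or forecast_horizon < 1:
        raise ValueError("window_size and forecast_horizon must be >= 1")
    return max(n_values - window_size - forecast_horizon + 1, 0)


def iter_windows(
    values: Sequence[T], window_size: int, forecast_horizon: int
) -> Iterator[Sequence[T]]:
    """Lazily yield every input window that still has a label available."""
    for start in range(_sample_count(len(values), window_size, forecast_horizon)):
        # Only a slice is produced on demand; nothing is stored in a table.
        yield values[start : start + window_size]


def iter_labels(
    values: Sequence[T], window_size: int, forecast_horizon: int
) -> Iterator[T]:
    """Lazily yield the label belonging to each window (assumed semantics)."""
    for start in range(_sample_count(len(values), window_size, forecast_horizon)):
        yield values[start + window_size + forecast_horizon - 1]


# Usage: the same column serves as feature and as target.
column = [1, 2, 3, 4, 5, 6]
windows = list(iter_windows(column, window_size=3, forecast_horizon=1))
labels = list(iter_labels(column, window_size=3, forecast_horizon=1))
```

Because both functions are generators, the windows only exist while they are being consumed, which matches the point that they never need to be written back into the table.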

My goal is to use the generators as translators to tensors and to generate the windows only when they are needed as input. We don't need to store them in a table. So my main reason for having a time series class is more or less gone, because windows don't need to be stored.

A different approach would be to create something like a translator tool that takes a tagged table and some settings defining the input, and then creates the input for the neural networks.
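Such a translator could look roughly like the sketch below. Everything in it is hypothetical: WindowSettings and iter_model_input are illustrative names, not part of Safe-DS, and the table is modeled as a plain mapping from column names to value lists rather than a real tagged table.

```python
from dataclasses import dataclass
from typing import Iterator, List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class WindowSettings:
    """Hypothetical settings object that keeps windowing out of the table."""

    feature_columns: Tuple[str, ...]
    target_column: str
    window_size: int
    forecast_horizon: int = 1


def iter_model_input(
    table: Mapping[str, Sequence[float]], settings: WindowSettings
) -> Iterator[Tuple[List[List[float]], float]]:
    """Lazily translate a table into (window per feature column, label) pairs."""
    n_rows = len(table[settings.target_column])
    n_samples = n_rows - settings.window_size - settings.forecast_horizon + 1
    for start in range(max(n_samples, 0)):
        stop = start + settings.window_size
        # One window per feature column; the target may be among the features.
        window = [list(table[name][start:stop]) for name in settings.feature_columns]
        label = table[settings.target_column][stop + settings.forecast_horizon - 1]
        yield window, label


# Usage: "temp" is both a feature column and the target column.
table = {"day": [1, 2, 3, 4, 5], "temp": [10, 11, 12, 13, 14]}
settings = WindowSettings(
    feature_columns=("day", "temp"), target_column="temp", window_size=2
)
samples = list(iter_model_input(table, settings))
```

Keeping the window size and forecast horizon in a separate settings object means the table itself stays unchanged, and each pair can be converted to a tensor at the moment the network needs it.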

lars-reimann commented 8 months ago

:tada: This issue has been resolved in version 0.18.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: