AlexMili / torch-dataframe

Utility class to manipulate dataset from CSV file
MIT License
67 stars, 8 forks

Using torch-dataframe for time-series #29

Open suporteavancado opened 7 years ago

suporteavancado commented 7 years ago

First of all, I would like to congratulate you on this great project.

I would like to know whether it is possible to use torch-dataframe for time-series analysis.

Something similar to the xts package in R.

Example:

TimeSeries1:

| Date | Values1 |
| --- | --- |
| 2016-12-27 21:00:00 | 10.00 |
| 2016-12-27 21:01:00 | 10.01 |
| 2016-12-27 21:02:00 | 10.02 |
| 2016-12-27 21:04:00 | 10.04 |
| 2016-12-27 21:07:00 | 10.07 |

TimeSeries2:

| Date | Values2 |
| --- | --- |
| 2016-12-27 21:00:00 | 20.00 |
| 2016-12-27 21:01:00 | 20.01 |
| 2016-12-27 21:03:00 | 20.03 |
| 2016-12-27 21:05:00 | 20.05 |
| 2016-12-27 21:06:00 | 20.06 |
| 2016-12-27 21:07:00 | 20.07 |

Merge result of TimeSeries1 with TimeSeries2:

| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | NA |
| 2016-12-27 21:03:00 | NA | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | NA |
| 2016-12-27 21:05:00 | NA | 20.05 |
| 2016-12-27 21:06:00 | NA | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |

Applying `na.locf` to the merged time series (each NA is replaced by the last observed value above it):

| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | 20.01 |
| 2016-12-27 21:03:00 | 10.02 | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | 20.03 |
| 2016-12-27 21:05:00 | 10.04 | 20.05 |
| 2016-12-27 21:06:00 | 10.04 | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |

Applying `na.omit` to the merged time series (every row containing an NA is dropped):

| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |
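To make the `na.omit` semantics concrete, here is a minimal sketch in plain Lua. It does not use the torch-dataframe API; it assumes a data frame is just a table of equal-length column arrays, with missing cells marked by a unique sentinel `NA`. The function name `na_omit` is hypothetical.

```lua
-- Sketch of R's na.omit in plain Lua (NOT the torch-dataframe API).
-- Assumption: a frame is a table of equal-length column arrays and
-- missing cells hold the unique sentinel NA (so arrays have no nil holes).
local NA = setmetatable({}, {__tostring = function() return "NA" end})

-- Return a new frame keeping only rows with no NA in any listed column.
local function na_omit(df, columns)
  local out = {}
  for _, col in ipairs(columns) do out[col] = {} end
  local nrows = #df[columns[1]]
  for i = 1, nrows do
    local complete = true
    for _, col in ipairs(columns) do
      if df[col][i] == NA then complete = false; break end
    end
    if complete then
      for _, col in ipairs(columns) do
        table.insert(out[col], df[col][i])
      end
    end
  end
  return out
end

-- The merged example (times shortened to HH:MM for brevity).
local merged = {
  Date    = {"21:00", "21:01", "21:02", "21:03", "21:04", "21:05", "21:06", "21:07"},
  Values1 = {10.00, 10.01, 10.02, NA, 10.04, NA, NA, 10.07},
  Values2 = {20.00, 20.01, NA, 20.03, NA, 20.05, 20.06, 20.07},
}
local omitted = na_omit(merged, {"Date", "Values1", "Values2"})
for i = 1, #omitted.Date do
  print(omitted.Date[i], omitted.Values1[i], omitted.Values2[i])
end
```

Using a sentinel rather than `nil` keeps `#` reliable, since Lua arrays with `nil` holes have an unspecified length.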

Many thanks,

Danilo

gforge commented 7 years ago

This isn't available yet, but it should be easy to implement. The positions of the NAs are easy to identify because they're stored in a tds.Hash with an {id: true} structure, i.e. all you need is the id (row number) of each NA; then find the element immediately before it and use that value. Take a look at Dataseries; you're welcome to add the functionality if you want to. Remember to write specs together with the functionality.
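The idea can be sketched in plain Lua as follows. This is not the Dataseries API: a plain Lua table `{[id] = true}` stands in for the tds.Hash of NA positions, slots at missing ids are ignored, and the function name `locf` is hypothetical. Walking the rows in order and remembering the last observed value also handles runs of consecutive NAs, because the "element immediately before" an NA inside a run has itself already been filled. A leading NA has nothing before it, so it stays missing.

```lua
-- Sketch of last-observation-carried-forward (R's na.locf) in plain Lua.
-- Assumption: NA positions are kept in a set {[id] = true}; a plain Lua
-- table stands in for tds.Hash. Not torch-dataframe's Dataseries API.
--
-- values:  array of length n (entries at missing ids are ignored)
-- missing: set of row ids that are NA
-- Returns the filled values and the set of ids that are still NA
-- (leading NAs have no earlier observation to carry forward).
local function locf(values, n, missing)
  local filled, still_missing = {}, {}
  local last -- most recent non-NA value seen so far
  for i = 1, n do
    if missing[i] then
      if last ~= nil then
        filled[i] = last
      else
        still_missing[i] = true
      end
    else
      filled[i] = values[i]
      last = values[i]
    end
  end
  return filled, still_missing
end

-- Values1 column of the merged example; false is only a placeholder
-- at the ids listed in missing1 and is never read.
local values1  = {10.00, 10.01, 10.02, false, 10.04, false, false, 10.07}
local missing1 = {[4] = true, [6] = true, [7] = true}
local filled1, still1 = locf(values1, 8, missing1)
for i = 1, 8 do print(i, filled1[i]) end
```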

suporteavancado commented 7 years ago

Good afternoon,

I also believe the locf function will be easy to implement, and I will try. But since I am still learning about the project, I would be grateful if you could show me how the two time series would be merged on the index column (dateTime). Just a small example, if possible.

Many thanks,

Danilo

gforge commented 7 years ago

Great, start by writing a spec for merging the two dataframes, together with the desired outcome. I can then try to help you with the details of putting it together.

suporteavancado commented 7 years ago

```lua
require 'Dataframe'

-- First series: one price per minute, with gaps (no 21:03, 21:05, 21:06)
df1 = Dataframe()
date1 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00", "2016-12-27 21:04:00", "2016-12-27 21:07:00" }
value1 = { 10.00, 10.01, 10.02, 10.04, 10.07 }
df1:load_table{data=Df_Dict{date=date1, priceA=value1}}

-- Second series: different timestamps; "date" is the shared join key
df2 = Dataframe()
date2 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00", "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00" }
value2 = { 20.00, 20.01, 20.03, 20.05, 20.06, 20.07 }
df2:load_table{data=Df_Dict{date=date2, priceB=value2}}
```

suporteavancado commented 7 years ago

example

gforge commented 7 years ago

Ok, so you want to do a full join. The main problem is that, no matter how clever our implementation is, it will most likely be inefficient compared to SQL-based solutions that have been refined for years. My general design goal for torch-dataframe is to support simple manipulations and other features that are useful when building and training models, so heavy-duty joins have not been something I've aimed at. I personally prepare my datasets in R and export them to CSV before importing them into Torch. R has the dplyr package, which is excellent for all kinds of merges.
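For reference, the core of a full (outer) join on a single key column can be sketched in plain Lua; this is not torch-dataframe code. It assumes each input is a table of column arrays, keys are unique within each input (a general join would emit a cross product for duplicate keys), and keys are `"YYYY-MM-DD HH:MM:SS"` strings, so sorting them as strings also sorts them chronologically. The function `full_join` and the `NA` sentinel are hypothetical. Indexing both sides by key is linear, and the final sort is O(n log n).

```lua
-- Sketch of a full outer join on one key column (NOT torch-dataframe code).
-- Assumptions: inputs are tables of column arrays, keys are unique per input,
-- and ISO-like date strings sort chronologically as plain strings.
local NA = setmetatable({}, {__tostring = function() return "NA" end})

local function full_join(left, right, key, lcol, rcol)
  -- Index each side by key (key -> value) and collect the union of keys.
  local lidx, ridx, keys, seen = {}, {}, {}, {}
  for i, k in ipairs(left[key]) do
    lidx[k] = left[lcol][i]
    if not seen[k] then seen[k] = true; keys[#keys + 1] = k end
  end
  for i, k in ipairs(right[key]) do
    ridx[k] = right[rcol][i]
    if not seen[k] then seen[k] = true; keys[#keys + 1] = k end
  end
  table.sort(keys)
  local out = {[key] = {}, [lcol] = {}, [rcol] = {}}
  for i, k in ipairs(keys) do
    out[key][i] = k
    -- A key absent from one side gets NA in that side's column.
    if lidx[k] == nil then out[lcol][i] = NA else out[lcol][i] = lidx[k] end
    if ridx[k] == nil then out[rcol][i] = NA else out[rcol][i] = ridx[k] end
  end
  return out
end

local df1 = {
  date = {"2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00",
          "2016-12-27 21:04:00", "2016-12-27 21:07:00"},
  priceA = {10.00, 10.01, 10.02, 10.04, 10.07},
}
local df2 = {
  date = {"2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00",
          "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00"},
  priceB = {20.00, 20.01, 20.03, 20.05, 20.06, 20.07},
}
local merged = full_join(df1, df2, "date", "priceA", "priceB")
for i = 1, #merged.date do
  print(merged.date[i], merged.priceA[i], merged.priceB[i])
end
```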

Anyway, if you still want to embark on implementing the merge, then:

That's it. A few hours of work though :-P