Open · suporteavancado opened this issue 7 years ago
This isn't available yet, but it should be easy to implement. The positions of all NAs are easy to identify because they're stored in a tds.Hash with an {id: true} structure, i.e. all you need is the id number; then find the element immediately before it and use that value. Take a look at the Dataseries class; you're welcome to add the functionality if you want to. Remember to write specs together with the functionality.
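A minimal sketch of that idea in plain Lua, outside the Dataseries class. It assumes the values live in a plain array with `nil` at the NA positions and that the NA positions are kept in a set shaped like `{[id] = true}`, mirroring the tds.Hash described above. The `locf` name and its signature are hypothetical, not part of torch-dataframe's API.

```lua
-- Hypothetical helper, not part of torch-dataframe's API.
-- values:  array with nil at the NA positions
-- missing: set of NA positions, shaped like {[id] = true}
local function locf(values, missing)
  -- Copy the observed values so the input is left untouched
  local out = {}
  for i, v in pairs(values) do
    out[i] = v
  end

  -- Visit the NA positions in ascending order so that a run of
  -- consecutive NAs is filled from the last observation before it
  local ids = {}
  for id in pairs(missing) do
    ids[#ids + 1] = id
  end
  table.sort(ids)

  for _, id in ipairs(ids) do
    -- The element immediately before is either observed or already filled;
    -- a leading NA (id == 1) has nothing before it and stays nil
    if id > 1 then
      out[id] = out[id - 1]
    end
  end
  return out
end

-- Values1 column of the merged example: NAs at rows 4, 6 and 7
local priceA = { 10.00, 10.01, 10.02, nil, 10.04, nil, nil, 10.07 }
local filled = locf(priceA, { [4] = true, [6] = true, [7] = true })
print(filled[4], filled[6], filled[7]) -- prints 10.02, 10.04, 10.04 (tab-separated)
```

Sorting the ids is what makes "use the element immediately before" safe for runs of NAs: by the time row 7 is visited, row 6 has already been filled.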
Good afternoon,
I also believe it will be easy to implement the locf function, and I will try. But as I am still learning my way around the project, I would be grateful if you could show me how the two time series would be merged on the index column (dateTime). Just a small example, if possible.
Many thanks,
Danilo
Great! Start by writing a spec for merging the two dataframes, including the desired outcome. I can then try to help you with the details of putting it together.
```lua
require 'Dataframe'

-- First series: prices for A, observed at irregular minutes
local df1 = Dataframe()
local date1 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00", "2016-12-27 21:04:00", "2016-12-27 21:07:00" }
local value1 = { 10.00, 10.01, 10.02, 10.04, 10.07 }
df1:load_table{data=Df_Dict{date=date1, priceA=value1}}

-- Second series: prices for B, observed at a different set of minutes
local df2 = Dataframe()
local date2 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00", "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00" }
local value2 = { 20.00, 20.01, 20.03, 20.05, 20.06, 20.07 }
df2:load_table{data=Df_Dict{date=date2, priceB=value2}}
```
OK, so you want to do a full join. The main problem is that, no matter how clever our implementation is, it will most likely be inefficient compared to SQL solutions that have been refined for years. My general design goal for torch-dataframe is to allow simple manipulations, plus other things that are good to have when building and training models. Implementing heavy-duty joins has therefore not been something I've aimed at. I personally prepare my datasets in R and export them to CSV before importing them into Torch. R's dplyr package is excellent for all kinds of merges.
Anyway, if you still want to embark on implementing the merge, then:

1. `to_timestamp`: keeping the date as a string will be terribly inefficient to work with. I know the `to_categorical` function is rather slow, and this will be even worse; consider doing it in C if you have large datasets. There is a Stack Overflow post that may be helpful.
2. `sort` both tables using `torch.sort`'s second return value, which holds the indexes of the sort. Note that you will need to mask the missing data when sorting and append the missing elements at the end.
3. The `full_join` function should […] `add_column` […] that can contain the merged dataset.

That's it. A few hours of work, though :-P
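The timestamp conversion and the join from the steps above can be sketched in plain Lua on two series that are already in time order, so the `torch.sort` step and the Dataframe plumbing are left out. Assumptions: dates always use the `YYYY-MM-DD HH:MM:SS` format from the spec, they are converted with `os.time` (which reads them as local time, enough for ordering and matching), and `to_timestamp`, `full_join` and `series` are hypothetical helper names, not torch-dataframe's API. The join walks both sorted series with two pointers and leaves `nil` wherever a series has no observation.

```lua
-- Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" into seconds since the epoch.
-- os.time treats the fields as local time; that is enough for ordering and matching.
local function to_timestamp(s)
  local y, mo, d, h, mi, se = s:match("^(%d+)-(%d+)-(%d+) (%d+):(%d+):(%d+)$")
  assert(y, "unexpected date format: " .. s)
  return os.time{ year = tonumber(y), month = tonumber(mo), day = tonumber(d),
                  hour = tonumber(h), min = tonumber(mi), sec = tonumber(se) }
end

-- Hypothetical helper: full outer join of two series sorted by time.
-- Each series is { time = {...}, value = {...} } with ascending, unique times.
-- Returns the union of times plus columns a and b (nil where a side is missing)
-- and an explicit row count n, since # is unreliable on arrays with nil holes.
local function full_join(a, b)
  local out = { time = {}, a = {}, b = {} }
  local i, j, n = 1, 1, 0
  while i <= #a.time or j <= #b.time do
    local ta, tb = a.time[i], b.time[j]
    n = n + 1
    if tb == nil or (ta ~= nil and ta < tb) then
      -- Only series A has an observation at this time
      out.time[n], out.a[n] = ta, a.value[i]
      i = i + 1
    elseif ta == nil or tb < ta then
      -- Only series B has an observation at this time
      out.time[n], out.b[n] = tb, b.value[j]
      j = j + 1
    else
      -- Both series share this timestamp
      out.time[n], out.a[n], out.b[n] = ta, a.value[i], b.value[j]
      i, j = i + 1, j + 1
    end
  end
  out.n = n
  return out
end

-- Hypothetical helper: build a series from date strings and values
local function series(dates, values)
  local s = { time = {}, value = values }
  for k, d in ipairs(dates) do s.time[k] = to_timestamp(d) end
  return s
end

local merged = full_join(
  series({ "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00",
           "2016-12-27 21:04:00", "2016-12-27 21:07:00" },
         { 10.00, 10.01, 10.02, 10.04, 10.07 }),
  series({ "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00",
           "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00" },
         { 20.00, 20.01, 20.03, 20.05, 20.06, 20.07 }))

-- One row per minute from 21:00 to 21:07, with NA where a series has no value
for r = 1, merged.n do
  print(os.date("%Y-%m-%d %H:%M:%S", merged.time[r]), merged.a[r] or "NA", merged.b[r] or "NA")
end
```

Working on numeric timestamps rather than strings is what keeps the comparisons in the join cheap, which is the point of the first step; with real data the two inputs would first be put in time order as described in the second step.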
First of all I would like to congratulate you for this great project.
I would like to know whether it is possible to use torch-dataframe for time series analysis.
Something similar to xts in R.
Example:
TimeSeries1:
| Date | Values1 |
|---|---|
| 2016-12-27 21:00:00 | 10.00 |
| 2016-12-27 21:01:00 | 10.01 |
| 2016-12-27 21:02:00 | 10.02 |
| 2016-12-27 21:04:00 | 10.04 |
| 2016-12-27 21:07:00 | 10.07 |
TimeSeries2:
| Date | Values2 |
|---|---|
| 2016-12-27 21:00:00 | 20.00 |
| 2016-12-27 21:01:00 | 20.01 |
| 2016-12-27 21:03:00 | 20.03 |
| 2016-12-27 21:05:00 | 20.05 |
| 2016-12-27 21:06:00 | 20.06 |
| 2016-12-27 21:07:00 | 20.07 |
Merge result of TimeSeries1 with TimeSeries2:
| Date | Values1 | Values2 |
|---|---|---|
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | NA |
| 2016-12-27 21:03:00 | NA | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | NA |
| 2016-12-27 21:05:00 | NA | 20.05 |
| 2016-12-27 21:06:00 | NA | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |
Applying na.locf to the merged TimeSeries
| Date | Values1 | Values2 |
|---|---|---|
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | 20.01 |
| 2016-12-27 21:03:00 | 10.02 | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | 20.03 |
| 2016-12-27 21:05:00 | 10.04 | 20.05 |
| 2016-12-27 21:06:00 | 10.04 | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |
Applying na.omit to the merged TimeSeries
| Date | Values1 | Values2 |
|---|---|---|
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |
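For the `na.omit` step, a minimal plain-Lua sketch that keeps only the rows where every column has a value. It assumes the merged table is stored column-wise as arrays with `nil` for NA plus an explicit row count (because `#` is unreliable on arrays with holes), and uses minutes after 21:00 in place of the Date column. `na_omit` is a hypothetical name, not an existing torch-dataframe method.

```lua
-- Hypothetical helper, not part of torch-dataframe.
-- cols: map from column name to array (nil marks NA); n: number of rows.
-- Returns new columns holding only the rows without any NA, and the new row count.
local function na_omit(cols, n)
  local out, kept = {}, 0
  for name in pairs(cols) do out[name] = {} end
  for r = 1, n do
    -- A row is complete only if no column is nil at position r
    local complete = true
    for _, col in pairs(cols) do
      if col[r] == nil then complete = false; break end
    end
    if complete then
      kept = kept + 1
      for name, col in pairs(cols) do out[name][kept] = col[r] end
    end
  end
  return out, kept
end

-- The merged example, with minutes after 21:00 standing in for the Date column
local merged = {
  minute  = { 0, 1, 2, 3, 4, 5, 6, 7 },
  values1 = { 10.00, 10.01, 10.02, nil, 10.04, nil, nil, 10.07 },
  values2 = { 20.00, 20.01, nil, 20.03, nil, 20.05, 20.06, 20.07 },
}
local clean, n = na_omit(merged, 8)
-- clean.minute holds 0, 1 and 7: the three rows of the na.omit table
```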
Many thanks,
Danilo