LoLei / spmf-py

Python SPMF Wrapper 🐍 🎁
GNU General Public License v3.0

How to handle datasets with timestamps for algorithms that involve time constraints? #3

Closed dmeoli closed 3 years ago

dmeoli commented 3 years ago

I'm trying to run the HirateYamana algorithm with time constraints. In which format should I encode the dataset so that it includes the timestamp value for each subsequence?

e.g.

dataset = [
    # sequence: list of events
    [(1, ['a']), (2, ['a', 'b', 'c']), (3, ['a', 'c']), (4, ['c'])],  # event: (timestamp, [list of items])
    [(1, ['a']), (2, ['c']), (3, ['b', 'c'])],
    [(1, ['a', 'b']), (2, ['d']), (3, ['c']), (4, ['b']), (5, ['c'])],
    [(1, ['a']), (2, ['c']), (3, ['b']), (4, ['c'])]
]

LoLei commented 3 years ago

The link you posted uses an input file with a format like this (contextSequencesTimeExtended.txt):

<0> 1 -1 <1> 1 2 3 -1 <2> 1 3 -1 -2
<0> 1 -1 <1> 1 2 -1 <2> 1 2 3 -1 <3> 1 2 3 -1 -2
<0> 1 2 -1 <1> 1 2 -1 -2
<0> 2 -1 <1> 1 2 3 -1 -2

You could either reformat your data programmatically and write it to a file in this format, or build the same content as a multiline string, as shown in the examples.
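
As a rough sketch of that reformatting step, the snippet below turns the list-of-events structure from the question into the time-extended text format shown above (`<timestamp> items -1 ... -2`). The helper name `to_time_extended` and the letter-to-integer mapping are assumptions made for illustration and are not part of spmf-py; the mapping is there because the sample file uses integer items.

```python
def to_time_extended(dataset, item_ids=None):
    """Serialize sequences of (timestamp, items) events into lines such as
    '<0> 1 -1 <1> 1 2 3 -1 -2' (hypothetical helper, not part of spmf-py)."""
    if item_ids is None:
        # Assumption: items are mapped to integer ids in sorted order, starting at 1.
        items = sorted({item for seq in dataset for _, event in seq for item in event})
        item_ids = {item: i for i, item in enumerate(items, start=1)}
    lines = []
    for seq in dataset:
        parts = []
        for timestamp, event in seq:
            ids = sorted(item_ids[item] for item in event)
            # "<t>" opens an itemset, "-1" closes it.
            parts.append(f"<{timestamp}> " + " ".join(map(str, ids)) + " -1")
        # "-2" closes the whole sequence.
        lines.append(" ".join(parts) + " -2")
    return "\n".join(lines), item_ids


dataset = [
    [(1, ['a']), (2, ['a', 'b', 'c']), (3, ['a', 'c']), (4, ['c'])],
    [(1, ['a']), (2, ['c']), (3, ['b', 'c'])],
    [(1, ['a', 'b']), (2, ['d']), (3, ['c']), (4, ['b']), (5, ['c'])],
    [(1, ['a']), (2, ['c']), (3, ['b']), (4, ['c'])],
]

text, item_ids = to_time_extended(dataset)
print(text)
```

The resulting string can be written to a file or used as the multiline-string input; the returned `item_ids` dictionary lets you translate the integer ids in the mined patterns back to the original labels.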