Open ZrrrKIT opened 5 years ago
That is easy. You can call the trip2seq function to transform all trips in porto.h5 to sequences, and save them into trj.t, just like this line.
This works perfectly, thank you!
The only thing not clear to me is how to interpret the values stored in the trj.t file itself. I see that they range from 1 to 18866, which means they are the indices of the hot cells (a.k.a. the vocabulary IDs). The problem is that I am not sure how to infer their positions on the grid. For normal cell::Int values we just take the cell ID mod the grid width to obtain its x coordinate and the cell ID div the grid width to obtain its y coordinate.
The problem here is that we are dealing with vocab_IDs. Is there a way to infer the 2D location of the hot cells from their IDs?
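To make the mod/div decoding above concrete, here is a minimal Python sketch. It assumes 0-based cell IDs numbered row by row; for 1-based IDs, subtract 1 first. The names cell_to_xy, vocab_to_xy and the lookup table vocab_to_cell are hypothetical and are not taken from the repository. The sketch only illustrates that a vocabulary ID would first need to be translated back to a cell ID through some stored mapping before the same arithmetic applies.

```python
from typing import Dict, Tuple


def cell_to_xy(cell: int, grid_width: int) -> Tuple[int, int]:
    """Decode a 0-based, row-major cell ID into (x, y) grid offsets."""
    if grid_width <= 0:
        raise ValueError("grid_width must be positive")
    # div gives the row (y), mod gives the column (x)
    y, x = divmod(cell, grid_width)
    return x, y


def vocab_to_xy(vocab_id: int,
                vocab_to_cell: Dict[int, int],
                grid_width: int) -> Tuple[int, int]:
    """Decode a vocabulary ID via a lookup table (assumed to exist)."""
    # vocab_to_cell is a hypothetical mapping: vocabulary ID -> cell ID
    cell = vocab_to_cell[vocab_id]
    return cell_to_xy(cell, grid_width)


# Example with made-up numbers: a grid 5 cells wide, vocab ID 4 -> cell 7
example_xy = vocab_to_xy(4, {4: 7}, 5)
```

With these made-up numbers, cell 7 on a grid of width 5 sits in column 2 of row 1.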
Dear @boathit,
I am currently trying to find out how the raw GPS dataset porto.h5 can be mapped to the vector representation trj.h5.
porto.h5 contains 1 704 759 raw GPS trajectories, whereas trj.t stores only 101 000 sequences of hot cells. The vector representation in trj.h5 also holds 101 000 values, leading me to believe they are the embeddings corresponding to the sequences in trj.t.
Is there some way to find out which trajectories from porto.h5 correspond to the ones in trj.t? (For example: the first 101 000 from the 1 704 759.)
Thank you in advance, ZrrrKIT