boathit / t2vec

t2vec: Deep Representation Learning for Trajectory Similarity Computation

Mapping from porto.h5 to trj.h5 #7

Open ZrrrKIT opened 5 years ago

ZrrrKIT commented 5 years ago

Dear @boathit,

I am currently trying to find out how the raw GPS dataset porto.h5 can be mapped to the vector representation trj.h5.

porto.h5 contains 1 704 759 raw GPS trajectories, whereas trj.t stores only 101 000 sequences of hot cells. The vector representation in trj.h5 also holds 101 000 values, which leads me to believe they are the embeddings of the sequences in trj.t.

Is there some way to find out which trajectories from porto.h5 correspond to the ones in trj.t? (For example: the first 101 000 from the 1 704 759.)

Thank you in advance, ZrrrKIT

boathit commented 5 years ago

That is easy. You can call the trip2seq function to transform all trips in porto.h5 into sequences and save them into trj.t, just like this line.

ZrrrKIT commented 5 years ago

This works perfectly, thank you!

The only thing still unclear to me is how to interpret the values stored in the trj.t file itself. They range from 1 to 18866, so I take them to be the indices of the hot cells (a.k.a. the vocabulary IDs). What I am not sure about is how to infer their positions on the grid. For normal cell :: Int values we just take the cell ID mod the grid width to obtain its x coordinate and the cell ID div the grid width to obtain its y coordinate.
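For reference, the mod/div decoding of a plain cell ID can be sketched as follows. This is only an illustration in Python, not code from the repository: it assumes zero-based cell IDs laid out row by row, and the names `cell_to_xy`, `xy_to_cell` and `num_x` are made up for the example.

```python
from typing import Tuple


def cell_to_xy(cell: int, num_x: int) -> Tuple[int, int]:
    """Decode a row-major cell ID into (x, y) grid offsets.

    Assumption: cell IDs start at 0 and num_x is the grid width in cells.
    """
    y, x = divmod(cell, num_x)  # y = cell div width, x = cell mod width
    return x, y


def xy_to_cell(x: int, y: int, num_x: int) -> int:
    """Inverse of cell_to_xy under the same assumptions."""
    return y * num_x + x


if __name__ == "__main__":
    # On a grid that is 5 cells wide, cell 7 sits in column 2 of row 1.
    print(cell_to_xy(7, 5))
```

If the grid in use counts cells from 1 instead of 0, the ID has to be shifted by one before decoding.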

The problem here is that we are dealing with vocab_IDs. Is there a way to infer the 2D location of the hot cells from their IDs?

boathit commented 5 years ago

You can use either cell2gps to get the centroid GPS location of a cell or seq2trip to transform a sequence of cells into their GPS locations.
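To illustrate the idea behind such a lookup, here is a rough Python sketch of going from a vocabulary ID to a cell centroid. It is not the repository's cell2gps or seq2trip: the `Grid` class, the `vocab_to_cell` dictionary and all parameter names are hypothetical, cell IDs are assumed to be zero-based and row-major, and the centroid is returned in the grid's own planar units, so any conversion back to longitude/latitude is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Grid:
    """Hypothetical regular grid; not the region type used by t2vec."""
    min_x: float  # left edge of the grid, in planar units
    min_y: float  # bottom edge of the grid, in planar units
    step: float   # side length of one square cell
    num_x: int    # grid width in cells


def cell_centroid(grid: Grid, cell: int) -> Tuple[float, float]:
    """Centre of a zero-based, row-major cell in planar coordinates."""
    y_off, x_off = divmod(cell, grid.num_x)
    return (grid.min_x + (x_off + 0.5) * grid.step,
            grid.min_y + (y_off + 0.5) * grid.step)


def vocab_to_centroid(grid: Grid, vocab_to_cell: Dict[int, int],
                      vocab_id: int) -> Tuple[float, float]:
    """Map a vocabulary ID to its hot cell, then to that cell's centre."""
    return cell_centroid(grid, vocab_to_cell[vocab_id])


def seq_to_centroids(grid: Grid, vocab_to_cell: Dict[int, int],
                     seq: Sequence[int]) -> List[Tuple[float, float]]:
    """Apply vocab_to_centroid to every ID in a sequence."""
    return [vocab_to_centroid(grid, vocab_to_cell, v) for v in seq]


if __name__ == "__main__":
    grid = Grid(min_x=0.0, min_y=0.0, step=100.0, num_x=5)
    vocab_to_cell = {1: 7, 2: 0}  # made-up vocabulary for the example
    print(seq_to_centroids(grid, vocab_to_cell, [1, 2]))
```

The key point is that a vocabulary ID is not a grid position by itself. It first has to be translated back to the cell ID it was assigned from, and only that cell ID can be decoded with mod and div.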