Knewton / edm2016

Code for replicating results in our EDM2016 paper
Apache License 2.0

Key error: Hashing function? #8

Closed: yeukyul closed this issue 7 years ago

yeukyul commented 7 years ago

I am wondering what the format requirements are for the Problem Id and Step Name columns. I have tried to reuse the code for research I am doing, but I always get a key error when I run the code on my data, unless I use exactly the same problem IDs as the ones in the KDD Cup dataset.

Any help will be appreciated.

khwilson commented 7 years ago

Each data set is processed differently. All of the data processing is handled by the scripts in the data package. The column names for the KDD Cup dataset are at the top of this file: https://github.com/Knewton/edm2016/blob/master/rnn_prof/data/kddcup.py

In your case, I think you need Problem Name and Step Name instead of Problem Id and Step Name.
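For a quick check before running anything, here is a minimal sketch that reports which expected column names are missing from a file's header row. The two names in `EXPECTED_COLUMNS` are the only ones taken from this thread; the helper name `missing_columns` and the tab delimiter are assumptions for illustration, so adjust both to match kddcup.py and your own file.

```python
import csv
import io

# Assumption: only these two names come from this thread. The full list of
# column names is at the top of rnn_prof/data/kddcup.py.
EXPECTED_COLUMNS = ["Problem Name", "Step Name"]


def missing_columns(handle, expected=EXPECTED_COLUMNS, delimiter="\t"):
    """Return the expected column names that are absent from the header row.

    `handle` is any open text file object. This is a hypothetical helper,
    not part of the edm2016 code base.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


# Example: an in-memory file that uses "Problem Id" instead of "Problem Name".
sample = io.StringIO("Problem Id\tStep Name\nP1\tS1\n")
print(missing_columns(sample))  # ['Problem Name']
```

A column that shows up in the returned list would probably trigger a key error once the loader tries to read it by name.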

Ultimately, though, try looking at the output of one of the load_data functions on, say, the KDD Cup data set (just put a breakpoint in the code or dump the data to a CSV). All the training algorithms expect the data in that format, so you can write your parser to produce the same format.

yeukyul commented 7 years ago

Thanks!