Closed yeukyul closed 7 years ago
Each data set is processed differently. All of the data processing is handled by scripts in the `data` package.
The column names for the KDD Cup dataset are listed at the top of this file: https://github.com/Knewton/edm2016/blob/master/rnn_prof/data/kddcup.py
In your case, I think you need `Problem Name` and `Step Name` instead of `Problem Id` and `Step Name`.
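One quick way to check this before running anything is to compare your file's header row against the column names the loader looks up. The following is only a sketch: it assumes a tab-delimited file with a header row, and `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names that are not part of the repository. The authoritative list of column names is the one at the top of `kddcup.py`.

```python
import csv
import io

# Assumption: only the two column names mentioned in this thread are listed.
# The full set the loader expects is defined at the top of kddcup.py.
REQUIRED_COLUMNS = ("Problem Name", "Step Name")


def missing_columns(handle, required=REQUIRED_COLUMNS, delimiter="\t"):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Made-up header that uses "Problem Id" where "Problem Name" is expected.
sample = io.StringIO("Problem Id\tStep Name\tCorrect\nP1\tS1\t1\n")
print(missing_columns(sample))  # → ['Problem Name']
```

To check a real file, pass an open file object instead of the `io.StringIO` sample, and adjust `delimiter` if your file is not tab-delimited.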
But in the end, try looking at the output of one of the `load_data` functions on, say, the KDD data set. (Just put a breakpoint into the code or dump the data to a CSV.) All the training algorithms expect the data in that format, so you can write your parser to reflect that format.
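As a rough illustration of the "dump the data to a CSV" suggestion, here is a small standard-library sketch. It assumes the loaded data can be iterated as dict-like rows, which may not match what the `load_data` functions actually return. `dump_rows` and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io
from itertools import islice


def dump_rows(rows, handle, limit=20):
    """Write up to `limit` dict-like rows to `handle` as CSV.

    Returns the column names, taken from the first row.
    """
    sample = list(islice(rows, limit))
    if not sample:
        return []
    columns = list(sample[0].keys())
    # Keys missing from a row are written as empty cells; extra keys are ignored.
    writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(sample)
    return columns


# Made-up rows standing in for whatever the loader returns.
rows = [
    {"Problem Name": "P1", "Step Name": "S1", "correct": 1},
    {"Problem Name": "P1", "Step Name": "S2", "correct": 0},
]
buffer = io.StringIO()
print(dump_rows(rows, buffer))
print(buffer.getvalue())
```

If the loader returns a pandas DataFrame instead, `df.head(20).to_csv(path)` gives a similar dump. Either way, the goal is to see the column names and value types the training code expects, then make your own parser produce the same shape.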
Thanks!
I am just wondering what the format requirements are for the `Problem Id` and `Step Name` columns. I have tried to reuse the code for research I am doing, but I always get a key error when I run the code on my data, unless I use exactly the same `Problem Id` values as the ones in the KDD Cup dataset. Any help will be appreciated.