Navidfoumani / ConvTran

This is a PyTorch implementation of ConvTran
MIT License
130 stars, 7 forks

There are some problems in the Ford StayAlert dataset-related docs and code #1

Closed: linjianfeng closed this issue 1 year ago

linjianfeng commented 1 year ago

I downloaded the Ford StayAlert challenge data by following https://github.com/Navidfoumani/ConvTran/blob/main/Dataset/Segmentation/Segmentation.Txt. The test CSV file looks like this:

[image: screenshot of the test CSV file]

There are some problems with this dataset:

  1. The label column, 'IsAlert', is filled with '?', so we cannot test with it because the real labels are missing.
  2. The function load_ford_data in data_loader.py fails because it tries to access the nonexistent columns 'series' and 'label' (should these be 'TrialID' and 'IsAlert'?).
  3. The following code in Dataset/load_segment_data.py tries to rearrange the data matrix from (sample, window_len, channel) to (sample, channel, window_len) with NumPy's reshape method. I think this is wrong, because reshape only re-partitions the elements in their existing memory order; it does not transpose the axes. As a result, the vectors in Data['train_data'] are misaligned.

Data['train_data'] = X_train.reshape(X_train.shape[0], X_train.shape[2], X_train.shape[1])
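To make the difference concrete, here is a minimal sketch on a toy array (the shapes and values are made up for illustration and are not taken from the Ford data). It contrasts `reshape` with `transpose` when going from (sample, window_len, channel) to (sample, channel, window_len):

```python
import numpy as np

# Toy batch: 2 samples, window_len = 3, 2 channels (illustrative shapes only).
X = np.arange(12).reshape(2, 3, 2)

# reshape keeps the flat element order and only changes where it is cut,
# so values from different channels end up in the same row.
reshaped = X.reshape(X.shape[0], X.shape[2], X.shape[1])

# transpose swaps the last two axes, so each row is one channel over time.
transposed = X.transpose(0, 2, 1)  # equivalent to np.swapaxes(X, 1, 2)

print(X[0, :, 0])        # channel 0 of sample 0 over the window: [0 2 4]
print(transposed[0, 0])  # same series after transpose:           [0 2 4]
print(reshaped[0, 0])    # after reshape, channels are mixed:     [0 1 2]
```

Both results have shape (2, 2, 3), which is why the problem does not show up as a shape error.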

So do you have another version of the Ford dataset? And if the algorithm scored well on the misaligned data, maybe it could perform even better with the corrected code?

Navidfoumani commented 1 year ago

I apologize for the problems you encountered with the Ford StayAlert dataset documentation and code, and I appreciate you bringing them to our attention. I have made the necessary updates to the code, and I kindly request that you re-download it (or replace the existing util.py file in your project with the updated version).

Ford dataset: Access the dataset from the following Kaggle competition link: https://www.kaggle.com/competitions/stayalert/data. Download "stayalert.zip", which contains the following files: Solution.csv, fordTrain.csv, and fordTest.csv.

Labeling the test data: The file fordTest.csv does not have labels. To assign labels to the test data, follow these steps:

  1. Open the Solution.csv file.
  2. Copy the contents of the prediction columns.
  3. Paste the copied prediction values into the "IsAlert" column of the fordTest.csv file.

Renaming and copying files:

  1. Rename the fordTrain.csv and fordTest.csv files to FordChallenge_Train.csv and FordChallenge_Test.csv, respectively.
  2. Copy the FordChallenge_Train.csv and FordChallenge_Test.csv files to the following directory: Datasets/Segmentation/FordChallenge.

Column renaming: Open the FordChallenge_Train.csv and FordChallenge_Test.csv files and rename the following columns:

  1. "TrialID" to "series"
  2. "obsNum" to "timestamp"
  3. "IsAlert" to "label"

Finally: Copy the FordChallenge_Test.csv and FordChallenge_Train.csv files to: Datasets/Segmentation/FordChallenge
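For anyone who prefers to script the labeling and column-renaming steps above rather than edit the CSV files by hand, a rough pandas sketch might look like the following. It rests on assumptions that are not confirmed in this thread: that the prediction column in Solution.csv is named "Prediction", and that the rows of Solution.csv are in the same order as the rows of fordTest.csv. The helper names `rename_columns` and `prepare_ford_test` are hypothetical and are not part of the ConvTran repository.

```python
import pandas as pd

# Column renames listed in the reply above (matched case-insensitively,
# since the exact capitalization in the CSV header may differ).
COLUMN_RENAMES = {"TrialID": "series", "obsNum": "timestamp", "IsAlert": "label"}


def rename_columns(df):
    """Return a copy of df with the ConvTran column names applied."""
    lookup = {old.lower(): new for old, new in COLUMN_RENAMES.items()}
    return df.rename(columns=lambda c: lookup.get(c.lower(), c))


def prepare_ford_test(test_df, solution_df, prediction_col="Prediction"):
    """Fill the unlabeled IsAlert column from the solution file, then rename.

    Assumption: both frames list the same observations in the same order.
    `prediction_col` is an assumed column name; check your Solution.csv.
    """
    if len(test_df) != len(solution_df):
        raise ValueError("test and solution files have different row counts")
    out = test_df.copy()
    out["IsAlert"] = solution_df[prediction_col].to_numpy()
    return rename_columns(out)


# Hypothetical usage with the file names from the reply:
# test = pd.read_csv("fordTest.csv")
# solution = pd.read_csv("Solution.csv")
# prepare_ford_test(test, solution).to_csv("FordChallenge_Test.csv", index=False)
# rename_columns(pd.read_csv("fordTrain.csv")).to_csv("FordChallenge_Train.csv", index=False)
```

The output files would still need to be copied to Datasets/Segmentation/FordChallenge as described above.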