Rochester-Biomedical-DS / Hackathon-Summer-2020

Data and details for University of Rochester Biomedical Data Science Hackathon
GNU General Public License v3.0

Missing subject_ids in training and test set #1

Closed Yangxin666 closed 4 years ago

Yangxin666 commented 4 years ago

I found that subject IDs 134, 215, and 219 are missing from the training set, although all three appear in "severity_score_train.txt".
Similarly, subject IDs 167 and 113 are missing from the test set but appear in "prediction.csv". Has anyone else found the same issue?
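
For anyone who wants to reproduce this check, here is a minimal sketch using a set difference. The ID collections below are hypothetical stand-ins: the layout of "severity_score_train.txt" and of the predictor data is not shown in this thread, so loading them is left out, and `missing_ids` is just an illustrative helper name.

```python
# Hypothetical ID collections; in practice these would be read from the
# score file and from the predictor data, whose layouts are not shown here.
score_ids = {101, 134, 215, 219, 300}
predictor_ids = {101, 300}


def missing_ids(expected, observed):
    """Return the sorted IDs present in `expected` but absent from `observed`."""
    return sorted(set(expected) - set(observed))


missing = missing_ids(score_ids, predictor_ids)
print(missing)  # [134, 215, 219]
```

The same helper can be pointed at the IDs in "prediction.csv" and the test set IDs to list the test subjects that have no predictors.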

HounerX commented 4 years ago

yep

mccallm commented 4 years ago

This is as intended. Think about what you might be able to learn from the training data when you have a severity score but no additional information. Also, think about how you would make a prediction if all of the measured predictors were missing.
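
One possible reading of this hint, sketched under assumptions: a subject with a severity score but no predictors still tells you something about the overall distribution of scores, and that distribution gives a fallback prediction for a test subject with no predictors. The scores, the `predict` helper, and the toy model below are all made up for illustration, and the mean is only one choice of fallback (a median would work the same way).

```python
from statistics import mean

# Hypothetical training scores keyed by subject ID (values are made up).
# Subjects without predictors still contribute to this overall average.
train_scores = {101: 2.0, 134: 4.0, 215: 3.0, 219: 5.0, 300: 1.0}
baseline = mean(train_scores.values())


def predict(subject_id, features, model, fallback):
    """Use the model when predictors exist; otherwise return the fallback."""
    row = features.get(subject_id)
    if row is None:
        return fallback
    return model(row)


# Hypothetical test predictors: subject 500 has data, subject 167 has none.
test_features = {500: [0.5, 1.0]}


def toy_model(row):
    """Stand-in for a fitted model; it simply sums the predictors."""
    return sum(row)


print(predict(500, test_features, toy_model, baseline))  # 1.5
print(predict(167, test_features, toy_model, baseline))  # 3.0
```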

LeonShangguan commented 4 years ago

> This is as intended. Think about what you might be able to learn from the training data when you have a severity score but no additional information. Also, think about how you would make a prediction if all of the measured predictors were missing.

Well, I think it's OK for some data to be missing in the training set. But for the test set, this means you need to predict without any information: we have to predict IDs 167 and 113 knowing nothing about them, i.e., we can only guess. If so, why not add other IDs, like 168, 169, 170, etc.? We have no information about those IDs either.