arjunjauhari / quora-ques

Define target (test data + metric) #3

Open arjunjauhari opened 6 years ago

arjunjauhari commented 6 years ago

1) Split the data into train/dev/test sets.
2) Define a metric and code it.
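A minimal sketch of the split step, using only the standard library. The 80/10/10 proportions, the fixed seed, and the name `split_pairs` are assumptions for illustration; nothing in this thread fixes them.

```python
import random


def split_pairs(rows, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle rows and cut them into train/dev/test lists.

    dev_frac, test_frac and seed are illustrative defaults, not values
    agreed on in this issue.
    """
    rows = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(rows)
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]
    return train, dev, test


# Toy usage: 100 row indices stand in for labeled question pairs.
train, dev, test = split_pairs(range(100))
```

A purely random split over pairs can place the same question in both train and dev; whether that matters here is a judgment call the sketch does not settle.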

naik-amey commented 6 years ago

Metric:

  1. Accuracy of the prediction, where the prediction is whether a question pair is a duplicate. I think this is the only metric.
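One way the accuracy metric could be coded, assuming labels are encoded as 1 for a duplicate pair and 0 otherwise. The function name and the encoding are assumptions, not something defined in the repository.

```python
def accuracy(y_true, y_pred):
    """Fraction of pairs whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy usage: three of the four predictions match the labels.
score = accuracy([1, 0, 1, 0], [1, 0, 0, 0])
```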
naik-amey commented 6 years ago

Summary:

  1. Total number of question pairs for training: 404290
  2. Duplicate pairs: 36.92%
  3. Total number of questions in the training data: 537933
  4. Number of questions that appear multiple times: 111780
  5. Total number of question pairs for testing: 2345796
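The summary figures above could be reproduced with something like the sketch below. It assumes each training row has been reduced to a `(qid1, qid2, is_duplicate)` tuple; that layout and the name `summarize` are assumptions, and the toy rows are made up for illustration.

```python
from collections import Counter


def summarize(pairs):
    """Summarize an iterable of (qid1, qid2, is_duplicate) tuples."""
    pairs = list(pairs)
    n_pairs = len(pairs)
    dup_share = sum(d for _, _, d in pairs) / n_pairs
    # Count how often each question id appears across both columns.
    counts = Counter()
    for q1, q2, _ in pairs:
        counts[q1] += 1
        counts[q2] += 1
    return {
        "pairs": n_pairs,
        "duplicate_share": dup_share,
        "unique_questions": len(counts),
        "repeated_questions": sum(1 for c in counts.values() if c > 1),
        # Accuracy of always predicting the more common class.
        "majority_baseline_accuracy": max(dup_share, 1 - dup_share),
    }


# Toy rows, not real data.
toy = [(1, 2, 0), (1, 3, 1), (4, 5, 0), (2, 6, 0)]
stats = summarize(toy)
```

With 36.92% duplicate pairs, a model that always predicts "not duplicate" would already reach 63.08% accuracy on data with the same class balance, which is a useful floor when reading accuracy numbers.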

naik-amey commented 6 years ago

Metrics to evaluate:

  1. accuracy
  2. confusion matrix
  3. precision
  4. recall
  5. sensitivity (another name for recall)
  6. specificity
  7. F1 score
  8. ROC
  9. AUC
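A rough sketch of how most of the metrics listed above can be derived from the four confusion-matrix counts, plus a pairwise AUC. Labels are assumed to be 1 for duplicate and 0 otherwise, and all function names are illustrative. The AUC here compares every positive with every negative, so it is quadratic and only meant for small samples; a sort-based or library implementation would be needed at the full data size.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = duplicate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def report(y_true, y_pred):
    """Compute the threshold-based metrics from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels, hard predictions, and scores (made up for illustration).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
metrics = report(y_true, y_pred)
auc = roc_auc(y_true, scores)
```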
naik-amey commented 6 years ago

Step 1: Extract a feature.
Step 2: Apply logistic regression.
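The two steps could look roughly like this. The thread does not say which feature to extract, so word-level Jaccard overlap is used here purely as an assumed example, and the logistic regression is a hand-rolled single-feature version trained by batch gradient descent. The learning rate, epoch count, and toy question pairs are all made up for illustration.

```python
import math


def jaccard(q1, q2):
    """Step 1 (assumed feature): share of words the two questions have in common."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Step 2: fit p(duplicate) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy pairs (question1, question2, is_duplicate), not taken from the dataset.
pairs = [
    ("how do i learn python", "how can i learn python", 1),
    ("what is machine learning", "what is machine learning exactly", 1),
    ("how do i cook rice", "what is the capital of france", 0),
    ("best way to lose weight", "how does a car engine work", 0),
]
xs = [jaccard(q1, q2) for q1, q2, _ in pairs]
ys = [label for _, _, label in pairs]
w, b = fit_logistic(xs, ys)
preds = [int(sigmoid(w * x + b) >= 0.5) for x in xs]
```

A single overlap feature is only a starting point; the same loop extends to several features by turning `w` into a list of weights.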