Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0

"data" section of dataset-metadata.json empty in recent requests? #404

Open khughitt opened 2 years ago

khughitt commented 2 years ago

Greetings!

Has anything changed in recent months with respect to how dataset-metadata.json gets populated?

For example, when pulling metadata for the Heart Disease UCI dataset using the Kaggle API CLI (v1.5.12):

kaggle datasets metadata "ronitf/heart-disease-uci"

The resulting dataset-metadata.json file has an empty "data" block:

{
  "id": "ronitf/heart-disease-uci",
  "id_no": 33180,
  "datasetId": 33180,
  "datasetSlug": "heart-disease-uci",
  "ownerUser": "ronitf",
  ...
  "collaborators": [],
  "data": []
}
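To check a downloaded dataset-metadata.json for this empty "data" block without eyeballing it, a minimal sketch might look like the following. It assumes the file has already been written to disk by `kaggle datasets metadata`. The helper name `has_file_metadata` is hypothetical, and it only reads the local JSON. It does not call the Kaggle API.

```python
import json
from pathlib import Path


def has_file_metadata(metadata_path: str = "dataset-metadata.json") -> bool:
    """Return True if the metadata file lists at least one entry under "data".

    Hypothetical helper for illustration: it inspects the JSON on disk only.
    A missing "data" key is treated the same as an empty list.
    """
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    # An empty list, None, or a missing key all count as "no file metadata".
    return bool(metadata.get("data"))
```

Run against a file like the one above, `has_file_metadata("dataset-metadata.json")` would return `False`.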

Previously, however (~March 2021), the same command on the same dataset produced a dataset-metadata.json file with useful information, including the expected file size and column descriptions:

{
  "datasetId": 33180,
  "datasetSlug": "heart-disease-uci",
  "ownerUser": "ronitf",
  ...
  "data": [
    {
      "description": null,
      "name": "heart.csv",
      "totalBytes": 11328,
      "columns": [
        {
          "name": "age",
          "description": "age in years ",
          "type": "Uuid"
        },
        {
          "name": "sex",
          "description": "(1 = male; 0 = female) ",
          "type": "Uuid"
        },
        {
          "name": "cp",
          "description": "chest pain type ",
          "type": "Uuid"
        },
        {
          "name": "trestbps",
          "description": "resting blood pressure (in mm Hg on admission to the hospital) ",
          "type": "Uuid"
        },
        {
          "name": "chol",
          "description": "serum cholestoral in mg/dl ",
          "type": "Uuid"
        },
        {
          "name": "fbs",
          "description": "(fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) ",
          "type": "Uuid"
        },
        {
          "name": "restecg",
          "description": "resting electrocardiographic results ",
          "type": "Uuid"
        },
        {
          "name": "thalach",
          "description": "maximum heart rate achieved ",
          "type": "Uuid"
        },
        {
          "name": "exang",
          "description": "exercise induced angina (1 = yes; 0 = no) ",
          "type": "Uuid"
        },
        {
          "name": "oldpeak",
          "description": "ST depression induced by exercise relative to rest ",
          "type": "Uuid"
        },
        {
          "name": "slope",
          "description": "the slope of the peak exercise ST segment ",
          "type": "Uuid"
        },
        {
          "name": "ca",
          "description": "number of major vessels (0-3) colored by flourosopy ",
          "type": "Uuid"
        },
        {
          "name": "thal",
          "description": " 3 = normal; 6 = fixed defect; 7 = reversable defect ",
          "type": "Uuid"
        },
        {
          "name": "target",
          "description": "1 or 0 ",
          "type": "Uuid"
        }
      ]
    }
  ]
}
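For reference, this is roughly how the older-style "data" block can be consumed, for example to recover a column dictionary and the expected download size. This is a sketch that assumes the layout shown above: each entry in "data" has a "name", a "totalBytes" value and an optional "columns" list whose items carry "name" and "description". The function names are hypothetical, and the `old` dict is a two-column excerpt of the JSON above.

```python
from typing import Any, Dict, List, Tuple


def column_descriptions(metadata: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Flatten "data" into (file name, column name, description) rows."""
    rows = []
    for file_entry in metadata.get("data") or []:
        for column in file_entry.get("columns") or []:
            # Descriptions in the example carry trailing whitespace; strip it.
            description = (column.get("description") or "").strip()
            rows.append((file_entry["name"], column["name"], description))
    return rows


def total_bytes(metadata: Dict[str, Any]) -> int:
    """Sum the "totalBytes" values of all files listed under "data"."""
    return sum(entry.get("totalBytes") or 0 for entry in metadata.get("data") or [])


# Two-column excerpt of the March 2021 metadata shown above.
old = {
    "datasetId": 33180,
    "data": [
        {
            "description": None,
            "name": "heart.csv",
            "totalBytes": 11328,
            "columns": [
                {"name": "age", "description": "age in years ", "type": "Uuid"},
                {"name": "sex", "description": "(1 = male; 0 = female) ", "type": "Uuid"},
            ],
        }
    ],
}

for file_name, column, description in column_descriptions(old):
    print(f"{file_name}:{column}: {description}")  # e.g. "heart.csv:age: age in years"
print(total_bytes(old))  # 11328
```

With the current output, where "data" is `[]`, both functions simply return empty results (`[]` and `0`).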

As far as I can tell, though, nothing about the dataset itself has changed.

There is no versioning information listed for the metadata section of the dataset page, so I can't be sure whether the metadata itself has changed.
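Without version history on the page, one workaround is to keep a saved copy of an older dataset-metadata.json and compare it against a fresh download. The sketch below assumes both snapshots are available locally as parsed JSON. `changed_top_level_keys` is a hypothetical helper, and it only compares two files. It does not query any Kaggle versioning endpoint.

```python
from typing import Any, Dict, List


def changed_top_level_keys(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Return the sorted top-level keys whose values differ between two snapshots.

    Keys present in only one of the snapshots are reported as changed too.
    """
    # A private sentinel distinguishes "key missing" from an explicit null value.
    missing = object()
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k, missing) != new.get(k, missing))
```

Loading each file with `json.load` and passing both dicts in would, for snapshots like the two above, flag `"data"` along with any other keys that were added or altered.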

Has anything changed with respect to how the metadata is generated? And if so, is there any way to retrieve the same information using the current API?

Thanks!