Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0

dataset-metadata.json fails to upload file and column descriptions #447

Open wschwab opened 2 years ago

wschwab commented 2 years ago

I've been trying to create an automated pipeline for updating data in a dataset I've created. I've been trying to use a dataset-metadata.json as the API docs say to. The description of the project gets updated if I change it in the json, but not the descriptions for the files and columns.

I can't figure out what I'm doing wrong - there don't seem to be any problems with the json syntax (evidenced by the project description updating properly), and I've followed the schema outlined in the docs as far as I can tell.

I originally used a json generated by kaggle datasets init -p /path/to/dataset, with similar results, and have since trimmed out a number of fields that I couldn't find documented anywhere else, to check whether they were causing the issue, but nothing changed.

I've made a pastebin of the (current) json I'm using here: https://pastebin.com/exx65g6i. I'll also paste a section of it at the bottom of this issue for quick reference.

If anyone is able to help me figure out what I'm doing wrong, it would be greatly appreciated. Thanks!
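For context, the pipeline boils down to regenerating the data files, editing dataset-metadata.json, and pushing a new version with the CLI. A minimal sketch of that step (update_dataset and dry_run are illustrative names, not part of the kaggle-api; the actual CLI command is `kaggle datasets version`):

```python
import subprocess

def update_dataset(path: str, message: str, dry_run: bool = True):
    # Build the CLI call the API docs describe for pushing a new
    # dataset version from a folder containing dataset-metadata.json.
    # dry_run just returns the command for inspection instead of running it.
    cmd = ["kaggle", "datasets", "version", "-p", path, "-m", message]
    if dry_run:
        return cmd
    return subprocess.run(cmd, check=True)

print(update_dataset("/path/to/dataset", "automated update"))
```

Running this with dry_run=False is where the project description updates but the file and column descriptions do not.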

Sample From the JSON:

There are many more files and columns in the dataset; I've cut out a section that shows the schema I'm using, for quick reference without burdening the reader with all the details. A link to a pastebin of the full json is above.

{
  "id": "wschwab/0xa57bd00134b2850b2a1c55860c9e9ea100fdd6cf",
  "title": "0xa57bd00134b2850b2a1c55860c9e9ea100fdd6cf",
  "subtitle": "lorem ipsum",
  "description": "lorem ipsum",
  "keywords": [
    "data analytics",
    "currencies and foreign exchange"
  ],
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ],
  "resources": [
    {
      "path": "allLogs.csv",
      "description": "All events emitted by 0xa57's transactions, scraped using Trueblocks.",
      "schema": {
        "fields": [
          {
            "name": "blockNumber",
            "description": "the number of the block the event was emitted in",
            "type": "integer"
          },
          {
            "name": "transactionIndex",
            "description": "index of the transaction that emitted the event inside the array of transactions in the block",
            "type": "integer"
          },
          {
            "name": "logIndex",
            "description": "index of this log in the array of logs emitted in this block",
            "type": "integer"
          },
          {
            "name": "timestamp",
            "description": "UNIX timestamp of the block the event was emitted in",
            "type": "integer"
          },
          {
            "name": "address",
            "description": "the address that emitted the event",
            "type": "string"
          }
        ]
      }
    },
<-- rest of file truncated -->
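For what it's worth, the sample above can be sanity-checked locally with a small script like this (check_metadata is a hypothetical helper; it only verifies the keys used in the sample and mentioned in the docs, not whatever Kaggle does server-side, which is what this bug is about):

```python
def check_metadata(meta: dict) -> list:
    # Collect structural problems in a dataset-metadata.json dict,
    # using the key names from the sample above.
    problems = []
    for key in ("id", "title", "licenses"):
        if key not in meta:
            problems.append(f"missing top-level key: {key}")
    for res in meta.get("resources", []):
        if "path" not in res:
            problems.append("resource missing 'path'")
        for field in res.get("schema", {}).get("fields", []):
            if "name" not in field:
                problems.append("schema field missing 'name'")
    return problems

sample = {
    "id": "wschwab/0xa57bd00134b2850b2a1c55860c9e9ea100fdd6cf",
    "title": "0xa57bd00134b2850b2a1c55860c9e9ea100fdd6cf",
    "licenses": [{"name": "CC0-1.0"}],
    "resources": [{
        "path": "allLogs.csv",
        "schema": {"fields": [{"name": "blockNumber", "type": "integer"}]},
    }],
}
print(check_metadata(sample))  # → []
```

The sample passes, which matches the observation that the json itself is well-formed and the problem is on the ingestion side.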
wschwab commented 1 year ago

I am still experiencing this issue, and have begun wondering if there might be a bug where dataset-metadata.json doesn't work with certain filetypes, though that seems a stretch to me. Just in case it is useful: the dataset in question is CSV (as the lone file name in the snippet above shows, but I figured it was worth pointing out explicitly).

svaningelgem commented 1 year ago

Same here. I saw somewhere that "data" was used instead of "resources", but that didn't solve the issue either. I suspect either the description in the docs is wrong, or the field isn't being picked up by Kaggle. Changing the metadata through Kaggle's website works fine, but retrieving the metadata json file afterwards doesn't show the entered information.

So if any help can be given, I'd be grateful.

dennisangemi commented 1 year ago

Same problem. I thought I was doing something wrong (I followed frictionless data standards & kaggle docs), but there's probably a bug.

yonikremer commented 1 year ago

I have the same issue.

brunobastosg commented 1 year ago

I also have the same issue.

vitorStein commented 1 year ago

The error still persists in version 1.5.16. The workaround is to not send the resources key in dataset-metadata.json: when that key is absent, the file and column descriptions already registered on Kaggle are left untouched.
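A minimal sketch of that workaround (strip_resources is an illustrative name, not a kaggle-api call; it assumes the standard dataset-metadata.json sits in the dataset folder):

```python
import json
from pathlib import Path

def strip_resources(metadata_path: str) -> dict:
    # Drop the "resources" key before uploading, so Kaggle keeps the
    # file/column descriptions already registered on the website
    # instead of silently ignoring the ones in the json.
    p = Path(metadata_path)
    meta = json.loads(p.read_text())
    meta.pop("resources", None)
    p.write_text(json.dumps(meta, indent=2))
    return meta
```

Run this on the metadata file just before calling the CLI upload step; keep a pristine copy of the full json elsewhere if you still want the resources block under version control.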