datahubio / datahub-v2-pm

Project management (issues only)

Pushing files with extension .arff bug #243

Closed Branko-Dj closed 5 years ago

Branko-Dj commented 5 years ago

When uploading files to datahub, files with the .arff extension are not processed. Examples of ARFF files can be found at https://www.openml.org/search?type=data

Acceptance criteria

Tasks

Analysis

TODO

zelima commented 5 years ago

@Branko-Dj I'd like to see examples from datahub or GitHub (processed ones). Where can I find them?

zelima commented 5 years ago

Here's one of them https://api.datahub.io/source/machine-learning/arrhythmia/successful

zelima commented 5 years ago

OK, this took me a while, but I finally got it. @Branko-Dj The reasons they are not getting there are:

  1. No .arff resource was added to the resource list in dp.json. E.g., this one has only 1 resource in it, and it's a CSV: https://api.datahub.io/source/machine-learning/hepatitis/successful
     1.1. As an aside, something also looks wrong with the resource path.
  2. There is an .arff resource in the resource list, but it has exactly the same name as the CSV resource, so in the background it just gets overwritten. E.g. https://api.datahub.io/source/machine-learning/arrhythmia/successful
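
A quick way to spot the second case before pushing is to look for repeated resource names in the descriptor. This is only a minimal sketch: the helper name `duplicate_resource_names` and the sample descriptor (names, paths, formats) are made up for illustration and assume the usual `resources` list with a `name` per entry; it is not how datahub itself does the check.

```python
from collections import Counter


def duplicate_resource_names(descriptor):
    """Return the resource names that occur more than once, sorted.

    `descriptor` is assumed to be a dict loaded from the datapackage
    descriptor, with a "resources" list whose entries carry a "name".
    """
    names = [res.get("name") for res in descriptor.get("resources", [])]
    counts = Counter(name for name in names if name is not None)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical descriptor mirroring the arrhythmia case: the CSV and the
# ARFF resource share one name, so one can overwrite the other.
descriptor = {
    "name": "arrhythmia",
    "resources": [
        {"name": "arrhythmia", "path": "data/arrhythmia.csv", "format": "csv"},
        {"name": "arrhythmia", "path": "data/arrhythmia.arff", "format": "arff"},
    ],
}

print(duplicate_resource_names(descriptor))
```

An empty list means every resource name is unique; any name that comes back is a candidate for the overwrite described in point 2.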

So the solution is the following:

  1. Add the .arff resource to the resource list.
  2. Rename the resource to, e.g., arrhythmia_arff or similar.
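
The rename in step 2 could be sketched roughly as below. The function name `suffix_arff_names` and the sample descriptor are hypothetical, and the sketch only changes the resource `name` in an in-memory dict, leaving `path` untouched; writing the result back to the descriptor file and re-pushing are left out.

```python
import os


def suffix_arff_names(descriptor):
    """Return a copy of the descriptor with "_arff" appended to the name
    of every resource whose path ends in .arff.

    Resources already ending in "_arff" are left alone, so running the
    function twice gives the same result. The input is not modified.
    """
    resources = []
    for res in descriptor.get("resources", []):
        res = dict(res)  # shallow copy so the caller's dict stays intact
        stem, ext = os.path.splitext(os.path.basename(res.get("path", "")))
        if ext.lower() == ".arff":
            # Fall back to the file stem if the resource has no name yet.
            name = res.get("name") or stem
            if not name.endswith("_arff"):
                name += "_arff"
            res["name"] = name
        resources.append(res)
    return {**descriptor, "resources": resources}


# Hypothetical descriptor with the name clash from the arrhythmia example.
descriptor = {
    "name": "arrhythmia",
    "resources": [
        {"name": "arrhythmia", "path": "data/arrhythmia.csv", "format": "csv"},
        {"name": "arrhythmia", "path": "data/arrhythmia.arff", "format": "arff"},
    ],
}

fixed = suffix_arff_names(descriptor)
print([res["name"] for res in fixed["resources"]])
```

After this the CSV resource keeps its name and the ARFF one gets a distinct name, so neither should overwrite the other.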

Branko-Dj commented 5 years ago

FIXED. The ARFF resources were renamed with _arff appended to the file name. After reinitializing the datapackage and re-pushing to datahub, the issue was resolved.