MLBazaar / mit-d3m

MIT tools to work with datasets in the D3M format.
MIT License

Error using load_d3mds with 31_urbansound #10

Closed (micahjsmith closed 4 years ago)

micahjsmith commented 5 years ago

Description

load_d3mds fails on the dataset 31_urbansound. I am not sure whether this is a bug in the mit_d3m library or whether the dataset on S3 was uploaded improperly.

What I Did

In [1]: !ls -l
total 0

In [2]: from mit_d3m import load_d3mds

In [3]: d = load_d3mds('31_urbansound')
Downloading dataset 31_urbansound
Getting file datasets/31_urbansound.tar.gz from S3 bucket d3m-data-dai
Extracting data/31_urbansound.tar.gz
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-581ccc7b6545> in <module>
----> 1 d = load_d3mds('31_urbansound')

~/miniconda3/envs/mitd3m/lib/python3.6/site-packages/mit_d3m/__init__.py in load_d3mds(dataset, root, force_download)
     55     problem_path = os.path.join(phase_root, 'problem_TRAIN')
     56 
---> 57     return D3MDS(dataset=dataset_path, problem=problem_path)
     58 
     59 

~/miniconda3/envs/mitd3m/lib/python3.6/site-packages/mit_d3m/dataset.py in __init__(self, dataset, problem)
    256             self.dataset = dataset
    257         else:
--> 258             self.dataset = D3MDataset(dataset)
    259 
    260         if isinstance(problem, D3MProblem):

~/miniconda3/envs/mitd3m/lib/python3.6/site-packages/mit_d3m/dataset.py in __init__(self, dataset)
     60             _dsDoc = dataset
     61 
---> 62         assert os.path.exists(_dsDoc), _dsDoc
     63         with open(_dsDoc, 'r') as f:
     64             self.dsDoc = json.load(f)

AssertionError: data/31_urbansound/TRAIN/dataset_TRAIN

micahjsmith commented 5 years ago

Note that the created directory is nonempty; it just doesn't have TRAIN at the expected path:

(mitd3m) ubuntu:~/sandbox$ tree -L 6
.
└── data
    ├── 31_urbansound.tar.gz
    └── datasets
        └── special
            └── 31_urbansound
                ├── 31_urbansound_dataset
                │   ├── datasetDoc.json
                │   ├── media
                │   └── tables
                ├── 31_urbansound_problem
                │   ├── dataSplits.csv
                │   └── problemDoc.json
                ├── SCORE
                │   ├── dataset_TEST
                │   ├── problem_TEST
                │   └── targets.csv
                ├── TEST
                │   ├── dataset_TEST
                │   └── problem_TEST
                └── TRAIN
                    ├── dataset_TRAIN
                    └── problem_TRAIN
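
The mismatch above (the loader expects `data/31_urbansound/TRAIN/...`, while the archive extracts to `data/datasets/special/31_urbansound/TRAIN/...`) can be detected with a small search. This is a minimal sketch, not part of mit_d3m: `find_dataset_root` is a hypothetical helper, and it assumes that a dataset directory can be recognized by its name plus the presence of a `TRAIN` subdirectory.

```python
import os


def find_dataset_root(data_dir, dataset_name):
    """Return the first directory named `dataset_name` under `data_dir`
    that contains a TRAIN subdirectory, or None if no such directory exists.

    Hypothetical helper for illustration; not a mit_d3m function.
    """
    for current, dirnames, _ in os.walk(data_dir):
        if os.path.basename(current) == dataset_name and 'TRAIN' in dirnames:
            return current
    return None
```

With the tree shown above, `find_dataset_root('data', '31_urbansound')` should point at the nested `datasets/special/31_urbansound` directory, which could then be moved to `data/31_urbansound`, the location named in the assertion message.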

micahjsmith commented 5 years ago

@csala is this a bug with mit-d3m or is this a problem with how the tarfile was created?

csala commented 5 years ago

I'm afraid it is an error in the dataset creation. Those datasets/special levels should not be there.

This dataset was created manually, on its own, because it is far bigger than all the other ones in the repository (hence the special part of the path), and I possibly did something wrong when I created it.
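
One way to check an archive for extra leading levels like these before uploading it is to list the first path component of every member. The sketch below uses only the standard `tarfile` module; `top_level_names` is a hypothetical helper, and the assumption that a correct archive has the dataset name as its only top-level entry is inferred from the path in the assertion message, not confirmed by the library.

```python
import tarfile


def top_level_names(tar_path):
    """Return the set of first path components of all members in a .tar.gz.

    Hypothetical helper for illustration; not a mit_d3m function.
    """
    names = set()
    with tarfile.open(tar_path, 'r:gz') as tar:
        for name in tar.getnames():
            # Ignore empty segments and a leading "./" prefix.
            parts = [p for p in name.split('/') if p not in ('', '.')]
            if parts:
                names.add(parts[0])
    return names
```

For an archive that extracts as shown in the tree above, this would be expected to report `datasets` rather than `31_urbansound`.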

csala commented 5 years ago

I can confirm the point above, and I am currently generating and uploading a new tar.gz with the right structure. I'll close this once I have checked that it works.
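
For reference, a minimal sketch of how an archive with the dataset name as its top-level directory could be produced with the standard `tarfile` module. This is not necessarily how the new tar.gz was generated; `pack_dataset` and its parameters are hypothetical names used only for illustration.

```python
import tarfile


def pack_dataset(dataset_dir, dataset_name, tar_path):
    """Write `dataset_dir` into a gzipped tar so that every member path
    starts with `dataset_name`, regardless of where the directory lives.

    Hypothetical helper for illustration; not a mit_d3m function.
    """
    with tarfile.open(tar_path, 'w:gz') as tar:
        # arcname replaces the on-disk prefix (e.g. datasets/special/...)
        # with the bare dataset name inside the archive.
        tar.add(dataset_dir, arcname=dataset_name)
```

Passing `arcname` is the design choice that matters here: without it, `tar.add` stores the path as given, so any parent directories in `dataset_dir` would end up in the archive again.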

micahjsmith commented 4 years ago

@csala was this fixed?

micahjsmith commented 4 years ago

this appears to have been fixed