Closed by rszeto 4 years ago
Just sent an email to Swaroop Vattam, who seems to be the main person managing the datasets.
We should actually be using the private dataset repo per Swaroop's response:
I’m afraid you are looking at the public one, which is the wrong one. Evaluations always use our internal repo. With the internal one, there are many directories:
Among these, the datasets in seed_datasets_current are always evaluated. Depending on the objectives of the evaluation, other dirs may or may not be included. For instance, if DARPA is interested in evaluating data-augmentation, then the datasets under seed_datasets_data_augmentation become relevant.
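For reference, here is a rough sketch of the selection rule Swaroop describes: seed_datasets_current is always included, and other directories are added only when the evaluation objective calls for them. The objective keys and the function name are hypothetical, and treating seed_datasets_unsupervised as objective-dependent is my assumption, not something he confirmed.

```python
# Directory that is always evaluated, per Swaroop's email.
ALWAYS_EVALUATED = "seed_datasets_current"

# Hypothetical mapping from an evaluation objective to the extra directory
# it pulls in. The objective keys are made up for illustration; only the
# directory names come from the email.
OPTIONAL_DIRS = {
    "data_augmentation": "seed_datasets_data_augmentation",
    "unsupervised": "seed_datasets_unsupervised",  # assumption, not confirmed
}


def dirs_for_evaluation(objectives=()):
    """Return the dataset directories relevant to an evaluation."""
    dirs = [ALWAYS_EVALUATED]
    for objective in objectives:
        extra = OPTIONAL_DIRS.get(objective)
        # Skip unknown objectives and avoid listing a directory twice.
        if extra is not None and extra not in dirs:
            dirs.append(extra)
    return dirs
```

Calling `dirs_for_evaluation()` with no objectives yields only the always-evaluated directory, which is the set we should make sure our pipelines cover first.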
Below is his response about migrating the clustering dataset. It is not relevant right now, since we need to use the private repo anyway, but it may be useful later on.
As far as 1491_one_hundred_plants_margin_clust is concerned, because it’s an unsupervised task, it’s kept under seed_datasets_unsupervised. I had not migrated it because I was not sure if any team was using it. I will migrate it soon.
Depends on #34
Moved to private dataset as per #34. Removing depends-on-issue tag.
Added in commit d040be3fc7070669a081df53d9e5117a2349234d
The current clustering pipelines operate on the wrong dataset, one_hundred_plants_margin (a classification dataset). They must be renamed and changed to run on the one_hundred_plants_margin_clust dataset, which is designed for clustering.
This change must be implemented after D3M transfers one_hundred_plants_margin_clust to the public datasets repo (if there are any plans to do so).
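A minimal sketch of the rename-and-retarget step, assuming the clustering pipelines can be represented as a mapping from pipeline name to dataset name. That layout and the function name are hypothetical; the real pipeline files may store the dataset reference differently.

```python
WRONG_DATASET = "one_hundred_plants_margin"           # classification dataset
CLUSTERING_DATASET = "one_hundred_plants_margin_clust"  # intended dataset


def retarget_clustering_pipelines(pipelines):
    """Point clustering pipelines at the clustering dataset.

    `pipelines` maps pipeline name -> dataset name (hypothetical layout)
    and is expected to contain clustering pipelines only.
    """
    fixed = {}
    for name, dataset in pipelines.items():
        if dataset == WRONG_DATASET:
            # Rename the pipeline too, unless its name is already correct.
            if CLUSTERING_DATASET not in name:
                name = name.replace(WRONG_DATASET, CLUSTERING_DATASET)
            dataset = CLUSTERING_DATASET
        fixed[name] = dataset
    return fixed
```

Pipelines that already reference another dataset pass through unchanged, so the function can be run over the whole set of clustering pipelines.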