dvdmjohnson / d3m_michigan_primitives

Contains primitives developed by the University of Michigan team as part of the Data Driven Discovery (D3M) project

Evaluate clustering pipelines on one_hundred_plants_margin_clust #31

Closed rszeto closed 4 years ago

rszeto commented 4 years ago

The current clustering pipelines operate on the wrong dataset, one_hundred_plants_margin (a classification dataset). They must be renamed and changed to run on the one_hundred_plants_margin_clust dataset, which is designed for clustering.

This change must be implemented after D3M transfers one_hundred_plants_margin_clust to the public datasets repo (if there are any plans to do so).

rszeto commented 4 years ago

Just sent an email to Swaroop Vattam, who seems to be the main person managing the datasets.

rszeto commented 4 years ago

We should actually be using the private dataset repo per Swaroop's response:

I’m afraid you are looking at the public one, which is the wrong one. Evaluations always use our internal repo. With the internal one, there are many directories:

Among these, the datasets in seed_datasets_current are always evaluated. Depending on the objectives of the evaluation, other dirs may or may not be included. For instance, if DARPA is interested in evaluating data-augmentation, then the datasets under seed_datasets_data_augmentation become relevant.

Below is his response about migrating the clustering dataset. It is not relevant right now, since we need to use the private repo anyway, but it may be useful later on.

As far as 1491_one_hundred_plants_margin_clust is concerned, because it’s an unsupervised task, it’s kept under seed_datasets_unsupervised. I had not migrated it because I was not sure if any team was using it. I will migrate it soon.

rszeto commented 4 years ago

Depends on #34

rszeto commented 4 years ago

Moved to private dataset as per #34. Removing depends-on-issue tag.

rszeto commented 4 years ago

Adding in d040be3fc7070669a081df53d9e5117a2349234d