Put Pipeline Runs in D3M MtL DB

byu-dml / d3m-experimenter

A distributed system for creating, running, and persisting many machine learning experiments.


Put Pipeline Runs in D3M MtL DB #38

Closed epeters3 closed 4 years ago

epeters3 commented 5 years ago

It would be great to have all of d3m-experimenter's rich data inside the D3M MtL DB. The D3M dependencies will need to be updated (i.e. the D3M Docker image used by the repo), and any breaking changes introduced by the updated image will need to be resolved.

Also, it may be good to have a utility or script for safely submitting the d3m-experimenter pipeline runs, pipelines, and possibly problems to the D3M MtL DB (i.e. one that checks that no duplicate documents are inserted into the DB).
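A rough sketch of what such a duplicate-safe submission helper could look like. Everything here is an assumption for illustration: `document_digest`, `submit_unique`, and the `upload` callable are hypothetical names, and a SHA-256 over canonical JSON merely stands in for whatever identifier the D3M MtL DB actually uses to recognize a document.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, List, Set


def document_digest(document: Dict) -> str:
    """Hash a JSON-serializable document (hypothetical helper).

    Keys are sorted so that two dicts with the same content but a
    different key order produce the same digest.
    """
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def submit_unique(
    documents: Iterable[Dict],
    already_submitted: Set[str],
    upload: Callable[[Dict], None],
) -> List[str]:
    """Upload only the documents whose digest has not been seen yet.

    `upload` is a placeholder for the real call that sends a document
    to the DB; it is injected so this sketch makes no claims about
    the actual endpoint.
    """
    submitted = []
    for document in documents:
        digest = document_digest(document)
        if digest in already_submitted:
            continue  # duplicate: skip instead of inserting it again
        upload(document)
        already_submitted.add(digest)
        submitted.append(digest)
    return submitted
```

In practice `already_submitted` would have to be seeded from whatever is already in the DB, and `upload` replaced with the real submission call.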

epeters3 commented 5 years ago

If the DB's pipeline POST endpoint doesn't already check for duplicates, a good way to tell whether a pipeline is already in the DB is to keep a set of all the pipelines in the DB (since d3m.metadata.pipeline.Pipeline objects are hashable), hash the candidate pipeline, and check whether it is already in the set. If a pipeline's hash incorporates its digest/id, however, structurally identical pipelines with different ids would hash differently, so an O(n) sweep through the pipelines using the d3m.metadata.pipeline.Pipeline.equals method might be necessary, since that method checks equality in terms of isomorphism.
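The two strategies can be sketched as follows. This is a minimal illustration, not d3m code: `StubPipeline` is a hypothetical stand-in for d3m.metadata.pipeline.Pipeline, its `equals` compares only the step sequence (the real method checks isomorphism), and `structural_key` is an assumed helper that hashes a pipeline while ignoring its id.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class StubPipeline:
    """Hypothetical stand-in for d3m.metadata.pipeline.Pipeline."""

    id: str
    steps: Tuple[str, ...]

    def equals(self, other: "StubPipeline") -> bool:
        # Structural comparison only; the id is deliberately ignored.
        return self.steps == other.steps


def structural_key(pipeline: StubPipeline) -> int:
    """Hash that depends on structure only, never on the id (assumed helper)."""
    return hash(pipeline.steps)


class PipelineIndex:
    """Set-like lookup: hash first, then confirm with equals()."""

    def __init__(self, existing: Iterable[StubPipeline] = ()) -> None:
        self._buckets: Dict[int, List[StubPipeline]] = {}
        for pipeline in existing:
            self.add(pipeline)

    def add(self, pipeline: StubPipeline) -> None:
        self._buckets.setdefault(structural_key(pipeline), []).append(pipeline)

    def contains(self, candidate: StubPipeline) -> bool:
        # Only pipelines sharing the candidate's structural hash are compared.
        bucket = self._buckets.get(structural_key(candidate), [])
        return any(candidate.equals(p) for p in bucket)


def contains_by_sweep(existing: Iterable[StubPipeline], candidate: StubPipeline) -> bool:
    """Fallback O(n) sweep for when the available hash includes the id."""
    return any(candidate.equals(p) for p in existing)
```

The index only helps if a hash that ignores the digest/id is available; otherwise the sweep is the safe fallback, at the cost of comparing against every stored pipeline.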

epeters3 commented 5 years ago

Is this document related to submitting documents to the D3M MtL DB? https://datadrivendiscovery.org/wiki/display/work/Evaluation+Workflow

epeters3 commented 4 years ago

Closed by e9d29ca576eac00456e4a51227ae04e95fb121e0.