Open epeters3 opened 4 years ago
What are your thoughts @bjschoenfeld?
We want embeddings from the pipeline encoding portions of the deep learning models. I have not looked at the code here for a while, so I am not sure if adding a method to dna.models.base_models.PyTorchModelBase will take care of it all.
Just to clarify, we don't want embeddings that include information about the dataset or score? Just the pipelines?
Yes, let's start with pipeline embeddings that don't include dataset information. I am not sure how we would exclude the score information, since the score is the only thing used to train the networks. We could come up with an unsupervised method, but things are not set up for that yet.
In our team meeting last week, we decided that a good approach for this would be to have a separate CLI that will load a model's weights, then create embeddings for a dataset you point it to, using the model initialized with those weights. It should compute just on the pipeline embedding portion of the dataset.
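A rough sketch of what such a standalone CLI could look like, just to make the idea concrete. Everything named here is an assumption for illustration and not something that exists in the repo: the `embed-pipelines` program name, the three path arguments, the JSON record layout (`pipeline_id`, `pipeline`), and the `load_encoder` callback that would wrap loading a model's weights and exposing only its pipeline-encoding portion.

```python
import argparse
import json
from typing import Callable, Dict, List, Sequence

# Hypothetical type: maps one pipeline description to its embedding vector.
Encoder = Callable[[dict], List[float]]


def build_parser() -> argparse.ArgumentParser:
    """Argument parser for the (hypothetical) embedding CLI."""
    parser = argparse.ArgumentParser(prog="embed-pipelines")  # assumed name
    parser.add_argument("--weights-path", required=True,
                        help="saved weights of a trained metamodel")
    parser.add_argument("--dataset-path", required=True,
                        help="JSON file of records to embed")
    parser.add_argument("--output-path", required=True,
                        help="where to write the embeddings as JSON")
    return parser


def embed_pipelines(records: Sequence[dict],
                    encoder: Encoder) -> Dict[str, List[float]]:
    """Embed each record using only its pipeline field.

    Dataset and score fields in a record are never passed to the encoder.
    """
    return {r["pipeline_id"]: encoder(r["pipeline"]) for r in records}


def main(argv: Sequence[str],
         load_encoder: Callable[[str], Encoder]) -> Dict[str, List[float]]:
    """Load the encoder from weights, embed the dataset, save the result."""
    args = build_parser().parse_args(argv)
    # load_encoder is a stand-in for model-specific weight loading.
    encoder = load_encoder(args.weights_path)
    with open(args.dataset_path) as f:
        records = json.load(f)
    embeddings = embed_pipelines(records, encoder)
    with open(args.output_path, "w") as f:
        json.dump(embeddings, f)
    return embeddings
```

Keeping weight loading behind a callback means each metamodel can supply its own loader while the CLI logic stays the same.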
This is the feature request for saving off the embeddings of the metamodels. Here is a list of all the deep learning metamodels:
❔ What is a good way to perform the saving of the embeddings? One option would be to add a --save-embeddings flag to the dna evaluate CLI.
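For reference, a minimal sketch of how a boolean flag like that could be wired up with argparse. The parser below is a stand-in built from scratch, not the project's actual `dna evaluate` parser, whose other arguments are omitted here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser with an `evaluate` subcommand (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dna")
    subparsers = parser.add_subparsers(dest="command", required=True)
    evaluate = subparsers.add_parser("evaluate")
    # Off by default; passing the flag sets args.save_embeddings to True.
    evaluate.add_argument(
        "--save-embeddings",
        action="store_true",
        help="also save the pipeline embeddings produced during evaluation",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.save_embeddings:
        print("embeddings would be saved here")
```

With `action="store_true"` the existing invocations keep working unchanged, since the flag defaults to off.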