using catvae without a phylogeny

flatironinstitute / catvae

Categorical Variational Autoencoders

BSD 3-Clause "New" or "Revised" License


using catvae without a phylogeny #75

Closed: cameronmartino closed this issue 2 years ago

cameronmartino commented 2 years ago

Hi @mortonjt,

Is it okay to train a catvae model without a phylogeny? If so, how should I extract the embeddings index IDs from the model output? I see all the utils rely on the phylogeny for the index order.

Thanks!

mortonjt commented 2 years ago

Hi! Yes, you don’t need to specify a phylogeny; if none is specified, a random phylogeny will be generated.

https://github.com/flatironinstitute/catvae/blob/master/catvae/models/linear_vae.py#L16

It won’t really matter which tree you use for estimation, but the tree can help with interpretation.
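A minimal sketch of why the choice of tree may not matter for fitting, assuming (this is not confirmed in the thread) that the model works in isometric log-ratio (ILR) coordinates whose basis is built from the tree. Every binary tree over the same D features yields an orthonormal basis for the same (D-1)-dimensional zero-sum subspace, so swapping trees amounts to a rotation of the coordinates. The helpers `random_tree_linkage` and `ilr_basis_from_linkage` are hypothetical; the random tree is a SciPy linkage over random points, not catvae's own random tree generator.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def random_tree_linkage(n_features, seed=None):
    """Hypothetical stand-in for a random phylogeny: cluster random 2-D points."""
    rng = np.random.default_rng(seed)
    return linkage(rng.normal(size=(n_features, 2)), method="average")


def ilr_basis_from_linkage(Z):
    """Build a (D-1, D) ILR basis with one balance per internal node of Z."""
    n = Z.shape[0] + 1
    members = {i: [i] for i in range(n)}  # leaf indices under each cluster
    rows = []
    for i, (a, b, _, _) in enumerate(Z):
        left, right = members[int(a)], members[int(b)]
        r, s = len(left), len(right)
        v = np.zeros(n)
        v[left] = 1.0 / r    # left subtree gets a positive weight
        v[right] = -1.0 / s  # right subtree gets a negative weight
        rows.append(np.sqrt(r * s / (r + s)) * v)  # rescale to unit length
        members[n + i] = left + right  # new cluster index used by later rows
    return np.vstack(rows)


D = 6
V1 = ilr_basis_from_linkage(random_tree_linkage(D, seed=0))
V2 = ilr_basis_from_linkage(random_tree_linkage(D, seed=1))
centering = np.eye(D) - np.ones((D, D)) / D  # projector onto zero-sum vectors

print(np.allclose(V1 @ V1.T, np.eye(D - 1)))  # balances are orthonormal
print(np.allclose(V1.T @ V1, centering), np.allclose(V2.T @ V2, centering))
```

Because both bases span the same subspace, a fit in one can be rotated into the other; what the tree changes is which feature contrast each individual coordinate represents, which is the interpretation point above.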

cameronmartino commented 2 years ago

@mortonjt Without the tree, what is the order of the index IDs in the embedding? Is it just the order of the IDs in the biom table, or is the first or last ID dropped as the ALR reference, i.e., bt.ids('observation')[1:] or bt.ids('observation')[:-1]?

Thanks!!
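For comparison, the two slices in the question correspond to the two usual additive log-ratio (ALR) conventions: with the reference feature last, the coordinates line up with `ids[:-1]`; with it first, with `ids[1:]`. This is a generic NumPy sketch; the `alr` helper is hypothetical, and the thread does not confirm that catvae uses ALR coordinates at all (the replies point to a tree-derived basis instead).

```python
import numpy as np


def alr(x, ref=-1):
    """Additive log-ratio: log(x_j / x_ref) for every feature j except ref.

    Returns the coordinates and the indices of the features they describe.
    """
    x = np.asarray(x, dtype=float)
    D = x.shape[-1]
    ref = ref % D  # allow negative indices such as -1
    keep = [j for j in range(D) if j != ref]
    return np.log(x[..., keep]) - np.log(x[..., [ref]]), keep


ids = np.array(["a", "b", "c", "d"])
counts = np.array([1.0, 2.0, 4.0, 8.0])

coords_last, kept_last = alr(counts, ref=-1)   # reference = last feature
coords_first, kept_first = alr(counts, ref=0)  # reference = first feature
print(ids[kept_last], ids[kept_first])  # labels for each convention
```

Returning the kept indices alongside the coordinates keeps the ID labels tied to the transform, rather than relying on remembering which end was dropped.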

cameronmartino commented 2 years ago

To be more explicit, when I do:

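from scipy.spatial.distance import pdist, squareform
import pandas as pd

# model: a trained catvae model (not defined here)
W = model.vae.decoder.weight.detach().cpu().numpy()  # decoder weight matrix as a NumPy array
dist_W = squareform(pdist(W))  # pairwise Euclidean distances between the rows of W
dist_Wdf = pd.DataFrame(dist_W)  # unlabeled: the row/column IDs are the open question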

How would I get the correct ID order for dist_Wdf, and am I calculating dist_Wdf correctly?

Thanks!!
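One way to see why the rows of `W` have no obvious feature IDs: assuming (this is not confirmed in the thread) that the decoder weight has shape (D-1, k), with one row per balance of the tree's ILR basis rather than one row per feature, the per-feature loadings are `basis.T @ W`, whose rows follow the column (tip) order of the basis. The sketch uses a random orthonormal zero-sum basis from `scipy.linalg.null_space` as a stand-in for a tree-derived basis, and random numbers in place of trained weights; `feature_embeddings` and `embedding_distances` are hypothetical helpers, not the catvae utility linked in the reply below.

```python
import numpy as np
import pandas as pd
from scipy.linalg import null_space
from scipy.spatial.distance import pdist, squareform


def feature_embeddings(W, basis, feature_ids):
    """Map (D-1, k) balance-space weights to a (D, k) table indexed by feature ID."""
    # basis is (D-1, D); its columns must follow the order of feature_ids
    return pd.DataFrame(basis.T @ W, index=feature_ids)


def embedding_distances(emb):
    """Pairwise Euclidean distances between feature embeddings, labeled by ID."""
    d = squareform(pdist(emb.to_numpy()))
    return pd.DataFrame(d, index=emb.index, columns=emb.index)


D, k = 5, 2
feature_ids = [f"feature_{i}" for i in range(D)]
basis = null_space(np.ones((1, D))).T  # (D-1, D) orthonormal rows that sum to zero
W = np.random.default_rng(0).normal(size=(D - 1, k))  # stand-in for decoder weights

emb = feature_embeddings(W, basis, feature_ids)
dist_Wdf = embedding_distances(emb)
print(dist_Wdf.round(3))
```

Labeling both the index and the columns of the distance table keeps the ID order attached to the numbers, instead of relying on a separate list that has to be kept in sync.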

mortonjt commented 2 years ago

See the README on extracting embeddings; there is a helper function for this:

https://github.com/flatironinstitute/catvae/blob/master/catvae/util.py#L70

cameronmartino commented 2 years ago

Right, but those require a basis/phylogeny tree as input in order to get the correct order of the IDs. How should I do it without one?

mortonjt commented 2 years ago

Got it. OK, don’t use the defaults. I’d use the random_linkage function (used in the get_basis method) to generate a random phylogeny, which you can then pass to the helper methods listed previously.
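A minimal sketch of this workflow, using a hypothetical `random_ladder_basis` in place of catvae's `random_linkage`/`get_basis` (their exact signatures are not shown in this thread): generate the random tree once with a fixed seed, keep its basis and tip order, and reuse exactly that object when extracting embeddings. A random ladder (caterpillar) tree over a shuffled tip order is used only because its ILR basis is easy to write down.

```python
import numpy as np


def random_ladder_basis(n_features, seed=None):
    """Hypothetical random tree: a ladder tree over a random tip order.

    Returns (basis, tip_order); row i-1 of the (D-1, D) basis contrasts
    the first i tips of tip_order against tip i.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_features)  # the random tip order defines the tree
    basis = np.zeros((n_features - 1, n_features))
    for i in range(1, n_features):
        scale = np.sqrt(i / (i + 1.0))
        basis[i - 1, order[:i]] = scale / i  # first i tips (one subtree)
        basis[i - 1, order[i]] = -scale      # next tip (the other subtree)
    return basis, order


D, k = 6, 3
W = np.random.default_rng(42).normal(size=(D - 1, k))  # stand-in for trained weights

# Build the tree once, before training, and keep it for extraction.
basis, tip_order = random_ladder_basis(D, seed=0)
emb_same = basis.T @ W  # per-feature rows, using the training tree

# An independently drawn tree describes different coordinates.
other_basis, other_order = random_ladder_basis(D, seed=1)
emb_other = other_basis.T @ W

print(np.allclose(basis @ basis.T, np.eye(D - 1)))  # valid orthonormal basis
print(np.allclose(emb_same, emb_other))  # different trees give different rows
```

Fixing the seed (or saving the tree itself) is what makes the extraction step line up with training; regenerating a random tree afterward silently pairs the weights with the wrong contrasts.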

cameronmartino commented 2 years ago

That worked. Thanks!!

cameronmartino commented 2 years ago

FYI: it only really works if you make the tree ahead of time and pass it in (otherwise the order won’t match). It might be good to expose the tree in the model class (i.e., self.tree) so one could retrieve it afterward if needed.
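The suggestion to expose the tree could look roughly like the sketch below. `TreeAwareModel`, its arguments, and the tip-order "tree" it stores are all hypothetical (this is not catvae's model class); the point is only that when no tree is supplied, the randomly generated one is kept on the instance so embeddings can later be extracted with the same tree.

```python
import numpy as np
import pandas as pd


class TreeAwareModel:
    """Hypothetical wrapper that remembers the tree used to build its basis."""

    def __init__(self, feature_ids, tree=None, seed=None):
        self.feature_ids = list(feature_ids)
        if tree is None:
            # No tree given: draw a random tip order (a random ladder tree)
            # and keep it, instead of discarding it after building the basis.
            tree = np.random.default_rng(seed).permutation(len(self.feature_ids))
        self.tree = np.asarray(tree)
        self.basis = self._ladder_basis(self.tree)

    @staticmethod
    def _ladder_basis(order):
        """(D-1, D) ILR basis of a ladder tree over the given tip order."""
        D = len(order)
        basis = np.zeros((D - 1, D))
        for i in range(1, D):
            scale = np.sqrt(i / (i + 1.0))
            basis[i - 1, order[:i]] = scale / i
            basis[i - 1, order[i]] = -scale
        return basis

    def feature_embeddings(self, W):
        """Map (D-1, k) balance-space weights to per-feature rows, labeled by ID."""
        return pd.DataFrame(self.basis.T @ W, index=self.feature_ids)


model = TreeAwareModel(["a", "b", "c", "d"], seed=0)
emb = model.feature_embeddings(np.ones((3, 2)))  # stand-in weights
rebuilt = TreeAwareModel(model.feature_ids, tree=model.tree)  # reuse the stored tree
print(np.allclose(rebuilt.basis, model.basis))
```

Storing the tree on the instance means the extraction step never has to regenerate it, which is exactly the mismatch described above.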