Closed cameronmartino closed 2 years ago
Hi! Yes, you don’t need to specify a phylogeny. If none is specified, a random phylogeny will be generated.
https://github.com/flatironinstitute/catvae/blob/master/catvae/models/linear_vae.py#L16
It won’t really matter which tree you use for estimation, but it can help with interpretation.
On Thu, May 5, 2022 at 3:56 PM Cameron Martino @.***> wrote:
Hi @mortonjt,
Is it okay to train a catvae model without a phylogeny? If so, how should I extract the embeddings index IDs from the model output? I see all the utils rely on the phylogeny for the index order.
Thanks!
@mortonjt Without the tree, is the order of the index IDs in the embedding just the order of the IDs in the biom table? Or is the ALR reference ID dropped from the start or the end, i.e. bt.ids('observation')[1:] or bt.ids('observation')[:-1]?
Thanks!!
Maybe to be more explicit. When I do:
W = model.vae.decoder.weight.detach().cpu().numpy()
dist_W = squareform(pdist(W))
dist_Wdf = pd.DataFrame(dist_W)
How would I grab the correct ID order for dist_Wdf, and am I calculating dist_Wdf correctly?
Thanks!!
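For readers following along, here is a minimal sketch of the labeling step being asked about. It assumes you already have a list of IDs in the same order as the rows of W; the random matrix and the `row_ids` names are placeholders, not catvae output. Note that `pdist` compares rows, so the result only makes sense if each row of W corresponds to one of the items you want distances between.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Stand-in for the weight matrix; in the thread W comes from
# model.vae.decoder.weight. Shape here: (n_rows, n_latent).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))

# Hypothetical row labels. They must be in the same order as the rows of W,
# which is exactly what this thread is trying to establish.
row_ids = ["f1", "f2", "f3", "f4"]
assert len(row_ids) == W.shape[0]

# Pairwise Euclidean distances between rows of W, as a labeled square matrix.
dist_W = squareform(pdist(W, metric="euclidean"))
dist_Wdf = pd.DataFrame(dist_W, index=row_ids, columns=row_ids)
```

The length check guards against the off-by-one case raised above, where the ID list and the matrix differ by a single dropped entry.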
See the README on extracting embeddings. I have a helper function to do this:
https://github.com/flatironinstitute/catvae/blob/master/catvae/util.py#L70
Right, but those require a basis/phylogeny tree as input in order to get the correct order of the IDs. How should I do it without one?
Got it. OK, don’t use the defaults. I’d use the random_linkage function (used in that get_basis method) to generate a random phylogeny, which you can then pass to the helper methods linked previously.
That worked. Thanks!!
FYI: it only really works if you make the tree ahead of time and pass it in (otherwise the ID order won't match). It might be good to expose the tree on the model class (i.e. self.tree) so one could retrieve it afterwards if needed.
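To illustrate the suggestion, here is a rough sketch of the pattern in plain Python. Everything in it is hypothetical: `random_binary_tree`, `leaf_order`, and `SketchModel` are stand-ins written for this example, not catvae or gneiss APIs. The idea is only that the model stores whichever tree it ends up using, so the leaf order can be recovered later.

```python
import random

def random_binary_tree(leaf_ids, seed=None):
    """Hypothetical stand-in for a random-tree generator: merges two random
    subtrees at a time until a single nested-tuple tree remains."""
    rng = random.Random(seed)
    nodes = list(leaf_ids)
    while len(nodes) > 1:
        left = nodes.pop(rng.randrange(len(nodes)))
        right = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((left, right))
    return nodes[0]

def leaf_order(tree):
    """Return the leaves of a nested-tuple tree in left-to-right order."""
    if isinstance(tree, tuple):
        return leaf_order(tree[0]) + leaf_order(tree[1])
    return [tree]

class SketchModel:
    """Hypothetical model wrapper that keeps the tree it was built with."""
    def __init__(self, leaf_ids, tree=None, seed=None):
        # Generate a tree only if the caller did not supply one, and store it
        # either way so it can be retrieved after training.
        self.tree = tree if tree is not None else random_binary_tree(leaf_ids, seed)

ids = ["f1", "f2", "f3", "f4"]
model = SketchModel(ids, seed=0)
order = leaf_order(model.tree)  # ID order implied by the stored tree
```

With the tree kept on the object, the same tree can be handed to whatever helper needs it afterwards, instead of regenerating a different random one.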