aidos-lab / pytorch-topological

A topological machine learning framework based on PyTorch
https://pytorch-topological.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

TypeError: 'PersistenceInformation' object is not iterable #16

Closed AdaUchendu closed 1 year ago

AdaUchendu commented 1 year ago

How can I handle a non-iterable 'PersistenceInformation' object in the TopologicalModel classification model? The error comes from pers_info = make_tensor(pers_info). I also tried torch.tensor(pers_info) and got the error that pers_info is not a sequence.

Finally, pers_info is the VR complex of weights in a torch.tensor. I am not sure if that matters.

Thank you.

Pseudomanifold commented 1 year ago

Hi! Can you please add some more information? What's the context of this error? What's your input data like?

AdaUchendu commented 1 year ago

The input data is a numerical representation of text documents, so the BERT weights of a document. I followed the TopologicalModel classifier example that you have. The only difference is that instead of using data_set = SphereVsTorus(n_point_clouds=2 * batch_size), my data are weights obtained from BertTokenizer. Let me know if you need more information; I can share my code if you want.

Pseudomanifold commented 1 year ago

I think it makes sense to share the code. You could also check the shapes of the resulting tensors: is it possible that you feed in only a single data set? In that case, you might need to do torch.tensor([pers_info]).
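
A minimal sketch (using plain torch only, not the actual torch-topological types) of the kind of shape check suggested here: if the input turns out to be a single point cloud rather than a batch of them, add a leading batch dimension first. The shapes are made up for illustration.

```python
import torch

# Hypothetical example: a single point cloud of 100 points in 3 dimensions.
point_cloud = torch.rand(100, 3)

# If a batched input is expected, add a leading batch dimension first.
if point_cloud.dim() == 2:
    batch = point_cloud.unsqueeze(0)  # shape becomes (1, 100, 3)

print(batch.shape)  # torch.Size([1, 100, 3])
```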

AdaUchendu commented 1 year ago

I tried that and it didn't work. Here is a link to my code on Google Colab: https://colab.research.google.com/drive/1L2oMDSmEKuQnaA9P5wLfjyaaDYWDhC2O?usp=sharing. The classifier is GeneratorClassifier.

Thank you

Pseudomanifold commented 1 year ago

Honestly, that's a lot of code for me to try to understand. Please help me understand this better and try to reduce the scope of the issue. What's the shape of output here?

AdaUchendu commented 1 year ago

That is fair; I apologize for the length of the code. There was no way for me to reduce it further. The problem is two-fold: the make_tensor function gives an error, and when I instead use the VR complex output directly as input for the Linear layer, I get a non-iterable error. The tensor I feed into the VR complex has shape torch.Size([16, 768]).

Thanks a lot for your swift responses and patience.

Pseudomanifold commented 1 year ago

No worries, just pointing out that a minimal working example really helps in these cases... but we'll get there! So, output should be good to go for the VR complex. What you will get from this calculation is a set of persistence diagrams. I have now updated the logic for this (see referenced commits). If you update your torch-topological installation to v0.1.6, you will get these fixes.

One warning, though: you are creating a tensor with a single batch; is this what you want here? torch-topological can also handle different inputs, but I don't know your application well enough to comment on this.

I'd be interested in learning more about the intended application; if you find the package helpful, you may consider contributing a brief documentation or tutorial.

For now, I am closing this issue; please reopen if this specific problem persists!

AdaUchendu commented 1 year ago

Thank you. This might be a trivial issue but I keep getting this error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_cat) from using the make_tensor function. The input is the VietorisRipsComplex(dim=0) output which is a list. Do you have any idea how I can rectify this issue?

Also, I honestly understand if you have no idea why this is happening and how to fix it.

Pseudomanifold commented 1 year ago

I'll have to do a proper fix tomorrow for this. In the meantime: can you send the output tensor temporarily to the GPU, for instance via pers_info.to(output.device)? I'm currently travelling and don't have access to a GPU to check this. Will fix ASAP.
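
The suggested workaround, sketched with plain stand-in tensors (output and pers_info here are placeholders for the variables in the user's code, and the shapes are invented):

```python
import torch

# Pick whatever device the rest of the model lives on.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

output = torch.rand(16, 768, device=device)  # stand-in for the model output
pers_info = torch.rand(15, 3)                # stand-in for the converted diagram (on CPU)

# Move the diagram tensor to the same device before any torch.cat / downstream op.
pers_info = pers_info.to(output.device)
```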

AdaUchendu commented 1 year ago

Thank you so much. Unfortunately pers_info.to(device) does not work because the output of VietorisRipsComplex(dim=0) is a list, and all attempts to convert it to a torch.tensor() or numpy.array() give the error "not a sequence" or "can't convert object". I will wait for the correction. Good luck on your travels, and thank you so much for your help and patience. I really appreciate it.

Pseudomanifold commented 1 year ago

I meant converting or moving it after you call make_tensor. It is my impression that the error is raised after that, probably when you try to make use of the converted tensor in a downstream operation. Is this correct? (additional stack traces and a more specific description of where the code fails would be very helpful here)

Pseudomanifold commented 1 year ago

Just fixed this device creation issue—please update again to v0.1.7. Sorry for the inconvenience; it's really hard tracking the various types of usage scenarios, and make_tensor is not a functionality that I have fully tested yet.

AdaUchendu commented 1 year ago

Thanks a lot, and there is no need to apologize. The device error has been fixed and the function works beautifully without any errors. However, I have a question: when I use the output of make_tensor as input for the Linear layer, I get the error mat1 and mat2 shapes cannot be multiplied (15x3 and 768x20). The input to VietorisRipsComplex(dim=0) is a tensor of shape torch.Size([16, 768]), but the output after make_tensor is a 15x3 tensor, which is incompatible with the Linear layer. So my question is: is it possible to get a tensor with the same size, or at least with 768 columns, after taking the VR complex and make_tensor of a tensor of shape torch.Size([16, 768])?

Thanks again. You really have made my TDA experience very nice.

Pseudomanifold commented 1 year ago

So my question is, is it possible to get a tensor with the same size or at least with 768 columns after taking the VR complex and make_tensor with size: torch.Size([16, 768])?

No, I'm afraid that's technically not possible: the Vietoris-Rips complex will calculate one persistence diagram of your data (since you are not using any batches), and there's no guarantee how big the resulting tensor will be; it will contain topological features, indicated by creation, destruction, and dimension, respectively. Hence, if you want to use them for downstream tasks, the easiest way is either to use one of the layers (as in the classification.py example) or to run the resulting tensor through a different representation (a fully-connected layer, a set function, etc.).

Hope that helps!
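
One way to turn a variable-size (n_features, 3) diagram tensor into a fixed-size vector suitable for a Linear layer is a simple permutation-invariant "set function": embed each (creation, destruction, dimension) triple and pool over the features. A hand-rolled sketch of that idea (torch-topological also ships dedicated representation layers; this is not its API):

```python
import torch
import torch.nn as nn

class DiagramPooling(nn.Module):
    """Maps a persistence diagram of shape (n, 3) to a fixed-size vector."""
    def __init__(self, hidden_dim=32, out_dim=20):
        super().__init__()
        self.embed = nn.Linear(3, hidden_dim)   # per-feature embedding
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, diagram):
        x = torch.relu(self.embed(diagram))     # (n, hidden_dim)
        x = x.mean(dim=0)                       # pool over features -> (hidden_dim,)
        return self.head(x)                     # (out_dim,)

model = DiagramPooling()
out = model(torch.rand(15, 3))  # works for any number of topological features
```

Because the pooling averages over the feature axis, the output size no longer depends on how many points the diagram happens to contain.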

AdaUchendu commented 1 year ago

Thank you. My batch size is 16 which is why the tensor size is torch.Size([16, 768]) which is the input for VietorisRipsComplex(dim=0). Do you think I need to loop through each row of the tensor and get the VR complex? Because since the batch size is 16, the desired output should be a tensor with 16 rows.

Pseudomanifold commented 1 year ago

OK, I understand. But in this case, you may be calculating something that you don't want: persistent homology treats your batch as 16 points in a 768-dimensional space, hence you only get a single persistence diagram. What's the overarching goal of the calculation?

AdaUchendu commented 1 year ago

The goal is to get the persistence diagram of each row in the tensor of torch.Size([16, 768]), so that the output will be 16 rows x N columns. Is this achievable, or should I change the batch size to 1?

Pseudomanifold commented 1 year ago

No, I'm afraid that this is not possible, because persistent homology is only defined on the level of metric spaces or point clouds. A 2D tensor like in your case will always be treated as a point cloud. What's the overall goal of your code? It seems like it's about adding some kind of topological information to NLP embeddings?
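
To illustrate the point-cloud semantics: a 2D tensor of shape (16, 768) is one point cloud of 16 points in a 768-dimensional space, yielding one diagram. To obtain 16 diagrams, each article would itself have to be a point cloud, i.e. the input would need to be a 3D tensor of shape (16, n_points, d). A purely hypothetical reshaping (the 48 x 16 split is arbitrary and only for illustration):

```python
import torch

embeddings = torch.rand(16, 768)   # batch of 16 article embeddings

# Interpreted as-is, this is ONE point cloud: 16 points in R^768,
# and hence yields a single persistence diagram.
single_cloud = embeddings

# To get one diagram per article, each article must itself be a point cloud,
# e.g. by splitting its 768-dim vector into 48 points of dimension 16:
batched_clouds = embeddings.view(16, 48, 16)  # -> 16 point clouds, 16 diagrams
```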

AdaUchendu commented 1 year ago

Yes, my goal is to get the topological features of numerical representations (NLP embeddings) of each article. My hypothesis is language has a shape that is unique to each author. Author A's shape is different from Author's shape, based on the average shape of each author's articles.

Pseudomanifold commented 1 year ago

I think in this case, a viable way could be to try to map an article by an author into a point cloud, then get the persistence diagram as a shape descriptor. If you batch over articles or authors, you can then compare the corresponding representations. Sounds like a very exciting topic to me!

AdaUchendu commented 1 year ago

Thank you. Forgive me, this may be a basic question, but how do I map the NLP embeddings for each article into a point cloud? Is there a function that does that? I see this function SphereVsTorus(n_point_clouds=2 * batch_size) but I think it creates the point clouds.

Pseudomanifold commented 1 year ago

Hmm, I don't know what they look like, but aren't those embeddings basically tensors already? If so, you can define your own data set class to iterate over them (this is something PyTorch generally requires).
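
A minimal custom data set over precomputed embedding tensors, in the usual PyTorch pattern (all names and shapes here are placeholders, not part of torch-topological):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class EmbeddingDataset(Dataset):
    """Iterates over (embedding, label) pairs of precomputed article embeddings."""
    def __init__(self, embeddings, labels):
        self.embeddings = embeddings  # e.g. shape (n_articles, n_points, d)
        self.labels = labels

    def __len__(self):
        return len(self.embeddings)

    def __getitem__(self, idx):
        return self.embeddings[idx], self.labels[idx]

# 100 hypothetical articles, each a point cloud of 48 points in R^16, 20 classes.
data = EmbeddingDataset(torch.rand(100, 48, 16), torch.randint(0, 20, (100,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
```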

AdaUchendu commented 1 year ago

Actually, I misunderstood your previous comment about mapping to point clouds. But I understand now. And yes, my input is a tensor. Thank you. Let me try that now. I will let you know if it works.

Thanks again for all the help. You really have made this process practically painless.

Pseudomanifold commented 1 year ago

Very kind of you :-) Feel free to reach out to discuss this in general!

AdaUchendu commented 1 year ago

It seems that the VietorisRipsComplex(dim=0) function can only take a 2D tensor, so iterating through each article makes the input a 1D tensor. I am not sure of all the intricate details of calculating a VR complex, but does it make sense for me to pad the 1D tensor with another row of just ones of the same size? Will this affect the quality of the VR complex output?

Thank you

Pseudomanifold commented 1 year ago

Wait, maybe this is documented incorrectly, but the VR complex should also be able to handle a 3D tensor, with the first dimension going over batches.
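
A sketch of the batched input shape this comment describes; the torch-topological call is shown only as a comment, since this snippet deliberately avoids importing the library and its exact calling convention should be checked against the documentation:

```python
import torch

# First dimension runs over the batch: 16 point clouds,
# each with 48 points in R^16 (shapes invented for illustration).
batch = torch.rand(16, 48, 16)

# With torch-topological installed, something along these lines should yield
# one set of persistence information per point cloud in the batch:
#
#   from torch_topological.nn import VietorisRipsComplex
#   vr = VietorisRipsComplex(dim=0)
#   pers_info = vr(batch)   # one entry per point cloud
```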

AdaUchendu commented 1 year ago

So I tried padding the tensor row with another tensor of ones to create a 2D tensor. This worked and I was able to run the code without any errors. However, for some reason, using pred = torch.argmax(output, dim=1) only predicts one class/label out of 20 labels, resulting in 5% accuracy. Have you ever experienced an issue like this on a classification task?

Pseudomanifold commented 1 year ago

Yes, I think the problem is that the additional row does not really contribute salient information. Somewhere earlier in your pipeline, you should check the representation of your data. I believe it should be possible to embed each document individually, getting a vector for each word. If you don't aggregate over them, you can use them in this setting.
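
The suggestion here, sketched abstractly with plain torch (token count and dimensions are made up): keep the per-token vectors instead of pooling them into one (1, 768) vector, so each article becomes a genuine point cloud.

```python
import torch

n_tokens, d = 120, 768

# Per-token embeddings of one article, e.g. the last hidden state of a BERT model.
token_embeddings = torch.rand(n_tokens, d)

# Aggregating (mean-pooling) collapses the article to a single point,
# leaving no shape for persistent homology to measure:
pooled = token_embeddings.mean(dim=0, keepdim=True)  # (1, 768)

# Keeping the tokens gives a point cloud of 120 points in R^768,
# which is a valid 2D input for a Vietoris-Rips computation.
point_cloud = token_embeddings                       # (120, 768)
```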

AdaUchendu commented 1 year ago

Thank you, Dr. Bastian. But the problem remains: each article embedding is a tensor of shape torch.Size([1, 768]), and VietorisRipsComplex(dim=0) throws RuntimeError: cdist only supports at least 2D tensors, X1 got: 1D because it needs at least a 2D tensor. Do you know how I can get around this, since padding with tensors of ones negatively impacted the results?

Pseudomanifold commented 1 year ago

I think this is a conceptual issue at this point: you should adjust your architecture so that it creates a point cloud (a 2D tensor) for each article or author. Your present output is not in a form that can be used for persistent homology calculations; persistent homology works best in settings where you study the evolution of one function over an object (such as a metric space or a point cloud). I'd suggest taking this discussion "offline" to e-mail, since this is no longer an issue with the code. Happy to chat!

AdaUchendu commented 1 year ago

Thank you. What is your email address?

Pseudomanifold commented 1 year ago

Sure thing! bastian.rieck@helmholtz-muenchen.de

AdaUchendu commented 1 year ago

Hi Dr. Rieck, I sent you the email. Just checking that it did not go to junk.

Thanks a lot.