animikhroy / rk_toolkit_pipeline_diagrams

Master-repository for all code related to "A Novel Approach to Topological Graph Theory with R-K Diagrams and Gravitational Wave Analysis"
https://arxiv.org/abs/2201.06923

R-K diagram based classification diagram of compact binaries with t-sne or any other technique as shown below #8

Closed andorsk closed 1 year ago

andorsk commented 2 years ago

I believe you attempted this initially as well, but we discussed removing the redundant node clusters (everything apart from mass, spin, q-ratio and redshift) and redoing it with better spatial separation and distinction, as shown in the diagram below. Everyone at Glasgow really appreciated this idea and gave positive feedback and support, so this will be the biggest validation of our application from the LIGO and compact binary POV for the physics community w.r.t. the last section of our paper.

andorsk commented 2 years ago

Some issues which I am working through; 2 samples below: image image

Basic approach: take data. Embed it over TSNE. Plot it.
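The three steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes scikit-learn's `TSNE` and matplotlib, uses random placeholder values in place of real event features, and the helper name `embed_events` is made up for this sketch.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE


def embed_events(features, perplexity=3.0, seed=0):
    """Project an (n_events, n_features) array to 2-D with t-SNE."""
    features = np.asarray(features, dtype=float)
    # scikit-learn expects perplexity to be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="random", random_state=seed)
    return tsne.fit_transform(features)


# 1. Take data: 8 events x 4 features (assumed here to be mass, spin,
#    q-ratio and redshift). Random placeholder values, not real measurements.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))

# 2. Embed it with t-SNE.
coords = embed_events(features)

# 3. Plot it (call fig.savefig(...) or plt.show() to view the result).
fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
```

With only a handful of events the perplexity has to stay small, and the layout can change noticeably with the seed, so fixing `random_state` helps keep figures reproducible.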

andorsk commented 2 years ago

Original:

image

animikhroy commented 1 year ago

I have made a doc to collect all the results, images and brief descriptions in one place with relevant comments wherever applicable for your consideration when you address this issue today. Here is the link to that results doc for your reference:

IMPORTANT TO NOTE

In the initial test diagram, high-mass black holes predominate, but the purpose of this plot is to show distance separation between different compact binary mergers, such as black holes, neutron stars and PBHs, as in the sample diagram I had generated. Showing distinction between classes via distance separation is one of the biggest validations of our novel approach, and it is what the scientific community is looking to us to validate.

https://docs.google.com/document/d/13DjS3Y44cE9hJq6H403zvSlkI1o9uMNBhBhK9jaRGQ8/edit?usp=sharing

This Google Doc is solely focused on the updated results and descriptions for ligo_secondary_analysis.ipynb. I am putting all the issues in sequence and requesting brief descriptions of the techniques and methodologies involved, with relevant references to cite (if any) based on your work, so that I can elaborate on those points in detail and quickly wrap up the final draft of the paper for peer review.

andorsk commented 1 year ago

@animikhroy this comment doesn't really provide feedback here. Could you please provide more direct feedback based on the samples above?

There is an embedding section (called Embedding) in https://github.com/animikhroy/rk_toolkit_pipeline_diagrams/blob/main/02_notebooks/rk_gw_mma/ligo_secondary_analysis.ipynb which discusses the method and provides the code in more detail.

As of now, the feedback sounds like it's fine as is.

andorsk commented 1 year ago

Added event labels to the Diagram:

image

animikhroy commented 1 year ago

@andorsk there is no separation of categorical events in the initial diagrams that you provided; all of them are just black holes of different masses.

What we need is to show different types of compact binaries separated by distance based on their class, i.e. there need to be NS-NS, NS-BH and PBH-BH R-K diagrams in your plot, as I had shown in my sample. All your events are just BH-BH events, which is not the point of this exercise. Furthermore, I do like the event label and classification label on the plots, but have you noticed that all of them say the exact same thing, i.e. Black Hole (High Mass) and GW170809? That is definitely a problem that needs to be addressed.

andorsk commented 1 year ago

Feedback noted. Reviewing.

andorsk commented 1 year ago

Better spatial separation: 2 from each class:

image

PBH on the far left. NS + BH next to them. NS next. Then BH after.

andorsk commented 1 year ago

Another version:

image

Note the centers are color coded by class

andorsk commented 1 year ago

image

andorsk commented 1 year ago

@animikhroy please review the above and lmk if any of these are sufficient.

More:

  1. Class separation
  2. Spatial separation
  3. Better class representation ( more even distribution )
  4. 8 events.
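
One way to get the more even class distribution listed above is to draw a fixed number of events from each class before embedding. The sketch below is illustrative only: the event names are placeholders, the class labels follow the ones used in this thread, and `sample_per_class` is a hypothetical helper rather than part of the toolkit.

```python
import random
from collections import Counter, defaultdict


def sample_per_class(events, per_class=2, seed=0):
    """Pick `per_class` events from every class so no class dominates the plot."""
    by_class = defaultdict(list)
    for name, label in events:
        by_class[label].append(name)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = []
    for label in sorted(by_class):
        names = by_class[label]
        if len(names) < per_class:
            raise ValueError(f"class {label!r} has only {len(names)} events")
        picked.extend((name, label) for name in rng.sample(names, per_class))
    return picked


# Placeholder catalog in which BH events heavily outnumber the other classes.
catalog = (
    [(f"bh_event_{i}", "BH") for i in range(10)]
    + [(f"ns_event_{i}", "NS") for i in range(3)]
    + [(f"nsbh_event_{i}", "NS+BH") for i in range(2)]
    + [(f"pbh_event_{i}", "PBH") for i in range(2)]
)

picked = sample_per_class(catalog, per_class=2)
counts = Counter(label for _, label in picked)
```

Two events from each of the four classes gives the 8 events mentioned in the list.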

andorsk commented 1 year ago

https://docs.google.com/document/d/13DjS3Y44cE9hJq6H403zvSlkI1o9uMNBhBhK9jaRGQ8/edit was updated as well

animikhroy commented 1 year ago

@andorsk This is truly impressive! I think the last iteration is almost perfect. However, I would like to use the previous steps as well to show how the study evolved. Could you just do 2 things?

1) Just give me notes on what you did in the previous steps as well which were different from the final step. The final step is dope but documenting the steps is important for the peer review paper.

2) Could you please add x and y axis labels to the plot and create 2 lines of separation on top of the final plot as shown in the diagram below? I have used your final diagram for clarity:

r-k embedding revised

andorsk commented 1 year ago

  1. @animikhroy could you clarify what you mean by "previous steps"?
  2. Re: the lines, I'm not sure what you're going for here, but it seems like this would be a lot easier to do in post-processing (you already did it) than programmatically. What labels do you want for the x and y axes?

animikhroy commented 1 year ago

@andorsk the classification lines were incorrect in my previous diagram, so I have updated them now. It now shows distinct categorical classifications as planned, so I want the final diagram to look like that one, along with axis labels. The axis labels should represent similarity scores normalized between 0 and 1 for both axes, right? That would also be correct according to our definition of R-K Distance and the following sections.

animikhroy commented 1 year ago

  1. @animikhroy could you clarify by "previous steps"?

For this I just mean the 2 previous steps you attempted. They generated 2 plots that were successively closer to the final target plot you generated. What I need for documentation in our paper is what those first 2 plots lacked.

andorsk commented 1 year ago

@animikhroy I just made some parameter updates. And no, that's not how the axes work in an embedding. An embedding is different from a similarity score.
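
A small sketch of that difference, using made-up coordinates: an embedding assigns each event a position, while a similarity score is defined for each pair of events. The Euclidean-distance similarity below is only an illustrative stand-in and is not the paper's R-K Distance.

```python
import numpy as np

# Hypothetical 2-D embedding coordinates for four events (not real results).
coords = np.array([[-8.0, 1.5], [-7.0, 2.0], [3.0, -4.0], [9.0, 6.0]])

# An embedding has one (x, y) position per event: shape (n_events, 2).
# The axis values are just positions, not scores.

# A similarity score belongs to a pair of events: shape (n_events, n_events).
diffs = coords[:, None, :] - coords[None, :, :]
distances = np.linalg.norm(diffs, axis=-1)

# Illustrative normalization to [0, 1]: 1 for identical positions,
# 0 for the farthest-apart pair.
similarity = 1.0 - distances / distances.max()
```

A 0 to 1 range is natural for the pairwise matrix, whereas rescaling the axes to 0 to 1 would only change how the positions are displayed.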

animikhroy commented 1 year ago

@andorsk I understand the difference in principle, but having the 2 axes normalized to a range of 0-1 on both x & y is the best I could think of. Do you have any other suggestion for t-SNE axis labels? GPS times, frequency or SNR would make no sense here.

animikhroy commented 1 year ago

@andorsk OK, here is my alternate suggestion: to label the axes, I recommend writing something like "t-SNE dimension 1" and "t-SNE dimension 2" for x & y respectively.

andorsk commented 1 year ago

Yea. That would be fine.

animikhroy commented 1 year ago

Yea. That would be fine.

@andorsk and for the range of the axes, a better suggestion is to follow this wiki diagram, which goes from -10 through 0 to +10 on both x & y. I think that would be better than normalizing between 0 and 1, since these are pair-wise embeddings, so a flat 0-1 would make less sense!

Here is the Link FYI: https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding#/media/File:T-SNE_visualisation_of_word_embeddings_generated_using_19th_century_literature.png
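
For reference, the labels and symmetric range suggested above could be applied in matplotlib roughly like this. It is a sketch with made-up coordinates; `label_tsne_axes` is a hypothetical helper, and it widens the range beyond ±10 if any point would otherwise be clipped.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def label_tsne_axes(ax, coords, min_limit=10.0):
    """Label both axes and apply one symmetric range that contains every point."""
    largest = max(abs(value) for point in coords for value in point)
    limit = max(min_limit, math.ceil(largest))
    ax.set_xlabel("t-SNE dimension 1")
    ax.set_ylabel("t-SNE dimension 2")
    ax.set_xlim(-limit, limit)
    ax.set_ylim(-limit, limit)
    return limit


# Hypothetical embedding coordinates, for illustration only.
coords = [(-8.0, 1.5), (-7.0, 2.0), (3.0, -4.0), (9.0, 6.0)]

fig, ax = plt.subplots()
ax.scatter([x for x, _ in coords], [y for _, y in coords])
limit = label_tsne_axes(ax, coords)
```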

andorsk commented 1 year ago

@animikhroy please confirm. Then I'd like to close this ticket as complete.

image

andorsk commented 1 year ago

The lines should be done in post-processing, unless you want to give me the exact slope and origin you used.
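
If a slope and a point on each line were available, the programmatic version could look something like the sketch below. It assumes matplotlib 3.3 or newer for `Axes.axline`; the slopes and anchor points are invented for illustration, and both helper names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def add_separation_line(ax, point, slope, **style):
    """Draw an infinite straight line through `point` with the given slope."""
    return ax.axline(point, slope=slope, **style)


def side_of_line(xy, point, slope):
    """Return +1 if `xy` lies above the line, -1 if below, 0 if on it."""
    x, y = xy
    x0, y0 = point
    residual = y - (y0 + slope * (x - x0))
    return (residual > 0) - (residual < 0)


fig, ax = plt.subplots()
ax.set_xlim(-10, 10)
ax.set_ylim(-10, 10)

# Hypothetical (point, slope) pairs for the 2 lines of separation.
separators = [((-3.0, 0.0), 2.0), ((3.0, 0.0), 2.0)]
for point, slope in separators:
    add_separation_line(ax, point, slope, color="k", linestyle="--")
```

`side_of_line` is only a convenience for checking which side of a separator a given event lands on.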

animikhroy commented 1 year ago

@andorsk yes this looks great apart from the lines of separation. Wait let me check regarding the slope and get back asap!

andorsk commented 1 year ago

image

I made a pass through trial and error.

andorsk commented 1 year ago

@animikhroy

andorsk commented 1 year ago

Let me know if I can close this ticket now

andorsk commented 1 year ago

image

andorsk commented 1 year ago

Label fix above.

animikhroy commented 1 year ago

@andorsk okay cool, this is perfect. I was just about to send you the coordinates, angle of inclination and slope, but you did everything perfectly! Before we close this issue, I just wanted to confirm whether you were treating the PCA SVM diagram as part of this issue, or was that for the ROC AUC issue?

https://docs.google.com/document/d/13DjS3Y44cE9hJq6H403zvSlkI1o9uMNBhBhK9jaRGQ8/edit

Just refer to point 3) on this google doc link and confirm once! Everything else is perfect for this issue!

andorsk commented 1 year ago

This issue specifically addressed the request for 3) R-K diagram based classification diagram of compact binaries with t-sne or any other technique as shown below. Addressing ROC/AUC is a different issue and is designated as such.

animikhroy commented 1 year ago

@andorsk thanks for the confirmation