Closed: andorsk closed this issue 1 year ago
Some issues which I am working through; 2 samples below:
Basic approach: take the data, embed it with t-SNE, and plot it.
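A minimal sketch of that basic approach, assuming scikit-learn's TSNE and matplotlib. The feature matrix here is random stand-in data, not the notebook's real input, and the perplexity value and output filename are illustrative choices only:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Hypothetical stand-in data: 12 events x 4 features
# (for example mass, spin, mass ratio, redshift).
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 4))

# Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
embedding = tsne.fit_transform(features)  # one 2-D point per event

fig, ax = plt.subplots()
ax.scatter(embedding[:, 0], embedding[:, 1])
fig.savefig("tsne_embedding.png")  # hypothetical output filename
plt.close(fig)
```

With only a handful of events the layout is sensitive to perplexity and the random seed, so fixing `random_state` keeps reruns comparable.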
Original:
I have made a doc that collects all the results, images and brief descriptions in one place, with comments wherever applicable, for you to consult when you address this issue today. Here is the link to that results doc for your reference:
IMPORTANT TO NOTE
In the initial test diagram, high-mass black holes predominate. The purpose of this plot, however, is to show distance separation between different compact binary mergers (black holes, neutron stars, PBHs, etc.), as in the sample diagram I generated. Showing that distinction between classes via distance separation is one of the biggest validations of our novel approach, and it is what the scientific community is looking to us to validate.
https://docs.google.com/document/d/13DjS3Y44cE9hJq6H403zvSlkI1o9uMNBhBhK9jaRGQ8/edit?usp=sharing
This Google Doc focuses solely on the updated results and descriptions for ligo_secondary_analysis.ipynb. I am listing all the issues in sequence and requesting brief descriptions of the techniques and methodologies involved, with relevant references to cite (if any) based on your work, so that I can elaborate on those points in detail and quickly wrap up the final draft of the paper for peer review.
@animikhroy this comment doesn't really provide feedback here. Could you please provide more direct feedback based on the samples above?
There is an embedding section (called Embedding) in https://github.com/animikhroy/rk_toolkit_pipeline_diagrams/blob/main/02_notebooks/rk_gw_mma/ligo_secondary_analysis.ipynb which discusses the method and provides the code in more detail.
As of now, the feedback sounds like it's fine as is.
Added event labels to the Diagram:
@andorsk there is no separation of categorical events in the initial diagrams that you provided; all of them are just black holes of different masses.
What we need is to show different types of compact binaries separated by distance based on their class, i.e. there need to be NS-NS, NS-BH and PBH-BH R-K diagrams in your plot, as I had shown in my sample. All your events are just BH-BH events, which is not the point of this exercise. Furthermore, I do like the event label and classification label on the plots, but have you noticed that all of them say the exact same thing, i.e. Black Hole (High Mass) and GW170809? That is definitely a problem that needs to be addressed.
Feedback noted. Reviewing.
Better spatial separation: 2 from each class:
PBH far left. NS + BH next to them. NS next. Then BH after.
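For reference, picking 2 events from each class before embedding could look roughly like this. It is a sketch only: the event names are placeholders, and taking the first two per class is an assumed selection rule, not necessarily the one used for the plot above:

```python
from collections import defaultdict

def pick_per_class(events, per_class=2):
    """Return the first `per_class` events of each class, keeping input order."""
    counts = defaultdict(int)
    picked = []
    for name, label in events:
        if counts[label] < per_class:
            picked.append((name, label))
            counts[label] += 1
    return picked

# Placeholder event names; the class labels follow the thread (PBH, NS + BH, NS, BH).
events = [
    ("event_01", "BH"), ("event_02", "BH"), ("event_03", "BH"),
    ("event_04", "NS"), ("event_05", "NS"),
    ("event_06", "NS + BH"), ("event_07", "NS + BH"), ("event_08", "NS + BH"),
    ("event_09", "PBH"), ("event_10", "PBH"),
]

subset = pick_per_class(events, per_class=2)
```

Balancing the classes this way keeps a single dominant class (here BH) from crowding the embedding.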
Another version:
Note the centers are color coded by class
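A rough sketch of color coding the centers by class with matplotlib. The palette and the coordinates are made up for illustration; only the class names and their left-to-right order come from this thread:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical palette, one color per class.
CLASS_COLORS = {
    "PBH": "tab:purple",
    "NS + BH": "tab:green",
    "NS": "tab:orange",
    "BH": "tab:blue",
}

# Made-up embedded centers as (x, y, class), two per class.
centers = [
    (-9.0, 1.0, "PBH"), (-8.0, -1.5, "PBH"),
    (-3.0, 2.0, "NS + BH"), (-2.5, -0.5, "NS + BH"),
    (2.0, 1.0, "NS"), (3.0, -1.0, "NS"),
    (8.0, 0.5, "BH"), (9.0, -2.0, "BH"),
]

fig, ax = plt.subplots()
for label, color in CLASS_COLORS.items():
    xs = [x for x, _, cls in centers if cls == label]
    ys = [y for _, y, cls in centers if cls == label]
    ax.scatter(xs, ys, color=color, label=label)  # one scatter call per class
ax.legend(title="Class")
```

Plotting one scatter call per class gives each class its own legend entry without extra bookkeeping.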
@animikhroy please review the above and lmk if any of these are sufficient.
More:
@andorsk This is truly impressive! I think the last iteration is almost perfect. However, I would like to use the previous steps as well to show how the study evolved. Could you just do 2 things?
1) Just give me notes on what you did in the previous steps as well which were different from the final step. The final step is dope but documenting the steps is important for the peer review paper.
2) Could you please add x and y axis labels to the plot and create 2 lines of separation on top of the final plot as shown in the diagram below? I have used your final diagram for clarity:
@andorsk the classification lines were incorrect in my previous diagram, so I have updated them now. It now shows distinct categorical classifications as planned, so I want the final diagram to look like that one, along with axis labels. The axis labels should represent similarity scores normalized between 0 and 1 for both axes, right? That would also be correct according to our definition of R-K Distance and the following sections.
- @animikhroy could you clarify what you mean by "previous steps"?
For this I just mean the 2 previous steps you attempted. They generated 2 plots that were successively closer to the final target plot you generated. So what these plots lacked in the first 2 steps is what I need for documentation in our paper.
@animikhroy I just made some parameter updates. And no @animikhroy, that's not how the axes work in an embedding. An embedding is different from a similarity score.
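To illustrate why the axes of an embedding are not scores: the t-SNE objective depends only on the pairwise distances between the embedded points, so a rotated or shifted copy of the same layout is an equally valid result. A small NumPy sketch with made-up coordinates (not the real embedding):

```python
import numpy as np

rng = np.random.default_rng(0)
embedding = rng.normal(size=(6, 2))  # made-up 2-D embedding coordinates

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Rotate by 40 degrees and shift: every coordinate value changes ...
theta = np.deg2rad(40.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
rotated = embedding @ rotation.T + np.array([3.0, -1.0])

# ... but the distances between points, which carry the layout, do not.
same_layout = np.allclose(pairwise_distances(embedding), pairwise_distances(rotated))
same_coords = np.allclose(embedding, rotated)
```

Here `same_layout` is true while `same_coords` is false, which is why an individual x or y value cannot be read as a similarity between 0 and 1.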
@andorsk I understand the difference in principle, but having the 2 axis labels normalized with a range of values from 0 to 1 on both x & y is the best I could think of. Do you have any other suggestion for t-SNE axis labels? GPS times, frequency or SNR would make no sense here.
@andorsk Ok, here is my alternate suggestion: to label the axes, I recommend writing something like "t-SNE dimension 1" and "t-SNE dimension 2" for x & y respectively.
Yea. That would be fine.
@andorsk and for the range of the axes, the better suggestion is to follow this wiki diagram, which goes from -10 through 0 to +10 on both x & y. I think that would be better than normalizing between 0 and 1, since these are pair-wise embeddings and a flat 0-1 range would make less sense!
Here is the Link FYI: https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding#/media/File:T-SNE_visualisation_of_word_embeddings_generated_using_19th_century_literature.png
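A sketch of the axis labelling and symmetric range discussed above, using matplotlib. The coordinates are random placeholders, and rounding the limit up to a multiple of 5 is an assumption; the range is derived from the data rather than hard-coded to -10..+10:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
embedding = rng.uniform(-8, 8, size=(8, 2))  # placeholder embedding coordinates

def symmetric_limit(points, step=5.0):
    """Smallest multiple of `step` that covers every coordinate magnitude."""
    return float(np.ceil(np.abs(points).max() / step) * step)

limit = symmetric_limit(embedding)

fig, ax = plt.subplots()
ax.scatter(embedding[:, 0], embedding[:, 1])
ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
ax.set_xlim(-limit, limit)  # same symmetric range on both axes
ax.set_ylim(-limit, limit)
```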
@animikhroy please confirm. Then I'd like to close this ticket as complete
The lines should be done in post-processing, unless you want to give me the exact slope and origin you used.
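If a slope and a point on each line are supplied, drawing the separation lines on top of an existing plot could look roughly like the sketch below. The slopes, intercepts and points are invented for illustration and are not the values used in the final diagram:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

def side_of_line(points, slope, intercept):
    """+1 above the line y = slope * x + intercept, -1 below it, 0 on it."""
    return np.sign(points[:, 1] - (slope * points[:, 0] + intercept))

# Invented (slope, intercept) pairs for the two lines of separation.
lines = [(1.5, -4.0), (1.5, 4.0)]

# Invented embedded points.
points = np.array([[-6.0, 2.0], [0.0, 0.0], [6.0, -2.0]])

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1])
for slope, intercept in lines:
    # axline draws an unbounded line through a point with the given slope.
    ax.axline((0.0, intercept), slope=slope, linestyle="--", color="gray")

# Points above the lower line and below the upper line sit in the middle band.
in_middle = (side_of_line(points, *lines[0]) > 0) & (side_of_line(points, *lines[1]) < 0)
```

A helper like `side_of_line` makes it easy to check that every event lands on the intended side before settling on a slope by eye.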
@andorsk yes this looks great apart from the lines of separation. Wait let me check regarding the slope and get back asap!
I made a pass through trial and error.
@animikhroy
Let me know if I can close this ticket now
Label fix above.
@andorsk okay cool, this is perfect. I was just about to send you the coordinates, angle of inclination and slope, but you did everything perfectly! Before we close this issue, I just wanted to confirm whether you were treating the PCA SVM diagram as part of this issue, or whether that was for the ROC AUC issue?
https://docs.google.com/document/d/13DjS3Y44cE9hJq6H403zvSlkI1o9uMNBhBhK9jaRGQ8/edit
Just refer to point 3) in this Google Doc link and confirm once! Everything else is perfect for this issue!
This issue specifically addressed request 3): an R-K diagram based classification diagram of compact binaries with t-SNE or any other technique, as shown below. Addressing ROC/AUC is a different issue and is designated as such.
@andorsk thanks for the confirmation
I believe you had attempted this initially as well, but we discussed removing the redundant node clusters apart from mass, spin, q-ratio and redshift, and redoing it with better spatial separation and distinction, as shown in the diagram below. Everyone really appreciated this idea at Glasgow and gave their positive feedback and support, so this will be the biggest validation of our application from the LIGO and compact binary POV for the physics community w.r.t. the last section of our paper.