Open Kully opened 10 months ago
Hi Adam,
Are you suggesting a way to find the shortest citation path between paperA and paperB?
Citations of papers are available, but currently one can only see direct citations (immediate neighbors on graph).
On Jan 29, 2024, at 3:16 PM, Adam Kulidjian wrote:
It would be super cool to be able to see a map of citations between the papers themselves?
I was not able to see support for this in citegraph and wanted to know if this is something that:
- is possible?
- could be created via the API?
Here is a quick mock up I put together in Figma that hopefully helps illustrate the intention:
image.png (view on web): https://github.com/Citegraph/citegraph/assets/10369095/254f2172-83e5-42e9-8d5c-f92347b0706c
Hi Boxuan,
Nice hearing back from you. :)
Not exactly. What I’m after is the ability to see more than just the direct citation from one paper to another.
The motivation is, if you are reading or have read a paper that you found valuable, to quickly be able to see the papers it cites. With a cohesive picture of all nearby connections, it would be impactful to visually see which papers act as more of a bedrock for other papers.
I hope this makes sense 🙏 Curious to hear back.
With a cohesive picture of all nearby connections, it would be impactful to visually see which papers act as more of a bedrock for other papers.
I am trying to formulate what you describe here, and please correct me if I am wrong. Given a paper A, it would be helpful to see all papers it cites (A1, A2, A3, ... An), and also the papers those papers cite (A11, A12, A13, ... A1i, A21, A22, A23, ... A2j, ..., An1, An2, ...) - so-called two-hop references. It could also be three-hop if required, but I presume two-hop would be enough. The visualization should also include links (citation relationships) among those papers. It should be visually straightforward to find the paper(s) that have the most ingoing links (citations) in this network.
Let's say paper A cites B, C, and D. B cites D, E, and F, while C cites D and G. D cites Z. We should draw a graph that shows A, B, C, D, E, F, G, Z, with D highlighted because D is the bedrock for these papers. Likely B and C are related works or prior works of A, while D is the foundational work in this field.
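To make the A/B/C/D example concrete, here is a minimal Python sketch of the two-hop idea. It is only an illustration under assumptions: the `CITES` dictionary hard-codes the example above, and the function names `references_within` and `in_subgraph_citations` are hypothetical, not part of citegraph or its API.

```python
from collections import deque

# Hypothetical data copied from the example above: paper -> papers it cites.
CITES = {
    "A": ["B", "C", "D"],
    "B": ["D", "E", "F"],
    "C": ["D", "G"],
    "D": ["Z"],
}


def references_within(cites, start, max_hops):
    """Breadth-first search: map each reachable paper to its hop distance from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if hops[paper] == max_hops:
            continue  # do not expand papers that are already at the hop limit
        for cited in cites.get(paper, []):
            if cited not in hops:
                hops[cited] = hops[paper] + 1
                queue.append(cited)
    return hops


def in_subgraph_citations(cites, papers):
    """Count, for each paper, how many papers inside the subgraph cite it."""
    counts = {p: 0 for p in papers}
    for citing in papers:
        for cited in cites.get(citing, []):
            if cited in counts:
                counts[cited] += 1
    return counts


hops = references_within(CITES, "A", max_hops=2)
counts = in_subgraph_citations(CITES, hops)
bedrock = max(counts, key=counts.get)
print(sorted(hops), bedrock)
```

With this data the two-hop subgraph contains all eight papers, and D comes out on top because A, B, and C all cite it.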
Hi @li-boxuan apologies for the late response. I've been busy with work and just getting a chance to read now. 😄
Yes, this is pretty much what I am imagining, esp. with your last paragraph's framing: it is important to have a sense of which papers are bedrock, so that we (the thinker, researcher, etc.) can give proportionally more confidence or significance to a paper with more ingoing links (citations) than to one with fewer.
I quickly designed a picture to illustrate your example:
@li-boxuan What do you think?
Misc Thoughts:
@Kully
Sounds good! I'll add this to my TODO list. Right now, I am re-processing all the data, enriching the data (adding paper descriptions, venue, tags, etc.) and cleaning up some cumbersome duplicate data points. After that, I'll work on this. Please expect a prototype by the end of this month or maybe next month.
The major challenge I see is the limited computation power. In the real network, a 3-hop query could reach tens of thousands of nodes and put great pressure on my server if we don't limit the fan-out factor. That being said, it might make sense to allow N-hop queries with a fan-out factor, assuming that dropping several random citations would not harm the ability to find the "bedrock".
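As a rough illustration of the fan-out idea, the sketch below caps how many citations are followed per paper by random sampling. It assumes the citation data is an in-memory dictionary; the names `sample_n_hop` and `toy_tree` and the numbers used are hypothetical and do not describe how the citegraph server runs its queries.

```python
import random
from collections import deque


def sample_n_hop(cites, start, max_hops, fan_out, seed=0):
    """N-hop traversal that follows at most `fan_out` random citations per paper."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    hops = {start: 0}
    edges = []
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if hops[paper] == max_hops:
            continue
        refs = cites.get(paper, [])
        if len(refs) > fan_out:
            refs = rng.sample(refs, fan_out)  # drop the remaining citations at random
        for cited in refs:
            edges.append((paper, cited))
            if cited not in hops:
                hops[cited] = hops[paper] + 1
                queue.append(cited)
    return hops, edges


def toy_tree(depth, width=50):
    """Toy data: a citation tree in which every paper cites `width` others."""
    cites, frontier = {}, ["root"]
    for _ in range(depth):
        next_frontier = []
        for paper in frontier:
            cites[paper] = [f"{paper}.{k}" for k in range(width)]
            next_frontier.extend(cites[paper])
        frontier = next_frontier
    return cites


cites = toy_tree(depth=3)
hops, edges = sample_n_hop(cites, "root", max_hops=3, fan_out=5)
print(len(hops))  # at most 1 + 5 + 25 + 125 = 156 papers
```

Without the cap, the same 3-hop query on this toy tree would touch 1 + 50 + 2,500 + 125,000 papers, which is the kind of blow-up described above.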
Computation power wouldn't be a bottleneck if the user runs the application on their own machine or even laptop. I could probably upload the cleaned dataset and package a Docker image for the application. It wouldn't be a one-click experience, though. Is that something you would be interested in? I imagine most users don't want to bother doing that, but I am just curious.
Finally, I want to say that I've been a bit demotivated recently due to lack of attention. Feedback like this is of great help to me. If you like this project, a star would also be very appreciated :)
I feel you on lack of motivation. Curious, interested minds absolutely help move us along in our side projects. (I'm making a video game with a friend, and it would have been a completely different world doing it solo.)
But going back to the project, a prototype of this would be wonderful. Do you have longer-term visions with respect to the app? Other goals?
Would it make sense for me to help you build/design the prototype?
Re dropping random citations, I was imagining a similar thing. Maybe loading in minified versions of the dataset when zoomed out would make sense, only loading in more nodes on the fly once you start honing in on some part of the network more closely.
I don't have specific longer term visions with respect to this app. I have some Azure credits to spend every month, so I started this project for fun (with the hope that it could help some people). That being said, I do want to enhance the UI experience in the long run, but that would depend on the feedback and popularity of the app. Worst case, I'll just leave the app as it is once I finish whatever I am doing right now.
I may be able to use some of your help once I deliver the first (ugly) version. Stay tuned :)
I may be able to use some of your help once I deliver the first (ugly) version. Stay tuned :)
Exciting! Staying tuned. 🙏 I would be curious to see what you come up with, and I'm happy to help if it makes sense.
I can also share some design portfolio work if interested. :)
Also, gave you a star. ⭐️ (didn't know why I had not already done so)
@li-boxuan
This is a node graph project I've been working on for a while with another collaborator. Feels relevant to share.
If you wanna play with it, you can go here: https://tok.shuttleapp.rs/
@Kully Do you have any example in mind? Say, a paper (that exists in the citegraph database), and a potential bedrock paper that you want the tool to be able to find?
I have developed a prototype but the visualization just looks too messy. Even if we only consider two-hop queries, there could be hundreds or even thousands of papers. I guess we have to include some criteria before we draw the nodes. But I am really not sure about 3-hop nodes - that could be a terrible workload unless we use some smart heuristics that can do the pruning efficiently. So, some examples would really be helpful here.
@Kully Do you have any example in mind? Say, a paper (that exists in the citegraph database), and a potential bedrock paper that you want the tool to be able to find?
Hmm, I don't have an example off the top of my head. I would need to go to citegraph and look for a relevant paper.
I have developed a prototype but the visualization just looks too messy. Even if we only consider two-hop queries, there could be hundreds or even thousands of papers. I guess we have to include some criteria before we draw the nodes. But I am really not sure about 3-hop nodes - that could be a terrible workload unless we use some smart heuristics that can do the pruning efficiently. So, some examples would really be helpful here.
Wow amazing, thank you for doing that so quickly. ⚡️
Hmm, are you able to share what it looks like in a branch? I'm wondering if it makes more sense to start from the design and work backwards from there? Would you be open to doing this?
@Kully
Here you go: https://www.citegraph.io/playground/citations
One example: https://www.citegraph.io/playground/citations?id=573695516e3b12023e47bfa1
Here the paper of interest (displayed in purple) is: How Good Are Query Optimizers, Really?
One-hop references are in blue, and two-hop references are in green. A node's size reflects its pagerank (note: pagerank is computed by citegraph. It's not an authoritative score, but it reflects the importance or popularity of a paper. See https://www.citegraph.io/faq for more). But maybe the node's size should reflect its popularity in this subgraph (how many papers cited it)? I am not sure which would be better.
@li-boxuan Finally getting a chance to respond to this.
Amazing work putting this together. 👏 Open-source work can be taken for granted, so I want to acknowledge what you did.
Regarding your deliberation over what the node's size should reflect ... I like the idea of having the node size reflect # of citations in this subgraph. This feels more aligned with the goal of "let's find which paper(s) are more bedrock".
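A small Python sketch of what "node size reflects # of citations in this subgraph" could look like. The function name `node_sizes`, the pixel range, and the linear scaling are all assumptions made for illustration; the playground may size nodes quite differently. The edge list reuses the A/B/C/D example from earlier in this thread.

```python
def node_sizes(edges, nodes, min_size=4.0, max_size=24.0):
    """Scale each node linearly by how many papers in the subgraph cite it."""
    counts = {n: 0 for n in nodes}
    for citing, cited in edges:
        if citing in counts and cited in counts:
            counts[cited] += 1
    top = max(counts.values(), default=0)
    if top == 0:
        # No citations inside the subgraph: draw every node at the minimum size.
        return {n: min_size for n in nodes}
    return {n: min_size + (max_size - min_size) * c / top for n, c in counts.items()}


# (citing, cited) pairs from the earlier example in this thread.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "D"), ("B", "E"), ("B", "F"),
    ("C", "D"), ("C", "G"),
    ("D", "Z"),
]
sizes = node_sizes(edges, nodes="ABCDEFGZ")
print(sizes["D"], sizes["A"])
```

Here D gets the maximum size because three papers in the subgraph cite it, while A (cited by none of them) stays at the minimum.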
I'm sure there are other visual variables that can be used to express some of these, but that would require more thinking and use examples, as you mentioned.
One thing that might be useful is to order the nodes like this: node of interest first, blue nodes lined up, then green nodes lined up.
What do you think?
(note, maybe the really big green node represents the most popular node, and maybe that's the only one that gets a size change)
@Kully
Your mockup looks really neat but unfortunately there's no such layout available for me to use. And if you think about it, the green nodes might also have edges among themselves, which would make the graph look messy.
Here are a few built-in layouts I could use. What do you think? I personally prefer ForceAtlas2 and I have deployed that layout: https://www.citegraph.io/playground/citations?id=573695516e3b12023e47bfa1
Force
No Overlap
Force-Atlas2
Circular
Random
The best thing about Force-Atlas2 is that it can show research communities in a vivid way (e.g. http://citegraph.io/playground/cluster?id=5448b9b7dabfae87b7e6d9ee)
@li-boxuan Sorry for the super late response. I got very busy with work.
Hmm, I do like the Force-Atlas2 one the best as you pointed out in the last slide. :) I'm curious what you mean by research communities as well.
Thinking about all of this again, I am realizing that it is probably unrealistic (and maybe not that helpful) to just "see all the N-hop (N=2) connections from one paper". As you are showing, it's pretty messy.
Thinking about this again, maybe a more clarified vision for what we want out of this is:
How does this sound to you? If you are open, maybe I can diagram some stuff for this? :)
I'm curious what you mean by research communities as well.
I used an algorithm to group people with collaborations. A research community is a group of people that have directly or indirectly collaborated before. What if a person collaborates with multiple groups of people? Well, the algorithm would decide which group (or so-called community) they best fit. Community detection is in general a research topic that has been popular for many years in the graph theory / data mining field.
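For intuition only, the Python sketch below covers just the "directly or indirectly collaborated" part, using connected components computed with union-find. This is not the algorithm used by citegraph (which is not named in this thread): a real community detection algorithm would go further and split a large connected group into smaller, best-fit communities. The function name and the people in the example are made up.

```python
def collaboration_groups(collaborations):
    """Group people who are linked by a chain of collaborations (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in collaborations:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for person in parent:
        groups.setdefault(find(person), set()).add(person)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical co-authorship pairs.
pairs = [("Ann", "Bob"), ("Bob", "Cy"), ("Dee", "Eve")]
groups = collaboration_groups(pairs)
print(groups)
```

Ann and Cy never worked together directly, but they end up in the same group through Bob, while Dee and Eve form a separate group.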
How does this sound to you? If you are open, maybe I can diagram some stuff for this
YES that would be very helpful! I do want to point out that I don't have direct control over the layout algorithm (node distances, how nodes are spread over the canvas), unless we write our own - that would be very challenging, not impossible but could be time-consuming.