BeagleLab / voyage

Planning for the Beagle Project
Ontology of departments and advisor/student links #42

Closed RichardLitt closed 9 years ago

RichardLitt commented 9 years ago

Idea: We really need a scholarly graph or ontology of people. Can we scrape university websites to get this?

An example would be Erdős numbers, which measure how many co-authorship links separate a person from Erdős.

This is an open problem in science, but it would be incredibly useful if we could capture it. One way of doing this would be to mine dissertations and find advisors.
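To make the Erdős-number idea concrete, here is a minimal sketch of how such distances could be computed once a co-authorship graph exists. It uses a plain breadth-first search. The function names (`build_coauthor_graph`, `collaboration_distances`) and the toy paper list are hypothetical, made up for illustration; nothing here reflects an existing Beagle data source, and real data would first need name disambiguation.

```python
from collections import deque


def build_coauthor_graph(papers):
    """Map each author to the set of people they have co-authored with.

    `papers` is an iterable of author lists (hypothetical input format).
    """
    graph = {}
    for authors in papers:
        for author in authors:
            others = {name for name in authors if name != author}
            graph.setdefault(author, set()).update(others)
    return graph


def collaboration_distances(graph, source):
    """Breadth-first search: co-authorship hops from `source` to each reachable author."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for coauthor in graph.get(person, ()):
            if coauthor not in distances:
                distances[coauthor] = distances[person] + 1
                queue.append(coauthor)
    return distances


# Toy data, invented for this example only.
papers = [
    ["Erdős", "A"],
    ["A", "B"],
    ["B", "C"],
    ["Erdős", "C"],
    ["D"],  # solo paper: D has no path to Erdős
]
graph = build_coauthor_graph(papers)
distances = collaboration_distances(graph, "Erdős")
```

Authors with no path to the source (like `D` above) are simply absent from the result. The same search would work on an advisor/student graph if the edges came from mined dissertations instead of publications.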

jameswweis commented 9 years ago

Scraping lab websites could be tricky -- formats differ widely. Mining dissertations is one approach -- another is inferring the relationships from journal citation data. This may lead to more of an 'interaction community,' but that may be more useful to us anyway.

RichardLitt commented 9 years ago

Yeah, scraping websites isn't necessarily feasible at scale. Inferring relationships would work better if we could get a training set, though, which we may be able to obtain through mining.

adammarblestone-zz commented 9 years ago

There are a few relevant projects among our "class" of MetaKnowledge grants:

http://www.knowledgelab.org/news/detail/1.4_million_in_grants_awarded_to_metaknowledge_projects

Stephen David: "Neurotree: Graphing the Evolution of Science Through Mentorship Networks". Oregon Health and Science University

Neuroscience professor Stephen David runs Neurotree, an open-access website that has tracked mentor relationships for over 40,000 neuroscientists over the last eight years. David plans to use his grant to develop tools to curate Neurotree’s database and link it to publication databases. These links will help “explore how mentorship influences the emergence and evolution of ideas, and if this information can help trainees choose mentors,” writes David. The grant will also support the development of the growing Academic Family Tree, which does work similar to Neurotree for other disciplines like music composition and theology.

Because of the magnitude and difficulty of problems like name disambiguation in this area, my feeling within Beagle would be that we should catalyze others to solve this problem, while we focus at least initially on the core user interface issues for scientific annotation and sharing... for example, at the MetaKnowledge workshop I met someone, in addition to Stephen David, who was working on this scrape-the-web-for-scientists'-identities-and-relationships problem...

But I agree that at the "platform" level of scientific sharing tools, something like this would be great to have down the line.

jameswweis commented 9 years ago

Interesting. Also potentially useful is the Mathematics Genealogy Project: http://genealogy.math.ndsu.nodak.edu/

RichardLitt commented 9 years ago

I agree about helping to catalyze. Alright! We should contact Stephen David to see if Neurotree has an API. Same for the Mathematics Genealogy Project.

Closing this for now.