OpenKnowledgeMaps / open-discovery

Open roadmap for developing a visual interface to the world's scientific knowledge

Call for Collaborators #1

Closed: pkraker closed this issue 8 years ago

pkraker commented 8 years ago

I am currently preparing a proposal for the Open Science Prize in the field of open discovery, and I am looking for motivated collaborators who want to join the project and change the way we do discovery. You can find a summary of the idea here. A first draft is also available.

I am looking for backend and frontend web developers who code in JavaScript, and/or PHP and R. We will be extending an existing tool for creating web-based knowledge domain visualizations that uses D3.js on the frontend and R content mining packages on the backend, in particular packages from rOpenSci and the tm package, so you should have experience with at least one of these libraries. A background in biomedicine would be nice, but it's not mandatory.

Everything about this project will be open: we will prepare the proposal in the open, the development will take place on a public GitHub repository, and all project outputs will be published under an open license.

So if you want to join the project and create an awesome open science tool together with me, please send an e-mail to opendiscovery@gmx.at outlining which part of the project interests you most, what you would be able to contribute, and how many hours you could devote to the project over the coming months. Please also include a link to your GitHub repository. It would be great if you could let me know whether you are a citizen of, or permanent resident in, the United States (US), as we will need at least one team member who satisfies this criterion. I am looking forward to your messages!

mattodd commented 8 years ago

Interesting proposal. I've always wanted a way to surf citation maps: start with one paper (DOI) and have a network presented on-screen that shows all the papers that cite your first paper (and so on and so on), so that you can quickly and interactively check all the relevant literature. Would your system have links between papers that cite each other, or be able to generate such links graphically? I understand clustering into themes, but what about citation links?
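"Citation surfing" as described here amounts to a breadth-first walk over the citation graph starting from a seed DOI. The following is a minimal Python sketch, assuming some lookup of which papers cite a given DOI is available; the `cited_by` table, the DOIs, and the `citation_network` helper are hypothetical placeholders, not part of any existing tool or API.

```python
from collections import deque

# Hypothetical toy data: each DOI maps to the DOIs of papers that cite it.
cited_by = {
    "10.1000/a": ["10.1000/b", "10.1000/c"],
    "10.1000/b": ["10.1000/d"],
    "10.1000/c": ["10.1000/d", "10.1000/e"],
    "10.1000/d": [],
    "10.1000/e": [],
}


def citation_network(seed, cited_by, max_depth=2):
    """Breadth-first walk over 'cited by' links starting at one DOI.

    Returns the set of (cited, citing) edges found within max_depth hops.
    """
    edges = set()
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        doi, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand papers at the depth limit
        for citing in cited_by.get(doi, []):
            edges.add((doi, citing))
            if citing not in seen:  # visit each paper only once
                seen.add(citing)
                queue.append((citing, depth + 1))
    return edges


print(sorted(citation_network("10.1000/a", cited_by)))
```

Capping the depth keeps the network manageable, since each hop can multiply the number of papers on screen; the edge set could then be handed to a graph layout for display.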

pkraker commented 8 years ago

Matt, thanks for your feedback and your suggestion! I like the idea of "citation surfing". We are indeed planning to show links between items as an overlay; we'd even like to be able to show different types of links, including citations, but also things like shared facts, e.g. papers that relate to a common entity (such as a species). In this way, we hope to give multidimensional views on the same set of items. What do you think about that: would that be useful in your discovery process?

mattodd commented 8 years ago

Shared facts are obviously complex and potentially powerful. We've got a proposal going in on this thing called SCINDR that will identify shared chemical and biological facts, but initially using things like molecular strings or protein codes, where one can perhaps more easily quantify similarity. What you're talking about is, I think, based on natural language, which is harder. But the citation surfing thing, where you're using concrete yes/no citation links, is still useful. If I'm looking at a paper, I want to be able to navigate the "Six Degrees of Kevin Bacon" for papers and find all the relevant literature in an intuitive map, where I can tag papers as not relevant, or make notes on important papers and have those objects appear coloured or larger. It'd cut the time needed to write a review in half, and one would end up with a map that nicely complements the review. So a smaller problem, but one with huge value if solved.

mekarpeles commented 8 years ago

@mattodd -- yeah, this is all spot on, 100%.

pkraker commented 8 years ago

@mattodd For facts, we will rely on standardized vocabularies and existing APIs; we don't want to duplicate the work currently being done by projects such as OpenAIRE, ContentMine, OpenMinted etc. A preliminary list of APIs that we want to connect to can be found in the second draft of the proposal, at the end of the section on Data Sources (https://github.com/pkraker/open-discovery/blob/master/proposal.md#data-sources).

In my view, writing a review is a great use case for BLAZE. Imagine you want to write a review of educational technology (a bit broad, but let's stick to this example) and you'd start off with something like this: http://openknowledgemaps.org/. Not only does it give you an overview of the field, showing you the various sub-areas and relevant resources; it also points you to reviews that have already been written (see the area "Meta Analysis"). Now imagine that the map includes the extensions that we propose: not only will you be able to see various relationships between papers, make notes, exclude papers, and increase or decrease their sizes, you will also be able to extend the map with papers that you may already have stored, e.g. in your Zotero library. You will also be able to include maps of sub-topics that you have searched for. Other people can see your map and add resources that you might have overlooked. And once the review is published, the map will make interesting supplementary material, letting your readers interactively explore the topic at hand.

BTW, the maps in BLAZE can easily be created from citation information. After all, all you need to create a map is a set of resources and a measure of similarity between them. I'd tend to use either bibliographic coupling or co-citation, as they outperform direct citations when mapping the research front (http://eprints.hums.ac.ir/3155/1/Co-Citation%20Analysis%20%20Bibliographic%20Coupling%20and%20Direct%20citation%20which%20citation%20approach%20represents%20the%20research%20front%20most%20accurately.pdf). The main issue with citations is that, at least to my knowledge, there is not much open citation data out there. Citations have traditionally been extracted by proprietary data providers (Scopus, Web of Science), and they wouldn't allow you to reuse their data in a tool like BLAZE. Another major issue with citations: they are mostly blind to resources other than papers...
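To make the two similarity measures concrete, here is a minimal Python sketch, assuming citation data is available as a mapping from each paper to the set of works it cites. The paper IDs are hypothetical toy data. Bibliographic coupling scores a pair of papers by how many references they share; co-citation scores a pair of works by how many papers cite both.

```python
from itertools import combinations

# Hypothetical toy data: each paper ID maps to the set of IDs it cites.
citations = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z"},
    "D": {"A", "B"},
    "E": {"A", "B", "C"},
}


def bibliographic_coupling(citations):
    """Count shared references for every pair of citing papers."""
    scores = {}
    for p, q in combinations(sorted(citations), 2):
        shared = len(citations[p] & citations[q])
        if shared:
            scores[(p, q)] = shared
    return scores


def co_citation(citations):
    """Count how many papers cite both members of each pair of works."""
    scores = {}
    for refs in citations.values():
        for p, q in combinations(sorted(refs), 2):
            scores[(p, q)] = scores.get((p, q), 0) + 1
    return scores


print(bibliographic_coupling(citations))
print(co_citation(citations))
```

Raw counts are used here for simplicity; a real mapping pipeline would likely normalize them before clustering and laying out the map.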

mattodd commented 8 years ago

That's interesting, thanks.


MATTHEW TODD | Associate Professor School of Chemistry | Faculty of Science

THE UNIVERSITY OF SYDNEY Rm 519, F11 | The University of Sydney | NSW | 2006 T +61 2 9351 2180 | F +61 2 9351 3329 | M +61 415 274104 E matthew.todd@sydney.edu.au | W http://sydney.edu.au/science/people/matthew.todd.php W http://opensourcemalaria.org/ | W http://opensourcetb.org/ | W http://opensourcepharma.net/


pkraker commented 8 years ago

The call for collaborators is closed. We can take on no more team members.