The approved sessions are now listed at https://www.scidatacon.org/conference/IDW2018/approved_sessions/ .
Submission deadline has been changed to May 31.
Session I am involved in:
Other sessions to consider:
Submitted to the session "Democratising Data Publishing: A Global Perspective"
Title: Wikidata and Wikibase as global platforms for democratizing data publishing
Abstract: Wikidata (https://www.wikidata.org) is a multilingual collaborative platform that democratizes data publishing much as Wikipedia democratized the publishing of encyclopedic information. It is tightly integrated with all language versions of Wikipedia and its sister sites, and it collects, reuses and provides structured public domain data across all areas of knowledge from all around the world. Wikidata meets the requirements of the FAIR principles to make data findable, accessible, interoperable and reusable, and it allows people who do not share a common language to collaborate. With about 18,000 volunteer contributors each month who collaborate openly, Wikidata blends open science and citizen science approaches.
The human contributors are aided by hundreds of automated or semi-automated tools that perform repetitive tasks at scale, based on community-agreed standards. Together, they have aggregated over 5 billion RDF triples on the platform, which can be queried via a dedicated SPARQL endpoint and other means; this aids quality control of the database content and workflows, and facilitates knowledge discovery within the corpus.
Thanks to a combination of extensive examples, help pages, tutorials, user interface design and other mechanisms, this endpoint is approachable for users at all levels of proficiency with the SPARQL query language. In this way, Wikidata also democratizes access to and participation in the Semantic Web.
The software underlying Wikidata is Wikibase (http://wikiba.se/). It is open source and openly licensed, which allows anyone to run semantic databases that are interoperable with Wikidata and other Wikibase instances. By default, Wikibase instances come with a SPARQL endpoint of their own that is modelled after the Wikidata one.
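As an aside for this issue (not part of the submitted abstract): here is a minimal sketch of how the SPARQL endpoint mentioned above can be queried programmatically, using Python and the `requests` library. The item Q3919 ('Gaborone') is one of the identifiers mentioned in the WikiCite abstract further down; the User-Agent string is a placeholder.

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# Labels of Q3919 ('Gaborone') in a handful of languages, illustrating the
# multilingual nature of the data. LIMIT keeps the result set small.
QUERY = """
SELECT ?lang ?label WHERE {
  wd:Q3919 rdfs:label ?label .
  BIND(LANG(?label) AS ?lang)
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "sparql-demo/0.1 (placeholder contact)"},
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["lang"]["value"], row["label"]["value"])
```

The same query string can also be pasted directly into the web interface of the Wikidata Query Service at https://query.wikidata.org/ .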
I also just submitted a variant of this to the session on "Citizen Science Data – from Collection to Curation to Management".
A wiki approach to collecting, curating and managing citizen science data
Wikidata (https://www.wikidata.org) is a multilingual collaborative platform that democratizes data curation much as Wikipedia democratized the curation of encyclopedic information. It is tightly integrated with all language versions of Wikipedia and its sister sites, and it collects, reuses and provides structured public domain data across all areas of knowledge from all around the world. Wikidata meets the requirements of the FAIR principles to make data findable, accessible, interoperable and reusable, and it allows people who do not share a common language to collaborate. With about 18,000 volunteer contributors each month who collaborate openly, Wikidata blends open science and citizen science approaches.
The human contributors are aided by hundreds of automated or semi-automated tools that perform repetitive tasks at scale, based on community-agreed standards. Together, they have aggregated over 5 billion RDF triples on the platform, which can be queried via a dedicated SPARQL endpoint and other means; this aids quality control of the database content and workflows, and facilitates knowledge discovery within the corpus.
Thanks to a combination of extensive examples, help pages, tutorials, user interface design and other mechanisms, this endpoint is approachable for users at all levels of proficiency with the SPARQL query language. In this way, Wikidata also democratizes access to and participation in the Semantic Web.
The software underlying Wikidata is Wikibase (http://wikiba.se/). It is open source and openly licensed, which allows anyone to run semantic databases that are interoperable with Wikidata and other Wikibase instances. By default, Wikibase instances come with a SPARQL endpoint of their own that is modelled after the Wikidata one.
Besides Wikidata and Wikibase, there are multiple layers of citizen science activities taking place in other Wikimedia projects, e.g. the identification of species, historic personalities or buildings, as well as the transcription of documents or the location of historic maps.
Next variant, for the "Delivering a Global Open Science Commons" session:
A wiki perspective on an Open Science Commons
Wikidata (https://www.wikidata.org) is a multilingual platform built around the principles of open infrastructure, open standards, open collaboration and verifiability. It democratizes data curation much as Wikipedia democratized the curation of encyclopedic information. It is tightly integrated with all language versions of Wikipedia and its sister sites, and it collects, reuses and provides structured public domain data across all areas of knowledge from all around the world. Wikidata meets the requirements of the FAIR principles to make data findable, accessible, interoperable and reusable, and it allows people who do not share a common language to collaborate. With about 18,000 volunteer contributors each month who collaborate in an entirely open fashion, Wikidata blends open science and citizen science approaches.
The human contributors are aided by hundreds of automated or semi-automated tools that perform repetitive tasks at scale, based on community-agreed standards. Together, they have aggregated over 5 billion RDF triples on the platform, which can be queried via a dedicated SPARQL endpoint and other means; this aids quality control of the database content and workflows, and facilitates knowledge discovery within the corpus.
Thanks to a combination of extensive examples, help pages, tutorials, user interface design and other mechanisms, this endpoint is approachable for users at all levels of proficiency with the SPARQL query language. In this way, Wikidata also democratizes access to and participation in the Semantic Web.
The software underlying Wikidata is Wikibase (http://wikiba.se/). It is open source and openly licensed, which allows anyone to run semantic databases that are interoperable with Wikidata and other Wikibase instances. By default, Wikibase instances come with a SPARQL endpoint of their own that is modelled after the Wikidata one.
As with all Wikimedia platforms, the development and operation of Wikidata and Wikibase are driven by volunteers, and the infrastructural aspects are crowdfunded through donations from millions of individual users as part of the annual Wikipedia fundraising campaigns.
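Another aside (not part of the abstract): because every Wikibase instance ships a SPARQL endpoint modelled after the Wikidata one, client code can stay the same and only the endpoint URL changes. The sketch below points at a hypothetical instance; the URL and User-Agent string are placeholders and would need to be replaced with real values.

```python
import requests

def run_query(endpoint, query):
    """Send a SPARQL query to a Wikibase-style endpoint and return the result bindings."""
    response = requests.get(
        endpoint,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "wikibase-demo/0.1 (placeholder contact)"},
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

# Count all triples in the (presumably much smaller) local instance.
# The endpoint URL stands in for a hypothetical Wikibase installation.
count_query = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"
print(run_query("https://query.example.org/sparql", count_query))
```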
Next one: "Open Data from Cell Biology of Infectious Pathogens, how far are we?"
Title: Data sharing as a new component of addressing and preparing for disease outbreaks
Abstract: Public health emergencies require profound and swift action at scale with limited resources, often on the basis of incomplete information and under rapidly evolving circumstances. While emergency-triggered sharing goes back millennia, data sharing is a relatively new flavour of this broader theme, but one that has been receiving increasing attention over the last few years, especially in the context of public health emergencies like the Ebola or Zika outbreaks.
In response, researchers, research institutions, journals, funders and others have taken steps towards increasing the sharing of data around ongoing public health emergencies and in preparation for future ones. These measures range from the adoption of open lab notebooks to modifications of policies and funding lines, and they include conversations around infrastructure and cultural change.
In this contribution, I will provide an overview of different ways in which the sharing of data has played a role in public health emergencies, highlighting steps that have already been taken over the last decade as well as challenges still lying ahead.
While focusing on disease outbreaks, I will also draw on examples from other public health emergencies (e.g. earthquakes or tropical storms) and discuss their applicability in the context of infections. The examples will span the entire data life cycle of public health emergencies, from preventive measures and routine public health surveillance data to the tracking of pathogens, the investigation of pathogen transmission and other host-pathogen interactions, as well as diagnostics, vaccination, epidemiological modelling, data ethics and other related topics, concluding with considerations around the potential impact of preserving and sharing data, or of failing to do so.
Next: "Scientific Data Challenges for Sustainable Development"
To what extent can machine-actionable data management plans help automate disaster-related workflows?
As highlighted in the session description, disaster reduction and mitigation efforts are closely linked to the Sustainable Development Goals and have a broad range of data needs. These data needs include coping with many different kinds of data, with anywhere from zero to many resources that provide such data, and with long-term versus real-time data; they also encompass data quality issues, the ethics of whether and how to share, the management of data-related resources (or the lack thereof), varying degrees of uncertainty around any of these aspects, and the need to integrate across them or to zoom in on specific ones.
In this session, I would like to explore some avenues towards a higher degree of automation in addressing these data needs, focusing on how the concept of a Data Management Plan, or the more general notion of an output management plan put forward by the Wellcome Trust, can be used to inform policies, workflows and infrastructure around disaster reduction and response. In particular, I will highlight the potential of making such plans machine-actionable, versioned, FAIR and public. This would make it possible to aggregate, visualize and reslice the information contained in such plans and to use it to interact with disaster-related infrastructure, policies or actors, or as a basis for people, organizations or machines to learn from data gathered about ongoing or past disasters.
For any specific future disaster, many details are of course unknowable at present, but depending on the kind of disaster, certain characteristic data needs are predictable to some extent. Disease outbreaks, for instance, may require different responses based on whether the respective pathogens and their potential modes of transmission are known or not, what their zoonotic potential is, where their host species live, whether the affected animal or human populations have any degree of immunity, whether travel or migration are involved, and so forth. On the basis of initial answers to these questions, decisions can be made as to whether additional information is needed, how to gather it, how to process, aggregate and communicate what is known, how to derive recommendations (e.g. with respect to vaccination campaigns, travel recommendations, or what research to fund), and what the corresponding resource needs are in terms of humans, machines, infrastructure, finances and logistics, where, and on what time scales.
Similar questions can be asked for other disaster scenarios like earthquakes, wildfires, storms or oil spills, and information related to such questions already forms the backbone of institutionalized disaster response and prevention mechanisms in many contexts. What is often missing, though, is the interoperability - on both long and short time scales - of such existing mechanisms across actors, jurisdictions, disaster types, or research disciplines. Some pockets of basic interoperability exist in various places, e.g. emergency phone numbers are harmonized within most nations and across EU member states, tsunami warnings can be broadcast nationally or regionally within seconds of an earthquake, and high-speed trains can come to a stop in response. How can we achieve similar harmonization around disaster-related data, repositories, APIs, data models, simulations and related issues, how would that affect humans and machines, and how can we track relevant progress with respect to the Sustainable Development Goals?
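A note to self (not part of the submission): one hypothetical way to make "machine-actionable" concrete is to treat such output management plans as structured records that can be aggregated and resliced by disaster type, licensing or timeliness. The Python sketch below uses a made-up schema purely for illustration; it is not based on any existing DMP standard.

```python
from dataclasses import dataclass, field

@dataclass
class OutputManagementPlan:
    """A deliberately simplified, hypothetical record for one planned data output."""
    disaster_type: str                                  # e.g. "disease outbreak", "earthquake"
    data_sources: list = field(default_factory=list)    # repositories, sensor feeds, surveys, ...
    sharing_license: str = "CC0"                        # open by default; may be restricted for ethical reasons
    realtime: bool = False                              # real-time feed vs. long-term archive

plans = [
    OutputManagementPlan("disease outbreak", ["genomic sequences", "case counts"], realtime=True),
    OutputManagementPlan("earthquake", ["seismic sensor feeds"], realtime=True),
    OutputManagementPlan("disease outbreak", ["serosurvey data"], sharing_license="restricted"),
]

# "Reslicing" the plans: which outbreak-related outputs are openly licensed
# and available in real time?
for plan in plans:
    if plan.disaster_type == "disease outbreak" and plan.sharing_license == "CC0" and plan.realtime:
        print(plan.data_sources)
```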
I also put one in under "General submissions", which is a variant of https://github.com/Daniel-Mietchen/events/blob/master/ICEI2018-research-ecosystem.md :
WikiCite and Scholia - a Linked Open Data approach to exploring the scholarly literature and related resources
Abstract
Research takes place in a sociotechnical ecosystem that connects researchers, institutions, funders, databases, locations, publications, methodologies and related concepts with the objects of study and the world around them. Schemas for describing such concepts are growing in breadth and depth, number and popularity, as are mechanisms to persistently and uniquely identify the concepts, the schemas, their relationships or any of their components. In parallel, more and more data — and particularly metadata — are being made available under open licenses, which facilitates discoverability, reproducibility and reuse, as well as data integration.
Wikidata is a community-curated open knowledge base in which concepts covered in any Wikipedia — and beyond — can be described in a structured fashion that can be mapped to RDF and queried using SPARQL as well as various other means. Its community of close to 20,000 monthly contributors oversees a corpus that currently comprises nearly 50 million 'items', i.e. entries about concepts. These items cover a broader range of topics than Wikipedia and are annotated and linked via almost 5000 'properties' that describe relationships between items or between items and external entities, or that express specific values. The items and properties have persistent unique identifiers, to which labels and descriptions can be attached in about 300 natural languages. For instance, Q3919 represents the item for 'Gaborone' and Q6786626 'maternal health', while P274 stands for the property of 'chemical formula', and P225 for 'taxon name'. Besides places, health conditions, chemical compounds and taxa, Wikidata also contains information about researchers and many components of their research ecosystems, including a growing body of publications and databases, particularly in the life sciences, which can be used as references in Wikidata or beyond. The curation of this reference-centric part of Wikidata is overseen by the WikiCite initiative, which extends from scholarly publications to patents, court cases, cell lines and a range of other resources that are being cited.
A range of open-source tools is available to interact with Wikidata — to enter information, curate and query it. One of them is Scholia, a frontend to Wikidata's SPARQL endpoint. Available via https://tools.wmflabs.org/scholia/ , it can be used to explore research publications and how they relate to authors, institutions, funders and other parts of the research ecosystem, as well as to taxa, metabolic networks, or geolocations.
In this presentation, we will use Scholia as a starting point for exploring how information about scholarly research is represented in Wikidata and how it can be explored, curated and reused.
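As an aside for this issue (not part of the submitted abstract): a minimal sketch of the kind of topic query that Scholia builds on top of Wikidata's SPARQL endpoint, listing a few recent works about maternal health (Q6786626, as mentioned above). The properties P921 ('main subject') and P577 ('publication date') are not named in the abstract and are added here for illustration; the User-Agent string is a placeholder.

```python
import requests

# Works whose main subject (P921) is maternal health (Q6786626), newest first.
# P577 is the publication date; the label service supplies English titles.
QUERY = """
SELECT ?work ?workLabel ?date WHERE {
  ?work wdt:P921 wd:Q6786626 ;
        wdt:P577 ?date .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?date)
LIMIT 10
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "scholia-style-demo/0.1 (placeholder contact)"},
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["date"]["value"][:10], row["workLabel"]["value"])
```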
This follows https://github.com/Daniel-Mietchen/events/issues/321 .
The call is to open next week. More via https://www.scidatacon.org/IDW2018/ .
See https://github.com/Daniel-Mietchen/events/blob/master/SciDataCon-2018.md for my submissions so far.