BIDS-collaborative / cega-trace

Pilot study tracing influence of research studies on public policy
BSD 2-Clause "Simplified" License

Evaluation #14

Open GitOnion opened 8 years ago

GitOnion commented 8 years ago

Real-world challenge

The project traces the influence of research (publications) on public policy (government decision making) by searching the grey literature for digital fingerprints of conditional cash transfer (CCT) research and capturing the ‘path of influence’: research on impact evaluation of CCT -> World Bank policy report & WB development report -> policy-oriented research -> planning & budget -> guidelines & tools -> WB project evaluation reports. Under this view, WB project evaluation reports should in turn lead to new policy, which passes through another round of planning & budget and guidelines & tools and produces another WB project evaluation report, so the path forms a cycle.

The project can be applied to promote widespread adoption of high-quality evidence, and in particular the research generated by academics in the team's network. “Adoption” could come in the form of a new evidence-based government policy, the implementation of a proven program by government agencies, or the use of a rigorously tested business strategy by a company or startup. The objective is to show how and where the adoption of evidence occurs. This will give the team greater credibility with donors, decision-makers, and other stakeholders. Beyond the team itself, similar think tanks and policy research institutes could benefit from any data or techniques the team develops.

The team thinks that this project will help it identify those studies, researchers, and initiatives that are most relevant to the policy-maker community (which, in turn, can inform its allocation of resources).

Data / Materials

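The team is focusing on information from citations of the following two major types of resources: (1) academic papers related to conditional cash transfers (CCT), and (2) World Bank policy reports and World Bank development reports.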

The data collected include the author name, article title, publication year, country, etc. The team combined automatic and manual approaches to extract these data from documents and stored them in a Google spreadsheet.

For academic papers, the team manually downloaded the papers (mainly in PDF format) and recorded information such as author name, article title, publication year, target country/cash transfer program, keywords, etc. Although the information obtained this way is detailed, the method is relatively inefficient. They also exported citation information (in HTML format) from academic paper websites and extracted it using R. Through this method, the team has data for about 500 academic papers related to CCT, but the data are limited to author name, title, and number of citations.

The team is currently working on documents from the World Bank. To get the citation information from those documents, they are trying to automate the extraction process for documents in .txt format and might have to work manually with documents in .pdf format.
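As a rough illustration of what automating the .txt step could look like, the sketch below pulls author, year, and title out of reference lines with a regular expression. It assumes an "Authors. Year. Title." layout, which is only a guess at how the World Bank reference lists are formatted; the pattern, the function name `extract_references`, and the sample entries are all hypothetical placeholders, not the team's code or real references.

```python
import re

# Assumed layout (hypothetical): "Authors. 2004. Title. ..." at the start of a line.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>[^\n]+?)\.\s+(?P<year>(?:19|20)\d{2})\.\s+(?P<title>[^\n]+?)\.",
    re.MULTILINE,
)


def extract_references(text):
    """Return one dict (authors, year, title) per line that matches the pattern."""
    references = []
    for match in REFERENCE_PATTERN.finditer(text):
        references.append({
            "authors": match.group("authors").strip(),
            "year": int(match.group("year")),
            # Drop the quotation marks some styles put around article titles.
            "title": match.group("title").strip().strip('"'),
        })
    return references


# Made-up placeholder entries, used only to exercise the pattern.
sample = """References
Doe, J., and A. Roe. 2004. "Evaluating a cash transfer program." Journal of Examples 12 (3): 1-20.
Smith, P. 2009. Conditional transfers and schooling. Example Press.
"""

for reference in extract_references(sample):
    print(reference)
```

Lines that do not fit the assumed layout are skipped silently, so a real run would need a manual check of what was missed.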

Approach

Currently the team relies mainly on data provided by the World Bank (WB) database to draw up a guideline and derive a specific procedure for tracing the outcome of funding. The result may not be very comprehensive, because materials from other databases are not taken into account, but the WB is a comparatively good place to start digging.

The team uses R to extract information such as author name, publication date, and citations from HTML files, and plans to use, most likely, R or Python to do more text mining on the .txt files provided by the WB.

Currently the team is only using an Excel table to summarize the data they've collected. They hope to use software to visualize the "trace".

For methods and tooling, the team is considering text mining algorithms such as CHI, TextRank, and TF-IDF to extract keywords, count how often a given name or word occurs, and do citation mapping.

The tools the team uses for now: R packages: XML, PDF toolkit, PDFtk. Python packages: PDFMiner, pyPDF.
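To make the TF-IDF idea concrete, here is a minimal sketch using only the Python standard library. It uses one common variant (term frequency divided by document length, multiplied by log(N / document frequency)); the function names and the three sample "documents" are hypothetical and stand in for the real texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, k=3):
    """Highest-scoring terms first; ties are broken alphabetically."""
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical stand-ins for document texts.
docs = [
    "cash transfer program improves school enrollment",
    "cash transfer program evaluation in rural health clinics",
    "budget planning guidelines for health program",
]
scores = tf_idf(docs)
for doc_scores in scores:
    print(top_keywords(doc_scores))
```

A word that appears in every document (here "program") gets a score of zero, which is the behavior that makes TF-IDF useful for separating distinctive keywords from common vocabulary.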

Project Management

The first milestone is to finalize the team and make sure everyone is clear about the purpose of the project. The current milestone is to finalize the three major databases (currently built as a Google spreadsheet, which will soon be modified) that contain the information feeding into the final mapping. The three databases cover research papers, policy reports and the research papers they cite, and WB annual development reports and the papers they cite. The "final" milestone is to come up with a visualization that shows the mapping between academic research papers and policy reports. For example, the team will give a visual presentation of the most-cited or highly cited academic research papers, along with how often they are cited.

The team is close to drawing some preliminary conclusions about its progress, though not final ones; each preliminary observation brings it closer to a final conclusion. As of now, members are split roughly evenly across finishing the three databases. Members who know more about development studies and social issues play an important role in guiding how the rest of the project is carried out.
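The mapping and citation frequencies described here could be computed along the following lines before any plotting tool is chosen. This is only a sketch: the row format (citing document, cited paper) and all identifiers are hypothetical, standing in for whatever the spreadsheet export actually contains.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (citing policy document, cited academic paper).
citation_rows = [
    ("policy_report_A", "paper_1"),
    ("policy_report_A", "paper_2"),
    ("development_report_B", "paper_1"),
    ("evaluation_report_C", "paper_1"),
    ("evaluation_report_C", "paper_3"),
]


def citation_counts(rows):
    """Count how many distinct documents cite each paper."""
    unique_rows = set(rows)  # ignore duplicated spreadsheet rows
    return Counter(cited for _, cited in unique_rows)


def cited_by(rows):
    """Map each paper to the sorted list of documents that cite it."""
    mapping = defaultdict(set)
    for citing, cited in rows:
        mapping[cited].add(citing)
    return {paper: sorted(documents) for paper, documents in mapping.items()}


counts = citation_counts(citation_rows)
# A text-only bar chart, most-cited paper first.
for paper, n in counts.most_common():
    print(f"{paper:10s} {'#' * n} ({n})")
```

The `cited_by` dictionary is the paper-to-report mapping itself, so it could be handed to a graph or charting library once the team picks one.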

zcyang123 commented 8 years ago

Data / Materials

We are focusing on information from citations of the following two major types of resources:

(1) Academic papers related to conditional cash transfers (CCT). (2) World Bank policy reports and World Bank development reports.

The data we collect include the author name, article title, publication year, country, etc. We combined automatic and manual approaches to extract these data from documents and stored them in a Google spreadsheet.

For academic papers, we manually downloaded the papers (mainly in PDF format) and recorded information such as author name, article title, publication year, target country/cash transfer program, keywords, etc. Although the information obtained this way is detailed, the method is relatively inefficient. We also exported citation information (in HTML format) from academic paper websites and extracted it using R. Through this method, we have data for about 500 academic papers related to CCT, but the data are limited to author name, title, and number of citations.

We are currently working on documents from the World Bank. To get the citation information from those documents, we are trying to automate the extraction process for documents in .txt format and might have to work manually with documents in .pdf format.

manjiangjie commented 8 years ago

Real-world Challenge

We trace the influence of research (publications) on public policy (government decision making) by searching the grey literature for digital fingerprints of conditional cash transfer (CCT) research and capturing the ‘path of influence’: research on impact evaluation of CCT -> World Bank policy report & WB development report -> policy-oriented research -> planning & budget -> guidelines & tools -> WB project evaluation reports. Under this view, WB project evaluation reports should in turn lead to new policy, which passes through another round of planning & budget and guidelines & tools and produces another WB project evaluation report, so the path forms a cycle.

Our project can be applied to promote widespread adoption of high-quality evidence, and in particular the research generated by academics in our network. “Adoption” could come in the form of a new evidence-based government policy, the implementation of a proven program by government agencies, or the use of a rigorously tested business strategy by a company or startup. Our objective is to show how and where the adoption of evidence occurs. This will give us greater credibility with donors, decision-makers, and other stakeholders. Beyond our own team, similar think tanks and policy research institutes could benefit from any data or techniques we develop.

We think that this project will help us identify those studies, researchers, and initiatives that are most relevant to the policy-maker community (which, in turn, can inform our allocation of resources).

nwingin commented 8 years ago

Project Management

Our first milestone is to finalize the team and make sure everyone is clear about the purpose of our project. Our current milestone is to finalize the three major databases (currently built as a Google spreadsheet, which will soon be modified) that contain the information feeding into our final mapping. The three databases cover research papers, policy reports and the research papers they cite, and WB annual development reports and the papers they cite. Our "final" milestone is to come up with a visualization that shows the mapping between academic research papers and policy reports. For example, we will give a visual presentation of the most-cited or highly cited academic research papers, along with how often they are cited.

We are close to drawing some preliminary conclusions about our progress, though not final ones; each preliminary observation brings us closer to our final conclusion. As of now, members are split roughly evenly across finishing the three databases. Members who know more about development studies and social issues play an important role in guiding how the rest of the project is carried out.

YangZhou0417 commented 8 years ago

Approach

What are the risks? Currently we rely mainly on data provided by the World Bank (WB) database to draw up a guideline and derive a specific procedure for tracing the outcome of funding. The result may not be very comprehensive, because materials from other databases are not taken into account, but the WB is a comparatively good place to start digging.

What technical approaches have been explored, and which remain to be explored? We have already used R to extract information such as author name, publication date, and citations from HTML files, and we are planning to use, most likely, R or Python to do more text mining on the .txt files provided by the WB.

Where does the prototype "run?" We are currently only using an Excel table to summarize the data we've collected. Hopefully we will use software to visualize the "trace".

What methods (e.g., statistics, signal processing, transformation) and tooling (e.g., python libraries, hardware platforms) are being used/evaluated/considered? We are considering text mining algorithms such as CHI, TextRank, and TF-IDF to extract keywords, count how often a given name or word occurs, and do citation mapping.

The tools we are using for now: R packages: XML, PDF toolkit, PDFtk. Python packages: PDFMiner, pyPDF.
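The HTML extraction was done in R; for comparison, a Python version of the same idea might look like the sketch below, which uses only the standard-library `html.parser`. The markup (the `result`, `title`, `authors`, and `cited` class names) is invented for illustration and would have to be replaced with whatever the exported pages actually use.

```python
import re
from html.parser import HTMLParser


class CitationParser(HTMLParser):
    """Collect title, authors and citation count from each result block.

    Assumes (hypothetically) that every record sits in an element with
    class "result" and that fields are marked with the classes in FIELDS.
    """

    FIELDS = {"title", "authors", "cited"}

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "result" in classes:
            self.records.append({})
        for name in classes:
            if name in self.FIELDS and self.records:
                self._field = name

    def handle_data(self, data):
        if self._field and data.strip():
            record = self.records[-1]
            record[self._field] = record.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        self._field = None


def parse_results(html):
    parser = CitationParser()
    parser.feed(html)
    parser.close()
    for record in parser.records:
        match = re.search(r"\d+", record.get("cited", ""))
        record["cited"] = int(match.group()) if match else 0
    return parser.records


# Made-up page fragment with placeholder records.
sample_html = """
<div class="result"><h3 class="title">Example study of a cash transfer</h3>
<span class="authors">J. Doe, A. Roe</span><span class="cited">Cited by 12</span></div>
<div class="result"><h3 class="title">Another example study</h3>
<span class="authors">P. Smith</span></div>
"""

for row in parse_results(sample_html):
    print(row)
```

Records without a citation count default to 0 here; whether that is the right choice depends on how missing values should be treated in the spreadsheet.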
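For reference, a simplified version of TextRank keyword extraction can be sketched in a few lines: words that occur near each other are linked in a graph, and a PageRank-style iteration scores each word by its links. The stopword list, window size, and function name below are arbitrary choices for illustration, and the usual part-of-speech filtering is left out.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "on", "to", "for", "is", "are"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Fixed number of PageRank updates; each word shares its score
    # equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


# Hypothetical text standing in for a document.
text = (
    "The evaluation of the cash transfer program. "
    "The transfer program and the school enrollment. "
    "The budget for the transfer program."
)
print(textrank_keywords(text))
```

Unlike TF-IDF, this needs only a single document, so it could be applied to each report on its own.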

davclark commented 8 years ago

I was asked to chime in on tool use... I think it's good to evaluate lots of different tools, but ultimately probably just choose one language / runtime or the other!

In particular, I think the python tools are more robust. However, if folks simply don't know python, that'd be a reason to go with R. (Or, this could be an opportunity to learn some Python!)

nhejab commented 8 years ago

Real-world challenge

This project aims to quantify the impact of CEGA's research on policy making. The challenge is that tracking citations of published research does not adequately measure influence, so one has to search the gray literature to identify the impact of the research on policy making. The gain from this project is a way to determine the return on investment in international development research, which could ultimately serve as a proof of concept when seeking funding for social science research. Everyone would benefit from tracing the influence of funding on solving social issues: with resources allocated more wisely, scientists would be more efficient at tackling problems, developing communities would benefit, and the governments and organizations that fund the research could spend the same amount of funding more wisely.

Data / Materials

CEGA team members have focused on the World Bank policy repository, research published on conditional cash transfer (CCT) evaluation, the number of citations in policy reports, academic research databases, and Aid data. The team used the .txt files on Open Knowledge, and also used metadata to crawl the XML feed and find the right publications.
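Selecting publications from an XML feed by their metadata could look roughly like the sketch below. The feed structure (`doc`, `title`, `year`, `txturl`) and the example.org URLs are invented for illustration and are not the real Open Knowledge schema; the sample is parsed from a string so nothing is downloaded.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; element names and URLs are placeholders.
SAMPLE_FEED = """<documents>
  <doc id="101">
    <title>Example policy report on conditional cash transfers</title>
    <year>2009</year>
    <txturl>http://example.org/docs/101.txt</txturl>
  </doc>
  <doc id="102">
    <title>Example report on road maintenance</title>
    <year>2011</year>
    <txturl>http://example.org/docs/102.txt</txturl>
  </doc>
</documents>"""


def select_documents(feed_xml, keywords):
    """Return metadata for documents whose title contains any keyword."""
    root = ET.fromstring(feed_xml)
    selected = []
    for doc in root.iter("doc"):
        title = doc.findtext("title", default="")
        if any(keyword.lower() in title.lower() for keyword in keywords):
            selected.append({
                "id": doc.get("id"),
                "title": title,
                "year": int(doc.findtext("year") or 0),
                "text_url": doc.findtext("txturl", default=""),
            })
    return selected


matches = select_documents(SAMPLE_FEED, ["cash transfer"])
for match in matches:
    print(match["id"], match["year"], match["text_url"])
```

The returned `text_url` values are what a later step would fetch to get the .txt files for citation extraction.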

Approach

The CEGA team mostly uses R for text mining, along with some Python libraries and MySQL. They seem to need tools for statistical analysis; R and Excel could serve that purpose.

Project Management

The CEGA team set weekly goals for themselves. They hoped to mine data from the World Bank, analyse it, and present their findings regarding the "Path of Influence". They have likely faced the same initial challenge of getting everyone on the same page and up to speed on the goal of the project. The team is currently building and finalizing three databases: research papers, policy reports and the research papers they cite, and the World Bank's annual development reports. The ultimate goal for the team is to build a visual tool that maps the influence of the research on policy making.

davclark commented 8 years ago

You guys are so lucky to get two independent evaluations! ;)