Mine PMC for ethics statements

Daniel-Mietchen commented 7 years ago

possible search terms:

ethical
"institutional review board"
"informed consent" etc.

Daniel-Mietchen commented 7 years ago

The main purpose here would be to see

what percentage of articles have a dedicated ethics section, and how that changes over time
what kind of information is provided in addition to statements of the "... received ethical approval" and "gave informed consent" kinds.
to what extent PIDs are being used in there and for what, and how that changes over time.
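
As a rough sketch of the first question, the share of articles with a dedicated ethics section could be tallied per publication year once each article has been labelled. The function name `ethics_share_by_year` and the toy records below are made up for illustration; they are not real PMC data.

```python
from collections import defaultdict


def ethics_share_by_year(records):
    """Return {year: percentage of articles that have an ethics section}.

    `records` is an iterable of (year, has_ethics_section) pairs.
    """
    totals = defaultdict(int)
    with_section = defaultdict(int)
    for year, has_section in records:
        totals[year] += 1
        if has_section:
            with_section[year] += 1
    return {
        year: 100.0 * with_section[year] / totals[year]
        for year in sorted(totals)
    }


# Hypothetical labels, for illustration only (not real PMC data).
toy_records = [(2016, False), (2016, True), (2017, True), (2017, True)]
shares = ethics_share_by_year(toy_records)
print(shares)
```

The hard part, deciding whether an article has such a section, is deliberately left out here.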

Daniel-Mietchen commented 7 years ago

A simple query for "approval number" currently yields 11404 hits: https://www.ncbi.nlm.nih.gov/pmc/?term=%22approval+number%22
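
The same count could probably be tracked programmatically through the NCBI E-utilities `esearch` endpoint instead of the web interface. The sketch below only builds the request URL and parses a JSON reply; the helper names are made up, the fetch itself is left out, and counts from the API may not match the web search exactly.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(phrase):
    """Build an esearch URL that asks PMC for the hit count of a quoted phrase."""
    params = {
        "db": "pmc",
        "term": f'"{phrase}"',  # quotes request a phrase search
        "retmode": "json",
        "retmax": 0,  # only the count is needed, not the ID list
    }
    return ESEARCH + "?" + urlencode(params)


def parse_count(response_text):
    """Extract the hit count from an esearch JSON response body."""
    return int(json.loads(response_text)["esearchresult"]["count"])


url = build_count_url("approval number")
print(url)
```

The response body could then be fetched with, for example, `urllib.request.urlopen(url)` and passed to `parse_count`.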

Daniel-Mietchen commented 5 years ago

Just to clarify: conflict of interest statements are within scope here as well.

Daniel-Mietchen commented 3 years ago

I just reran the "approval number" query from Oct 12, 2017, and it now yields 37963 results, i.e. a roughly 3.3-fold increase in about 3.5 years.

In the meantime, I have begun to collaborate with @petermr, and we are trying to use his ContentMine pipeline (which is currently being ported to Python) to extract ethics statements from PMC. Along the way, we have built a first, still very rough, dictionary (i.e. a set of words highly indicative of ethics statements), and we are also trying to get a list of ethics committees mentioned in PMC-indexed papers.
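
To illustrate the dictionary idea, here is a minimal sketch of term matching in Python. The term list is just the search terms mentioned earlier in this thread, not the actual ContentMine dictionary, and `find_terms` and `looks_like_ethics_statement` are hypothetical helpers rather than part of the pipeline.

```python
import re

# Illustrative terms taken from this thread; the real dictionary is larger.
ETHICS_TERMS = [
    "ethical",
    "institutional review board",
    "informed consent",
    "approval number",
]


def find_terms(text, terms=ETHICS_TERMS):
    """Return the dictionary terms that occur in `text` as whole words."""
    lowered = text.lower()
    return [
        term
        for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


def looks_like_ethics_statement(text, min_hits=1):
    """Crude flag: does the text contain at least `min_hits` dictionary terms?"""
    return len(find_terms(text)) >= min_hits


sample = "All participants gave informed consent, and the study received ethical approval."
print(find_terms(sample))
```

Whole-word matching keeps "unethical" from matching "ethical", but it also misses inflected forms such as "ethically", so a real dictionary would need variants or stemming.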

Daniel-Mietchen commented 3 years ago

Meeting on April 29, 2021:

We are considering submitting something to the Wikidata Workshop.
We are also considering submitting a Research Idea to RIO as well as a research paper, perhaps to WikiJournal.
An event is being planned for a weekend in May that is about introducing people to Wikidata in a playful manner. Peter will think about aligning it with the Wikimedia Hackathon.
We also looked a bit into ContentMine dictionaries.

Daniel-Mietchen commented 3 years ago

Some more notes on this by @ShweataNHegde sit at https://github.com/petermr/dictionary/wiki/Ethics-Statement-Project .

Daniel-Mietchen commented 3 years ago

A search for "approval number" now gives 38437 results, i.e. about 500 more than just two weeks ago.
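
For reference, the arithmetic behind these comparisons, using only the three counts reported in this thread. The annual growth factor assumes steady exponential growth over roughly 3.5 years, which is a simplification.

```python
first_count = 11404   # reported for the original query (Oct 12, 2017)
rerun_count = 37963   # reported about 3.5 years later
latest_count = 38437  # reported roughly two weeks after the rerun

fold_increase = rerun_count / first_count
new_hits = latest_count - rerun_count

# Assumption: steady exponential growth over ~3.5 years.
annual_growth_factor = fold_increase ** (1 / 3.5)

print(f"fold increase: {fold_increase:.2f}")
print(f"new hits since rerun: {new_hits}")
print(f"implied annual growth factor: {annual_growth_factor:.2f}")
```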

Daniel-Mietchen commented 3 years ago

There are ambiguities at multiple levels.

For instance, this article states that

This study was approved by the Johns Hopkins School of Medicine IRB, Approval Number: IRB00151734.

The problem here is that Johns Hopkins School of Medicine runs multiple IRBs, and there does not seem to be a straightforward mechanism to resolve the approval number to get more metadata about the process.
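
Pulling the identifier itself out of such a sentence is the easy part. Below is a minimal regex sketch, assuming the number directly follows the words "approval number"; the pattern and the helper name are illustrative guesses, and real statements vary far more than this.

```python
import re

# Assumed pattern: "approval number", optional ":" or "#", then an identifier
# made of letters, digits and a few separators. Will miss many real variants.
APPROVAL_PATTERN = re.compile(
    r"approval\s+number\s*[:#]?\s*([A-Za-z0-9][A-Za-z0-9/._-]*[A-Za-z0-9])",
    re.IGNORECASE,
)


def extract_approval_numbers(text):
    """Return all identifier-like strings that follow 'approval number'."""
    return APPROVAL_PATTERN.findall(text)


sentence = (
    "This study was approved by the Johns Hopkins School of Medicine IRB, "
    "Approval Number: IRB00151734."
)
print(extract_approval_numbers(sentence))
```

This only extracts the string; it does nothing to resolve which IRB issued it, which is the actual ambiguity described above.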

Daniel-Mietchen commented 3 years ago

There is an Office for Human Research Protections (OHRP) Database for Registered IORGs & IRBs, Approved FWAs, and Documents Received in Last 60 Days (https://ohrp.cit.nih.gov/search/irbsearch.aspx) that has identifiers for IRBs, but these do not resolve either.

petermr commented 3 years ago

I have started to test the phrase extraction tool NLTK-RAKE. https://towardsdatascience.com/extracting-keyphrases-from-text-rake-and-gensim-in-python-eefd0fad582f As with all language tools it will take a day or two to see how useful it is.
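
For readers unfamiliar with RAKE, here is a simplified pure-Python sketch of the scoring idea: candidate phrases are the word runs between stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its words. This is not the NLTK-RAKE tool itself, and the tiny stopword list is an assumption chosen for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; real tools use a much larger one.
STOPWORDS = {
    "a", "all", "an", "and", "by", "for", "from", "in",
    "of", "the", "to", "was", "were", "with",
}


def candidate_phrases(text):
    """Split text into word runs delimited by punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Rank candidate phrases by the sum of degree/frequency word scores."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    word_score = {word: degree[word] / freq[word] for word in freq}
    scores = {
        " ".join(phrase): sum(word_score[word] for word in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


text = (
    "The study was approved by the institutional review board. "
    "Written informed consent was obtained from all participants."
)
ranked = rake_scores(text)
print(ranked[:2])
```

Longer phrases tend to score higher under this scheme, which is part of what needs checking before relying on it.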



-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

ShweataNHegde commented 3 years ago

https://colab.research.google.com/drive/1sFj07mE2XRyeaplvsTs34-VaDHBjnt6U?usp=sharing

Ayush (openVirus volunteers) and I wrote a piece of code that can extract common phrases from a text file with manually scraped Ethics Statements.
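
The general idea can be sketched as plain n-gram counting. This is not the code from the Colab notebook above; the function name is made up, and it takes a list of strings rather than reading a text file.

```python
import re
from collections import Counter


def common_phrases(statements, n=2, top=5):
    """Return the `top` most frequent n-word phrases across all statements."""
    counts = Counter()
    for statement in statements:
        words = re.findall(r"[a-z]+", statement.lower())
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts.most_common(top)


# Made-up example statements, not scraped data.
statements = [
    "Informed consent was obtained from all participants.",
    "All participants gave informed consent.",
]
result = common_phrases(statements, n=2, top=2)
print(result)
```

N-grams are counted within each statement only, so phrases never span two statements.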

Daniel-Mietchen commented 3 years ago

Some updates from this week:

@ShweataNHegde has created a more refined ethics dictionary here, as per these notes
I have created Wikidata lexemes for most of the entries in her dictionary, as per this overview
I also started WikiProject Ethics.

Daniel-Mietchen commented 3 years ago

For more recent updates, see the notes over at Shweata's page.

Daniel-Mietchen commented 3 years ago

Here is a list of ethics-related entities Shweata has mined from articles on stem cells.

Daniel-Mietchen commented 3 years ago

Some more observations by Shweata and Peter sit here.

We now have a dedicated organization, repo and wiki:

Daniel-Mietchen commented 3 years ago

The paper "How does nursing research differ internationally? A bibliometric analysis of six countries" has a Table 1 that looks at certain features of previous studies, including

Extracted specific properties (e.g., contains ethics statements)

Daniel-Mietchen commented 1 year ago

The project with Shweata and Peter (and Ayush) has since led to a publication:

Hegde SN, Garg A, Murray-Rust P, Mietchen D (2022) Mining the literature for ethics statements: A step towards standardizing research ethics. Research Ideas and Outcomes 8: e94685. https://doi.org/10.3897/rio.8.e94685 .

It outlines a workflow for mining ethics statements and discusses motivations, applications and complications.

Daniel-Mietchen / ideas

Mine PMC for ethics statements #499