asreview / synergy-dataset

SYNERGY - Open machine learning dataset on study selection in systematic reviews
Creative Commons Zero v1.0 Universal

Contribution of SRs from EFSA #29

Closed behrica closed 3 years ago

behrica commented 4 years ago

I started to have a look at our full SR database.

Let me start by describing what I have here, and then we can iteratively think about what would be worth including. Maybe we can also have another call.

We have 299 "projects" in total in Distiller. Quite a few of them are "tests" or other garbage. It is hard to say how many, but judging by the project names, at least 100 projects should not be looked at at all, which leaves about 200 projects.

Each of them has at least one "level", where a level can mean different things:

- "title screening"
- "abstract screening"
- "title + abstract screening"
- "full text screening"
- "data extraction"
- "abstract screening 1" vs "abstract screening 2"
- ...

There is no "clear nomenclature" or metadata on this, but often we use the word "abstract" in the name of the level to indicate "abstract screening"

The number of "levels" in total (including garbage projects) is: 1226

So in total we have 1226 cases in which "humans have decided to exclude x papers out of y"

(sometimes x or y or x and y are 0)

I filtered the levels down to those that have "abstract" in the "level name". These SHOULD all be about abstract screening, but we might have more.

This leaves 126 rows.
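The filter described above can be sketched roughly as follows. This is only an illustration, not the actual extraction code: the function name `is_abstract_level` and the `HYPOTHETICAL_project` row are assumptions, while the other level names are copied from the statistics pasted in this comment.

```python
def is_abstract_level(level_name: str) -> bool:
    """Case-insensitive check for the word 'abstract' in a level name."""
    return "abstract" in level_name.lower()


# (project, level name) pairs; the last row is a made-up counter-example.
levels = [
    ("AHAW_EFSA-Q-2012-00234_Leishmaniosis",
     "Title and abstract screening - Study eligibility form: Title and abstract screening"),
    ("AHAW_EFSA-Q-2012-00234_Leishmaniosis",
     "Full paper screening - Study eligibility form: Full paper screening of unclear title and abstract papers"),
    ("AHAW_EFSA-Q-2016-00160_Bluetongue",
     "Level 1 - Q3 screening title and abstracts"),
    ("HYPOTHETICAL_project", "data extraction"),
]

abstract_levels = [(project, level) for project, level in levels
                   if is_abstract_level(level)]

for project, level in abstract_levels:
    print(project, "|", level)
```

Note that the "Full paper screening ..." level also passes this filter because its name happens to contain "abstract", so a name-based match can pick up levels that are not abstract screenings, and it will miss abstract screenings whose level name does not contain the word.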

I have just pasted here, for your information, some of the "statistics" I get for these.

The first rows are:

|                                      project |                                                                                                    level | References Added | Unreviewed | Some Reviews | Included | Excluded | Conflict | Fully Reviewed | Saved, Unsubmitted |
|----------------------------------------------|----------------------------------------------------------------------------------------------------------|------------------|------------|--------------|----------|----------|----------|----------------|--------------------|
|         AHAW_EFSA-Q-2012-00234_Leishmaniosis |                      Title and abstract screening - Study eligibility form: Title and abstract screening |              961 |          0 |            0 |       84 |      877 |        0 |            961 |                  0 |
|         AHAW_EFSA-Q-2012-00234_Leishmaniosis | Full paper screening - Study eligibility form: Full paper screening of unclear title and abstract papers |                  |          0 |            0 |       23 |       61 |        0 |             84 |                  0 |
|                   AHAW_EFSA-Q-2013-00546_EBL |                                              Title Abstract screening - Title and abstract screening EBL |             5181 |          0 |            0 |      255 |     4926 |        0 |           5181 |                  0 |
|         AHAW_EFSA-Q-2013-00835_leishmaniasis |                                                   relevance - First stage screening (title and abstract) |              182 |          0 |            0 |       14 |      168 |        0 |            182 |                  0 |
|                   AHAW_EFSA-Q-2013-00918_pox |                                                           Screening 1 - POX Screening 1 (title&abstract) |               86 |          0 |            0 |       37 |       49 |        0 |             86 |                  0 |
|                   AHAW_EFSA-Q-2013-01034_PPR |                                                             Screening - PPR Screening 1 (title&abstract) |             1076 |          0 |            0 |      243 |      833 |        0 |           1076 |                  0 |
| AHAW_EFSA-Q-2014-00187- VBD-review-GEOG-DIST |                                             Title and abstract screening - Tittle and abstract screening |              816 |         15 |            0 |      255 |      521 |       12 |            801 |                  0 |
|                   AHAW_EFSA-Q-2015-00160_PED |                                      Title and abstract screening PED - Title and abstract screening PED |             1609 |          0 |            0 |      246 |     1363 |        0 |           1609 |                  0 |
|            AHAW_EFSA-Q-2016-00160_Bluetongue |                                                               Level 1 - Q3 screening title and abstracts |              287 |          0 |            0 |      103 |      184 |        0 |            287 |                  0 |
|                   AHAW_EFSA-Q-2018-00141_ASF |                                                             ASF screening - ASF Title abstract Screening |             1512 |          0 |            0 |       89 |     1422 |        1 |           1512 |                  0 |
|         AHAW_EFSA-Q-2018-00269_AI_Monitoring |                                                      Title abstract screening - Title abstract screening |               47 |         47 |            0 |        0 |        0 |        0 |              0 |                  0 |
|  AHAW_EFSAQ201400187_DACRAH2_GeoDistribution |                                             Title and abstract screening - Tittle and abstract screening |             5433 |          0 |            0 |      982 |     4451 |        0 |           5433 |                  0 |
|       AHAW_EFSA_Q_-2014-00187-VECTORNET-OBJ1 |                                                ti/abstract screening - MIR_Tittle and abstract screening |             1756 |          0 |            0 |      679 |     1077 |        0 |           1756 |                  0 |
|       AHAW_EFSA_Q_-2014-00187-VECTORNET-OBJ2 |                                                               Level 1 - R0_Tittle and abstract screening |              145 |          0 |            0 |      107 |       38 |        0 |            145 |                  0 |
|       AHAW_EFSA_Q_-2014-00187-VECTORNET-OBJ3 |                                                          Level 1 - VecComp_Tittle and abstract screening |              703 |         27 |            0 |      327 |      349 |        0 |            676 |                  0 |
|                  AMU_EFSA-Q-2015-00592_crowd |                                                                 screening - Title and abstract screening |              371 |          0 |            0 |       25 |      346 |        0 |            371 |                  0 |
|                AMU_EFSA-Q-2016-00294_MLT- SR |                                                           Level 1 - LEVEL1 screening title and abstracts |              953 |          0 |            0 |      257 |      696 |        0 |            953 |                  0 |
|      BIOCONTAM_EFSA-Q-2014-00189_QPS2014G+NS |  Title and abstract screening - STEP 1 (Title and/or abstract): GRAM-POSITIVE - NON-SPORULATING BACTERIA |              875 |        113 |          393 |       16 |      353 |        0 |            369 |                  0 |
|       BIOCONTAM_EFSA-Q-2014-00189_QPS2014G+S |       Screening Title and Abstract - STEP 1 (Title and/or abstract): GRAM-POSITIVE -SPORULATING BACTERIA |              447 |          0 |          421 |       17 |        9 |        0 |             26 |                  0 |
|         BIOCONTAM_EFSA-Q-2014-00189_QPS2014V |         Title and Abstract screening - STEP 1 (Title and/or abstract): Viruses used for plant protection |               77 |          0 |           77 |        0 |        0 |        0 |              0 |                  0 |
|         BIOCONTAM_EFSA-Q-2014-00189_QPS2014Y |                                   Title and Abstract screening - STEP 1 (Title and/or abstract):  YEASTS |              488 |          0 |          477 |       11 |        0 |        0 |             11 |                  0 |
|       BIOCONTAM_EFSA-Q-2014-00536_EAEC_Trial |                                                       Title and abstracts - Title and abstract screening |              240 |          0 |          100 |      106 |       34 |        0 |            140 |                  0 |
|        BIOCONTAM_EFSA-Q-2015-00028_DIOX_FARM |                               Level 1 _title and abstract - DIOXIN _ FARM / Title and abstract screening |             4202 |          0 |            0 |      503 |     3699 |        0 |           4202 |                  0 |
|       BIOCONTAM_EFSA-Q-2015-00028_DIOX_NP06C |                                                 Level 1 - RPA_IEH_updated / Title and abstract screening |             6101 |          0 |            0 |     2218 |     3883 |        0 |           6101 |                  0 |
|       BIOCONTAM_EFSA-Q-2015-00028_DIOX_NP07C |                                       Level 1 - DIOXIN _TOXICOLOGY MODELS / Title and abstract screening |             4906 |          0 |            0 |      633 |     4273 |        0 |           4906 |                  0 |

So one contribution to you could be the (at least 126) abstract screenings from our database, including their metadata.

Some might be "half done", but you could see that from the numbers of "total papers", "included", "excluded" and "conflict". I would say "nearly all" are complete.
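As a rough sketch of how "half done" levels could be spotted from these numbers: the class `LevelStats`, the `completion` measure and the 0.95 threshold below are assumptions for illustration, not something Distiller provides. The sketch also assumes that the total for a level equals Unreviewed + Some Reviews + Fully Reviewed, which matches the rows pasted above. The sample values are copied from that table.

```python
from dataclasses import dataclass


@dataclass
class LevelStats:
    project: str
    unreviewed: int
    some_reviews: int
    included: int
    excluded: int
    conflict: int
    fully_reviewed: int

    @property
    def total(self) -> int:
        # Assumption: every reference is in exactly one of these three states.
        return self.unreviewed + self.some_reviews + self.fully_reviewed

    @property
    def completion(self) -> float:
        # Share of references that were fully reviewed (0.0 for empty levels).
        return self.fully_reviewed / self.total if self.total else 0.0


rows = [
    LevelStats("AHAW_EFSA-Q-2013-00546_EBL", 0, 0, 255, 4926, 0, 5181),
    LevelStats("AHAW_EFSA-Q-2014-00187- VBD-review-GEOG-DIST", 15, 0, 255, 521, 12, 801),
    LevelStats("AHAW_EFSA-Q-2018-00269_AI_Monitoring", 47, 0, 0, 0, 0, 0),
    LevelStats("BIOCONTAM_EFSA-Q-2014-00189_QPS2014G+S", 0, 421, 17, 9, 0, 26),
]

THRESHOLD = 0.95  # assumed cut-off for "nearly complete"

complete = [r.project for r in rows if r.completion >= THRESHOLD]
for r in rows:
    print(f"{r.project}: {r.fully_reviewed}/{r.total} fully reviewed ({r.completion:.0%})")
```

With this cut-off, the EBL and VBD-review-GEOG-DIST levels count as complete, while AI_Monitoring (nothing reviewed) and QPS2014G+S (mostly "Some Reviews") do not.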

I have automated all extractions, so the "volume of SRs" does not make any difference for me.

behrica commented 4 years ago

A completely different way to "select" a contribution for you would be for my colleagues to start by manually selecting the "projects of interest / relevance / representative", and manually making sure that they indeed contain a "level" which was an abstract screening (and probably look at the concrete questions asked in the SR and decide based on ......) whether or not to share with you.

This would very likely result in a far smaller number, maybe 10 (compared to > 126 with the "take all" approach I described before).

Maybe even both contributions can be useful....

Please provide me with your comments.

behrica commented 4 years ago

In the "take all" scenario we would be talking about at least ~150000 references, of which ~130000 were excluded.
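To put these rounded figures in proportion, a quick back-of-the-envelope calculation (the variable names are only illustrative, and the inputs are the approximate counts above, not exact totals):

```python
total_references = 150_000  # approximate, from the estimate above
excluded = 130_000          # approximate, from the estimate above

# Everything that was not excluded: included, in conflict, or not yet reviewed.
not_excluded = total_references - excluded
excluded_share = excluded / total_references

print(f"not excluded: ~{not_excluded}")
print(f"excluded share: ~{excluded_share:.1%}")
```

So roughly 87% of the references were excluded, leaving about 20000 that were included, in conflict, or not yet reviewed.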