ContentMine / tilburg

Extraction of data from Vector-based Funnel Plots in the scholarly literature

Open notebook post #18 is unclear #6

Open ghost opened 7 years ago

ghost commented 7 years ago

http://discuss.contentmine.org/t/extracting-data-from-tilburg-funnel-plot-diagrams/386/18 says:

The total number of raw documents is given as:

url,doi,funnel_plot,nr_funnel,vector,note
https://doi.org/10.1002/bimj.201500115,10.1002/bimj.201500115,no,NA,NA,
https://doi.org/10.1002/brb3.482,10.1002/brb3.482,yes,2,no,
https://doi.org/10.1002/cam4.673,10.1002/cam4.673,no,NA,NA,
https://doi.org/10.1002/cam4.676,10.1002/cam4.676,no,NA,NA,
...
https://doi.org/10.7448/IAS.19.1.20888,10.7448/IAS.19.1.20888,no,NA,NA,
https://doi.org/10.7717/peerj.1845,10.7717/peerj.1845,no,NA,NA,
https://doi.org/10.7717/peerj.2063,10.7717/peerj.2063,no,NA,NA,
https://doi.org/10.7717/peerj.2550,10.7717/peerj.2550,no,NA,NA,

It is unclear what this means. @petermr , please can you reword it comprehensibly?

petermr commented 7 years ago

Added clarification on Discourse:

url: the URL
doi: the DOI
funnel_plot: Y/N are funnel plots present
nr_funnel: number of funnel plots
vector: are vector graphics used (yes/no/NA)
note: notes

ghost commented 7 years ago

@petermr thanks, but that's not what was unclear, sorry.

What was unclear was: how does this represent "The total number of raw documents"? The latter would appear to be an (expression that resolves to an) integer, yet you have given what seems to be an excerpt from a CSV file (probably this CSV file).
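To make the distinction concrete, here is a minimal Python sketch of how a count could be derived from such a table. It uses only the first four rows quoted above, not the full file, and assumes that `funnel_plot` and `vector` hold the literal strings `yes`, `no` or `NA`. The names `EXCERPT` and `summarise` are hypothetical and are not part of the project.

```python
import csv
import io

# Excerpt copied from the rows quoted in this thread.
# The real table is longer; the rows elided by "..." are not reproduced here.
EXCERPT = """\
url,doi,funnel_plot,nr_funnel,vector,note
https://doi.org/10.1002/bimj.201500115,10.1002/bimj.201500115,no,NA,NA,
https://doi.org/10.1002/brb3.482,10.1002/brb3.482,yes,2,no,
https://doi.org/10.1002/cam4.673,10.1002/cam4.673,no,NA,NA,
https://doi.org/10.1002/cam4.676,10.1002/cam4.676,no,NA,NA,
"""


def summarise(csv_text):
    """Count documents, documents with funnel plots, and those with vector plots."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Assumption: "yes" is the only value that marks a positive entry.
    with_funnel = [r for r in rows if r["funnel_plot"] == "yes"]
    with_vector = [r for r in with_funnel if r["vector"] == "yes"]
    return {
        "documents": len(rows),
        "with_funnel_plots": len(with_funnel),
        "with_vector_funnel_plots": len(with_vector),
    }


summary = summarise(EXCERPT)
print(summary)
```

Run against the complete table rather than this excerpt, the `documents` value would be the integer that the phrase "total number of raw documents" appears to promise.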

petermr commented 7 years ago

That was the original corpus that Chris retrieved, most of which is irrelevant to what he sent us. This file can be IGNORED. We are simply analysing 30 PDFs in 2015ori-3/corpus . That is all that is in question. You are not expected to document the process prior to us receiving the PDF files.

ghost commented 7 years ago

Thanks, your latest comment makes this marginally clearer.

The post as written still seems to breach the "no insider information" principle of open notebook science, so to resolve this, please can you replace this:

ChrisH has delivered the corpus of ca 30 PDFs for immediate analysis. See https://github.com/chartgerink/2015ori-3 . The total number of raw documents is given as:

url,doi,funnel_plot,nr_funnel,vector,note
https://doi.org/10.1002/bimj.201500115,10.1002/bimj.201500115,no,NA,NA,
https://doi.org/10.1002/brb3.482,10.1002/brb3.482,yes,2,no,
https://doi.org/10.1002/cam4.673,10.1002/cam4.673,no,NA,NA,
https://doi.org/10.1002/cam4.676,10.1002/cam4.676,no,NA,NA,
...
https://doi.org/10.7448/IAS.19.1.20888,10.7448/IAS.19.1.20888,no,NA,NA,
https://doi.org/10.7717/peerj.1845,10.7717/peerj.1845,no,NA,NA,
https://doi.org/10.7717/peerj.2063,10.7717/peerj.2063,no,NA,NA,
https://doi.org/10.7717/peerj.2550,10.7717/peerj.2550,no,NA,NA,

from which 30 PDFs have been selected. The fields are

url: the URL
doi: the DOI
funnel_plot: Y/N are funnel plots present
nr_funnel: number of funnel plots
vector: are vector graphics used (yes/no.NA)
note: notes

with this:

ChrisH is working offline with a corpus of 300+ documents, as listed in this table. Only some of these contain funnel plots, and only some of those store their funnel plots as vector drawings.

In case you wish to view that table, here is a key to its column headings:

url: the URL
doi: the DOI
funnel_plot: Y/N are funnel plots present
nr_funnel: number of funnel plots
vector: are vector graphics used (yes/no/NA)
note: notes

From that original corpus, ChrisH has uploaded 30 PDFs for us to analyse.

Thanks! :+1:

petermr commented 7 years ago

The Tilburg project is not to analyse the literature but to develop software with a selected body of documents. How that body is selected is irrelevant.

ghost commented 7 years ago

@petermr wrote:

How that body is selected is irrelevant

Then why is it already mentioned in the open notebook? If it is mentioned, it should be comprehensible. As things stand, the post is incomprehensible without the "insider information" that has emerged from our discussion above.

ghost commented 7 years ago

I'm handing this to you on a plate :) I would be grateful if you would copy-paste my suggested edit above into the post :+1: