fsingletonthorn / EffectSizeScraping

MIT License

Set up SQL database #9

Open fsingletonthorn opened 5 years ago

fsingletonthorn commented 5 years ago

Decide on tables to go in database – currently:

• Metadata
  o PMCID [character] (PubMed Central ID, used to identify the article in each table)
  o doi [character] (DOI)
  o journalID [character] (journal name)
  o journalIDAbrev [character] (abbreviated journal ID)
  o title [character] (title of article)
  o issue [numeric] (issue number)
  o volume [numeric] (volume number)
  o pPub [date] (print publication date)
  o ePub [date] (electronic publication date)
  o call [character] (URL of the call to the XML of the article)

• Authors
  o PMCID [character]
  o firstName [character] (author first name)
  o lastName [character] (author last name)

• Keywords
  o PMCID [character]
  o keywords [character] (keywords)

• Statistics
  o PMCID [character]
  o section [character (one of: abstract, intro, methods, discussion, results, unlabelled)] (the section from which the statistic was extracted)
  o statistic [character (one of: t, F, r, chi, d, eta, HR, OR)] (the statistic extracted from the paper: t statistic (t), F statistic (F), correlation coefficient (r), chi squared (chi), Cohen’s d (d), eta (eta squared, eta, partial eta, omega squared, epsilon squared), hazard ratio (HR), odds ratio (OR))
  o cleaned [character] (the extracted text from the paper, with all whitespace removed)
  o reported [character] (the extracted text from the paper, copied exactly as reported)
  o context [character] (100 words around the extracted statistic)
  o value [numeric] (test statistic or effect size value)
  o df1 [numeric] (effect degrees of freedom for F tests, NA otherwise)
  o df2 [numeric] (residual / error degrees of freedom)
  o p [character (may include “<” or “>” if reported, equal signs removed)] (p value for each statistical test)
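
As a starting point for discussion, the four tables above could be sketched roughly as follows. This is only a minimal sketch under several assumptions that are not decided in this issue: it uses SQLite through Python's built-in sqlite3 module, lower-case table names, dates stored as ISO 8601 text, NA stored as NULL, and a foreign key from each table back to Metadata on PMCID. The function name create_database is hypothetical.

```python
import sqlite3

# Assumed SQLite translation of the proposed tables; column names follow the list above.
SCHEMA = """
CREATE TABLE metadata (
    PMCID          TEXT PRIMARY KEY,
    doi            TEXT,
    journalID      TEXT,
    journalIDAbrev TEXT,
    title          TEXT,
    issue          NUMERIC,
    volume         NUMERIC,
    pPub           TEXT,   -- assumption: dates stored as ISO 8601 text
    ePub           TEXT,
    "call"         TEXT    -- quoted defensively in case the name clashes with a keyword
);
CREATE TABLE authors (
    PMCID     TEXT NOT NULL REFERENCES metadata (PMCID),
    firstName TEXT,
    lastName  TEXT
);
CREATE TABLE keywords (
    PMCID    TEXT NOT NULL REFERENCES metadata (PMCID),
    keywords TEXT
);
CREATE TABLE statistics (
    PMCID     TEXT NOT NULL REFERENCES metadata (PMCID),
    section   TEXT CHECK (section IN ('abstract', 'intro', 'methods',
                                      'discussion', 'results', 'unlabelled')),
    statistic TEXT CHECK (statistic IN ('t', 'F', 'r', 'chi', 'd',
                                        'eta', 'HR', 'OR')),
    cleaned   TEXT,
    reported  TEXT,
    context   TEXT,
    value     REAL,
    df1       REAL,   -- NULL (NA) unless the statistic is an F test
    df2       REAL,
    p         TEXT    -- kept as text so "<" and ">" survive
);
"""


def create_database(path=":memory:"):
    """Create the four tables and return an open connection (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_database()
    # Placeholder values for illustration only, not real articles or results.
    conn.execute("INSERT INTO metadata (PMCID, title) VALUES (?, ?)",
                 ("PMC0000000", "Example title"))
    conn.execute(
        "INSERT INTO statistics (PMCID, section, statistic, reported, value, df2, p) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("PMC0000000", "results", "t", "t(28) = 2.10, p < .05", 2.10, 28, "<.05"),
    )
    conn.commit()
    print(conn.execute("SELECT COUNT(*) FROM statistics").fetchone()[0])
```

The CHECK constraints mirror the "one of" lists for section and statistic, so a mislabelled row fails at insert time rather than surfacing later in analysis. Whether that strictness is wanted, and whether PMCID should really be enforced as a foreign key, is still open.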