fsingletonthorn closed this issue 5 years ago.
```r
# Remove HTML tags
strings <- lapply(strings, gsub, pattern = "<(.|\n)*?>", replacement = "")
```
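This probably rarely causes problems, but it can when p values are reported with `<` or `>`. For example, "t(130) = 12.4, p < .05 and t(130) = 0.13, p > .05" gets stripped to "t(130) = 12.4, p .05", because the lazy pattern treats everything from the first `<` to the next `>` as a single tag.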
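One possible fix, sketched in R under the assumption that every real tag has a letter, `/` or `!` immediately after the `<`: the hypothetical helper `strip_tags()` (not an existing function in this package) removes only tag-like substrings, so a comparison operator followed by a space or a digit survives. It would still misfire on unspaced text such as "x<y and y>z"; parsing the XML properly (e.g. with the xml2 package) would be more robust.

```r
# Hypothetical helper (an assumption, not existing package code): strip tag-like
# substrings only. Assumes every real tag starts with "<" immediately followed by
# a letter, "/" or "!", so comparison operators such as "p < .05" are left alone.
strip_tags <- function(x) {
  gsub("<[A-Za-z/!][^<>]*>", "", x, perl = TRUE)
}

s <- "t(130) = 12.4, p < .05 and t(130) = 0.13, p > .05"

# Current pattern: lazily matches from the first "<" to the next ">",
# deleting everything between the two comparison operators
gsub("<(.|\n)*?>", "", s)

# Tag-aware pattern: both p values survive unchanged
strip_tags(s)
#> [1] "t(130) = 12.4, p < .05 and t(130) = 0.13, p > .05"

# Real markup is still removed
strip_tags("<p>All correlations were significant (<italic>p</italic> < 0.001).</p>")
#> [1] "All correlations were significant (p < 0.001)."
```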
For example, in "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi?verb=GetRecord&identifier=oai:pubmedcentral.nih.gov:5504157&metadataPrefix=pmc", the following section doesn't appear to be pulled: "Additionally, to analyze stability at the level of the individual, we intercorrelated all variables of t1 with their counterparts at t2. Correlation coefficients were high and, without exception, statistically significant (p < 0.001). Intercorrelations of character strengths at t1 and t2 ranged from r = 0.56 (authenticity) to r = 0.86 (spirituality), and those of wellbeing aspects from r = 0.32 (autonomy) to r = 0.75 (PWB)."