WeberLab-UW / OSH_Review


Cleaning steps #6

Closed (nniiicc closed 2 months ago)

nniiicc commented 3 months ago

@sarah114tran - documenting data collection and some preliminary cleaning of the dataset here.

Two new directories in the repo:

Next steps:


Query:

For Google Scholar query:

Total of 3474 observations

Applying inclusion and exclusion criteria

Removed for date out of range: 661

Deduplication:

Original number of rows: 2813

Number of duplicates identified by title: 959

Number of rows after title de-duplication: 2241

Number of duplicates identified by DOI: 473

Final number of rows after DOI de-duplication: 1802
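For reference, a minimal sketch of a two-pass de-duplication (title first, then DOI) in plain Python. This is not the script that produced the numbers above; the `normalize` and `dedupe` helpers, the toy rows, and the placeholder DOIs are all assumptions for illustration. It also shows one possible reason a "duplicates identified" count can be larger than the number of rows actually dropped: if every row in a duplicate group is counted, the first occurrence is flagged but still kept. Whether that is what happened here is not confirmed in this thread.

```python
from collections import Counter


def normalize(value):
    """Lowercase and collapse whitespace; empty or missing values become None."""
    if value is None:
        return None
    cleaned = " ".join(str(value).lower().split())
    return cleaned or None


def dedupe(rows, key):
    """Keep the first row for each normalized key; rows without a key are kept.

    Returns (kept_rows, n_flagged). n_flagged counts every row that shares
    its key with at least one other row, first occurrences included.
    """
    counts = Counter(normalize(r.get(key)) for r in rows)
    counts.pop(None, None)  # rows with no key are never treated as duplicates
    n_flagged = sum(c for c in counts.values() if c > 1)

    seen = set()
    kept = []
    for r in rows:
        k = normalize(r.get(key))
        if k is None:
            kept.append(r)
            continue
        if k in seen:
            continue  # later occurrence of an already-seen key: drop it
        seen.add(k)
        kept.append(r)
    return kept, n_flagged


# Toy rows (hypothetical, not taken from the real dataset).
rows = [
    {"title": "Open Source Hardware", "doi": "10.1/a"},
    {"title": "open source hardware ", "doi": "10.1/a"},
    {"title": "A Different Paper", "doi": "10.1/b"},
    {"title": "Same Work, New Title", "doi": "10.1/b"},
    {"title": "No DOI Here", "doi": ""},
]

by_title, flagged_title = dedupe(rows, "title")
by_doi, flagged_doi = dedupe(by_title, "doi")

print(len(rows), flagged_title, len(by_title), flagged_doi, len(by_doi))
```

In the toy data, two rows are flagged in each pass but only one row is dropped each time, so the flagged count and the dropped count differ.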

Crossref does not allow searching by type, so the results were cleaned by type afterwards (the following were removed):

Final dataset: 1692
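A quick tally of the row counts reported above, to make the number of rows dropped at each step explicit. The step labels are my own shorthand; the counts are the ones listed in this comment.

```python
# Row counts after each cleaning step, as reported in this comment.
steps = [
    ("initial observations", 3474),
    ("after date filter", 2813),
    ("after title de-duplication", 2241),
    ("after DOI de-duplication", 1802),
    ("after cleaning by type", 1692),
]

# Rows dropped at each step = previous count minus current count.
dropped = {
    label: previous - current
    for (_, previous), (label, current) in zip(steps, steps[1:])
}

for label, n in dropped.items():
    print(f"{label}: {n} rows dropped")
# after date filter: 661 rows dropped
# after title de-duplication: 572 rows dropped
# after DOI de-duplication: 439 rows dropped
# after cleaning by type: 110 rows dropped
```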

sarah114tran commented 2 months ago

@nniiicc -- I have decided to drop all of the articles with empty abstracts, because it can be hard to tell with just the title. I will document this in the coding notes/protocol. I would drop the null values in the "abstract" column in dedup_clean.csv before combining it with your previously coded dataset, because the capitalization of the column labels is not the same in both datasets.
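A rough sketch of how that could look, assuming the only mismatch between the two files is the capitalization of the column labels. The inline CSV text stands in for dedup_clean.csv and the previously coded dataset; the `load_rows` and `drop_empty_abstracts` helpers, the `code` column, and the sample values are hypothetical. Lowercasing the labels on load is one option for making the two files line up, not necessarily what we will end up doing.

```python
import csv
import io


def load_rows(handle):
    """Read CSV rows, lowercasing and stripping every column label."""
    reader = csv.DictReader(handle)
    return [
        {name.strip().lower(): value for name, value in row.items()}
        for row in reader
    ]


def drop_empty_abstracts(rows):
    """Keep only rows whose 'abstract' value is present and not blank."""
    return [r for r in rows if (r.get("abstract") or "").strip()]


# Stand-ins for the real files (hypothetical contents).
new_csv = io.StringIO("Title,Abstract\nPaper A,Some text\nPaper B,\n")
coded_csv = io.StringIO("title,abstract,code\nPaper C,Other text,include\n")

# Drop empty abstracts first, then combine with the coded dataset.
new_rows = drop_empty_abstracts(load_rows(new_csv))
combined = load_rows(coded_csv) + new_rows

print(len(new_rows), len(combined))
```

With real files, `open("dedup_clean.csv", newline="")` would take the place of the `io.StringIO` objects.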