Closed gimoAI closed 2 years ago
I think this is cleaner and more extensible: no for loops, no asreview dependency, and less code.
import pandas as pd
df = pd.read_excel("https://osf.io/download/gmjcv/", usecols=['DOI', 'Included_fulltext'])
# adjust columns
df["DOI"] = df["DOI"].str.extract(r"(10\.\S+)", expand=False)
df['id_type'] = 'doi'
# rename columns
df.rename({
'Included_fulltext': 'label_included',
'DOI': 'id'
}, axis=1, inplace=True)
# drop missing ids
df.dropna(subset=["id"], inplace=True)
# export
df.to_csv("Valk_2021_ids.csv", columns=['id', 'id_type', 'label_included'], index=False)
Nice. inplace should be avoided, right?
I don't think you have to avoid inplace here, since the DataFrame is created and used only within this script. With ASReview data objects, however, it can have side effects.
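For comparison, here is a minimal sketch of the same steps without inplace, written as a method chain that returns a new frame. The sample rows and the `to_ids` helper are hypothetical, made up so the snippet runs offline; the real script would read the OSF Excel file instead.

```python
import pandas as pd

# Hypothetical stand-in rows for the Excel sheet (not real data from the dataset).
raw = pd.DataFrame({
    "DOI": ["https://doi.org/10.1000/abc123", None, "doi: 10.2000/xyz"],
    "Included_fulltext": [1, 0, 0],
})


def to_ids(df):
    """Return a new frame with id, id_type and label_included; df is not modified."""
    return (
        df.assign(
            # keep only the DOI itself, starting at "10."
            id=df["DOI"].str.extract(r"(10\.\S+)", expand=False),
            id_type="doi",
        )
        .rename(columns={"Included_fulltext": "label_included"})
        # drop rows where no DOI could be extracted
        .dropna(subset=["id"])
        [["id", "id_type", "label_included"]]
    )


ids = to_ids(raw)
```

Because every step returns a new object, `raw` keeps its original columns and rows, which is the property that matters when the input is shared with other code.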
Nice!
Add Valk 2022 dataset and code for processing.