OxfordDemSci / ICS_Analysis

Mixed methods approach and interactive dashboard to analyse research impact through Impact Case Studies submitted to the UK's Research Excellence Framework (REF) 2021.
https://shape-impact.co.uk
GNU General Public License v3.0

UOA and Institution ID do not uniquely identify departmental scores #2

MarkDVerhagen closed this issue 1 year ago

MarkDVerhagen commented 1 year ago
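```python
import os

import pandas as pd

# raw_path: directory holding the REF 2021 results spreadsheet (defined elsewhere in the project).
# skiprows=6 skips the rows that sit above the column headers in the sheet.
raw_results = pd.read_excel(os.path.join(raw_path, 'raw_results_data.xlsx'),
                            skiprows=6)

raw_results = raw_results.rename(
    columns={'Institution code (UKPRN)': 'inst_id',
             'Unit of assessment number': 'uoa_id',
             'FTE of submitted staff': 'fte',
             '% of eligible staff submitted': 'fte_pc'}).astype({'inst_id': 'int', 'uoa_id': 'int'})

# All rows for the University of Glasgow (UKPRN 10007794) in Unit of Assessment 26.
raw_results.loc[(raw_results['inst_id'] == 10007794) & (raw_results['uoa_id'] == 26)]
```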

This shows that the University of Glasgow (UKPRN 10007794) has two sets of results for Unit of Assessment 26: 26A (Modern Languages) and 26B (Celtic and Gaelic).
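Rather than spotting such cases by hand, one could scan the whole table for key collisions. A minimal sketch, assuming the renamed `inst_id` and `uoa_id` columns and one row per submission (if the sheet holds several rows per submission, it would need filtering to a single row type first). The rows below are hypothetical stand-ins, not values from the REF spreadsheet, and `99999999` is a made-up institution code:

```python
import pandas as pd

# Hypothetical rows: two Glasgow submissions to UOA 26 (26A and 26B) plus an
# unrelated single submission from a made-up institution code.
raw_results = pd.DataFrame({
    "inst_id": [10007794, 10007794, 99999999],
    "uoa_id": [26, 26, 26],
    "Multiple submission letter": ["A", "B", None],
})

# keep=False marks every row whose (inst_id, uoa_id) pair occurs more than once,
# not just the second and later occurrences.
mask = raw_results.duplicated(subset=["inst_id", "uoa_id"], keep=False)
dupes = raw_results[mask]
print(dupes)
```

Only the two Glasgow rows are flagged; the single submission is left alone.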

We therefore need to include the "Multiple submission letter" column to distinguish the two submissions when merging.
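To illustrate why the letter matters in a merge, here is a sketch with two hypothetical tables (a scores table and a case-study table, both assumed to carry the letter column; the `score` and `case_id` values are placeholders, not real REF data). pandas' `validate` argument to `merge` raises `MergeError` when the key is not unique on the expected side:

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical scores table: one row per submission, told apart only by the letter.
scores = pd.DataFrame({
    "inst_id": [10007794, 10007794],
    "uoa_id": [26, 26],
    "Multiple submission letter": ["A", "B"],
    "score": [1.0, 2.0],  # placeholder values, not real REF results
})
# Hypothetical case-study table: one case study per submission.
cases = pd.DataFrame({
    "inst_id": [10007794, 10007794],
    "uoa_id": [26, 26],
    "Multiple submission letter": ["A", "B"],
    "case_id": ["case_1", "case_2"],
})

# Merging on institution and UOA number alone pairs every case with every
# submission: 2 x 2 = 4 rows instead of 2.
naive = cases.merge(scores, on=["inst_id", "uoa_id"])
print(len(naive))

# With validate="one_to_one", the same ambiguous merge fails loudly instead.
raised = False
try:
    cases.merge(scores, on=["inst_id", "uoa_id"], validate="one_to_one")
except MergeError as err:
    raised = True
    print("ambiguous key:", err)

# Adding the submission letter to the key restores a one-to-one match.
keyed = cases.merge(
    scores,
    on=["inst_id", "uoa_id", "Multiple submission letter"],
    validate="one_to_one",
)
print(len(keyed))
```

Passing `validate` on every merge is a cheap guard: a silent many-to-many join would otherwise inflate row counts without any error.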

MarkDVerhagen commented 1 year ago

Added a function to generate a uoa_id that uniquely identifies departmental scores (when combined with the institution code):

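```python
# Append the submission letter (e.g. 'A', 'B'); single submissions get '' so the key is just the number.
df['uoa_id'] = (df['Unit of assessment number'].astype(str)
                + df['Multiple submission letter'].fillna('').astype(str))
```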
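The expression above can be wrapped in a small helper. A minimal sketch, using a hypothetical name `make_uoa_key` and assuming the original (un-renamed) column names from the results spreadsheet. It also guards against one possible pitfall, which is an assumption about how the file might be parsed rather than something reported here: if "Unit of assessment number" is read as a float, `astype(str)` would give `'26.0'` instead of `'26'`, so the number is cast to `int` first.

```python
import pandas as pd


def make_uoa_key(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: combine UOA number and submission letter into one key.

    Returns e.g. '26A', '26B' for multiple submissions and '26' otherwise.
    """
    # Cast through int so a float-parsed column gives '26', not '26.0'.
    number = df["Unit of assessment number"].astype(int).astype(str)
    # Rows without a multiple submission have no letter; use an empty string.
    letter = df["Multiple submission letter"].fillna("").astype(str)
    return number + letter


# Hypothetical rows; 99999999 is a made-up institution code.
df = pd.DataFrame({
    "Institution code (UKPRN)": [10007794, 10007794, 99999999],
    "Unit of assessment number": [26.0, 26.0, 26.0],
    "Multiple submission letter": ["A", "B", None],
})
df["uoa_id"] = make_uoa_key(df)
print(df["uoa_id"].tolist())

# Together with the institution code, the new key should now be unique.
assert not df.duplicated(subset=["Institution code (UKPRN)", "uoa_id"]).any()
```

Two caveats: `astype(int)` raises if the number column contains missing values (for example from footer rows), so such rows would need dropping first; and because the new `uoa_id` is a string, any table it is merged with needs its key built the same way, otherwise `'26'` will not match the integer `26`.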