OxfordDemSci / ICS_Analysis

Mixed methods approach and interactive dashboard to analyse research impact through Impact Case Studies submitted to the UK's Research Excellence Framework (REF) 2021.
https://shape-impact.co.uk
GNU General Public License v3.0

Include functionality to do a clean run of the topic model estimation in make_enhanced_data.py #40

Closed MarkDVerhagen closed 8 months ago

MarkDVerhagen commented 9 months ago

@bz-dev can you check whether these calls to bert.py and bert_reduce.py match the way you would run them?

Note that by default the script hot-loads nn3_threshold0.01_reduced.xlsx, but passing -top forces a rerun of the model.

if '-top' in sys.argv:
    ## Generate new topic model
    print("Generating new topic model... This will take some time.")
    bert_script_path = project_path / '..' / '..' / 'topic_modelling' / 'bert.py'
    run_args = [
        edit_path / 'clean_ref_ics_data.xlsx', ## "./ICS_Analysis/data/edit/
        topic_path]

    run_command = ["python3", bert_script_path] + run_args
    subprocess.run(run_command)

    print("Reducing topic model... This will take some time.")
    reduce_script_path = project_path / '..' / '..' / 'topic_modelling' / 'bert_reduce.py'
    reduce_args = [
        topic_path,  ## "./ICS_Analysis/data/topic_model/
        'nn3_threshold0.01']

    ## Use the reduce script here, not bert_script_path
    reduce_command = ["python3", reduce_script_path] + reduce_args
    subprocess.run(reduce_command)
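
For comparison, here is a minimal sketch of the same two-step rerun with the command construction pulled into a helper, so the arguments can be checked without actually launching the model. The function names `build_topic_model_commands` and `rerun_topic_model`, and the `runner` parameter, are hypothetical and not part of the repository. Using `sys.executable` in place of `"python3"` and passing `check=True` are assumptions on my side, intended to keep the same interpreter and to stop if the first step fails.

```python
import subprocess
import sys
from pathlib import Path


def build_topic_model_commands(script_dir, edit_path, topic_path,
                               reduce_name="nn3_threshold0.01"):
    """Return the (estimate, reduce) command lists for a clean rerun.

    Hypothetical helper: argument order mirrors the snippet above.
    """
    script_dir = Path(script_dir)
    edit_path = Path(edit_path)
    topic_path = Path(topic_path)

    # Step 1: estimate the topic model from the cleaned ICS data.
    estimate = [
        sys.executable, str(script_dir / "bert.py"),
        str(edit_path / "clean_ref_ics_data.xlsx"),
        str(topic_path),
    ]
    # Step 2: reduce the estimated model (note: bert_reduce.py, not bert.py).
    reduce = [
        sys.executable, str(script_dir / "bert_reduce.py"),
        str(topic_path),
        reduce_name,
    ]
    return estimate, reduce


def rerun_topic_model(commands, runner=subprocess.run):
    """Run each command in order; check=True raises if a step fails."""
    for command in commands:
        runner(command, check=True)
```

In make_enhanced_data.py this could be called as `rerun_topic_model(build_topic_model_commands(script_dir, edit_path, topic_path))` inside the `if '-top' in sys.argv:` branch, where `script_dir` stands for the topic_modelling directory.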

bz-dev commented 8 months ago

@MarkDVerhagen @doug-leasure yes this matches the parameters