We tested the following pipeline with Python 3.8 and 3.9 on Ubuntu 16.04.7 LTS and Red Hat Enterprise Linux Server 7.9 (Maipo). A GPU is not required to run the pipeline, but one is highly recommended to speed up processing. Installing all the requirements with Conda usually takes about 30 min.
To install the requirements, you can use the env file:
conda env create -f env/protease-pipeline.yml
conda activate protease-pipeline
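After activating the environment, it can be useful to confirm that the interpreter matches one of the versions the pipeline was tested with (Python 3.8 or 3.9). The snippet below is a minimal sketch of such a check; the helper name `is_tested_python` and the constant `TESTED_VERSIONS` are illustrative and are not part of the repository.

```python
import sys

# Versions the README reports as tested; other versions may still work.
TESTED_VERSIONS = {(3, 8), (3, 9)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the (major, minor) version is one of the tested ones."""
    return (version_info[0], version_info[1]) in TESTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[0], sys.version_info[1]
    if is_tested_python():
        print(f"Python {major}.{minor}: matches a tested version.")
    else:
        print(f"Python {major}.{minor}: not one of the tested versions (3.8, 3.9).")
```

The check only reports a mismatch rather than exiting, since an untested version is not necessarily a broken one.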
protease-pipeline.yml
STEP1_module.ipynb
STEP1_run.ipynb
STEP2_run.ipynb
STEP3_run.ipynb
STEP4_module.ipynb
STEP4_run.ipynb
STEP5_module.ipynb
STEP5_run.ipynb
Raw_NGS_counts_overlapped_seqs_STEP1_lib1_lib2.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib2_lib3.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib1_lib4.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib2_lib4.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib3_lib4.csv
STEP1_out_protease_concentration_trypsin
STEP1_out_protease_concentration_chymotrypsin
With a GPU, expect roughly 30 min for STEP1, 1 hr for STEP2, 10 hrs for STEP3, 30 min for STEP4, and 1 hr for STEP5.
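For planning a full run, the per-step estimates quoted above can be added up. The sketch below simply encodes those approximate GPU timings; the names `STEP_HOURS`, `total_hours`, and `slowest_step` are illustrative, and actual runtimes will depend on hardware and data size.

```python
# Approximate GPU runtimes in hours, taken from the estimates in this README.
STEP_HOURS = {
    "STEP1": 0.5,
    "STEP2": 1.0,
    "STEP3": 10.0,
    "STEP4": 0.5,
    "STEP5": 1.0,
}


def total_hours(step_hours):
    """Sum the estimated runtime over all steps."""
    return sum(step_hours.values())


def slowest_step(step_hours):
    """Return the name of the step with the largest estimated runtime."""
    return max(step_hours, key=step_hours.get)


if __name__ == "__main__":
    for step, hours in STEP_HOURS.items():
        print(f"{step}: ~{hours} h")
    print(f"Total: ~{total_hours(STEP_HOURS)} h")
    print(f"Longest step: {slowest_step(STEP_HOURS)}")
```

By these estimates a complete run takes about 13 hours on a GPU, most of it in STEP3.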
Burial_side_chain_contact_Fig3_Fig6.ipynb
Additive_model_Fig4.ipynb
Classification_model_Fig5.ipynb
Data_quality_filtering_script.ipynb
Raw_qPCR_data_FigS1.csv
Process_qPCR_data.ipynb
NGS_count_lib1.csv
NGS_count_lib2.csv
NGS_count_lib3.csv
NGS_count_lib4.csv
K50_dG_lib1.csv
K50_dG_lib2.csv
K50_dG_lib3.csv
K50_dG_lib4.csv
K50_dG_Dataset1_Dataset2.csv
K50_Dataset3.csv
Single_DMS_list.csv
Double_DMS_list.csv
Triple_DMS_list.csv
Heat_maps_single_DMS.pdf
Heat_maps_double_DMS.pdf
dG_extdG_data_Fig1.csv
dG_site_feature_Fig3.csv
dG_for_double_mutants_Fig4.csv
dG_non_redundant_natural_Fig5.csv
dG_GEMME_non_redundant_natural_Fig6.csv
Raw_qPCR_data_FigS1.csv
Process_qPCR_data.ipynb
STEP1_module.ipynb
STEP1_run.ipynb
STEP2_run.ipynb
STEP3_run.ipynb
STEP4_module.ipynb
STEP4_run.ipynb
STEP5_module.ipynb
STEP5_run.ipynb
Raw_NGS_counts_overlapped_seqs_STEP1_lib1_lib2.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib2_lib3.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib1_lib4.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib2_lib4.csv
Raw_NGS_counts_overlapped_seqs_STEP1_lib3_lib4.csv
K50_scrambles_for_STEP3.csv
STEP1_out_protease_concentration_trypsin
STEP1_out_protease_concentration_chymotrypsin
STEP3_unfolded_model_params
Burial_side_chain_contact_Fig3_Fig6.ipynb
Additive_model_Fig4.ipynb
Classification_model_Fig5.ipynb
eehh_EA_GBB_AGBB.bp
eehh_GG_GBB_AGBB.bp
eehh_XX_XXX_XXXX.bp