Status: Closed (teresa-m closed 1 year ago)
The script should have two start points:
Which columns do we need in the df:
chrom_1st start_1st end_1st strand_1st
interaction_no
chrom_2end start_2end end_2end strand_2end
score_seq_1st_side score_seq_2end_side
biotype_region_1st biotype_region_2end
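A minimal sketch of how the column list above could be checked against an input table before any features are computed. The column names are copied from the list; the `REQUIRED_COLUMNS` constant and the `missing_columns` helper are hypothetical names used for illustration and are not part of the existing scripts.

```python
# Column names taken from the list above; the constant and helper are illustrative only.
REQUIRED_COLUMNS = [
    "chrom_1st", "start_1st", "end_1st", "strand_1st",
    "interaction_no",
    "chrom_2end", "start_2end", "end_2end", "strand_2end",
    "score_seq_1st_side", "score_seq_2end_side",
    "biotype_region_1st", "biotype_region_2end",
]


def missing_columns(header):
    """Return the required columns absent from `header`, in REQUIRED_COLUMNS order."""
    present = set(header)
    return [col for col in REQUIRED_COLUMNS if col not in present]
```

With a pandas DataFrame, the same check could be called as `missing_columns(df.columns)`; an empty result means every needed column is present.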
Test both wrappers with a bigger dataset:
python create_trainings_data.py -i1 /vol/scratch/data/RRIs/Paris/ -g /vol/scratch/data/genomes/hg38_UCSC_20210318.2bit -r ChiRA_interaction_HEK293T_1.tabular ChiRA_interaction_HEK293T_2.tabular ChiRA_interaction_HEK293T_3.tabular -o /vol/scratch/data/test/ -c 100 -n test_paris_human -l /vol/scratch/data/genomes/hg38_Info.tab
python evaluate_instance.py -i1 /vol/scratch/data/RRIs/SPLASH_without_hybrids/ChiRA_interaction_summary_ hES_RA_2.tabular -i2 none -g /vol/scratch/data/genomes/hg38_UCSC_20210318.2bit -o /vol/scratch/data/test/ -c 150 -n test_data -l /vol/scratch/data/genomes/hg38_Info.tab
Generate a wrapper that accepts either sequences or a given interaction as input. It generates features for that input and then calls a model. In the end, we would have a model file, an input-features and used-features file, and the wrapper script.
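The wrapper flow described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the length-based placeholder features, and the model being a plain callable are hypothetical and do not reflect the actual feature generation or model format.

```python
def featurize_sequences(seq_1, seq_2):
    # Placeholder features (sequence lengths); the real feature set is not specified here.
    return {"len_1st": len(seq_1), "len_2end": len(seq_2)}


def featurize_interaction(interaction):
    # Assumes a dict keyed by the start/end column names of the interaction table.
    return {
        "len_1st": interaction["end_1st"] - interaction["start_1st"],
        "len_2end": interaction["end_2end"] - interaction["start_2end"],
    }


def run_wrapper(model, used_features, sequences=None, interaction=None):
    """Build features from exactly one kind of input, then call the model on them."""
    if (sequences is None) == (interaction is None):
        raise ValueError("provide exactly one of `sequences` or `interaction`")
    if sequences is not None:
        features = featurize_sequences(*sequences)
    else:
        features = featurize_interaction(interaction)
    # Keep only the features the model was trained on, in a fixed order.
    row = [features[name] for name in used_features]
    return model(row)
```

Passing the used-features list explicitly keeps the feature order tied to the model file, so both start points feed the model the same way.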