Closed by susheelbhanu 3 years ago
Working directory: /mnt/md1200/epfl_sber/massimo/EUCI_MG/selection_inference/sbusi/clusters
The cluster_list was generated from the file clusters_min_9_seq_2_samp.tsv as follows:
cd /mnt/md1200/epfl_sber/massimo/EUCI_MG/selection_inference/sbusi/clusters
python
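import pandas as pd
# Load the cluster table (tab-separated)
df = pd.read_csv("clusters_min_9_seq_2_samp.tsv", sep='\t')
# Drop the first column
df.drop(df.columns[0], axis=1, inplace=True)
# Write the sequences of each cluster to <ClusterID>.txt (no header, no index)
gp = df.groupby('ClusterID')
gp_edited = gp[['Sequence']]
gp_edited.apply(lambda x: x.to_csv(str(x.name) + '.txt', sep='\t', header=False, index=False))
exit()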
rename "Cluster" "Cluster_" *.txt
ls -1 *.txt | sed 's/Cluster_//g' | sed 's/\.txt$//' > cluster_list
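The steps above could also be collapsed into a single script. The following is only a minimal sketch, not the method used for this run: it swaps pandas for the standard-library csv module, assumes the TSV has a header row with ClusterID and Sequence columns, and writes the cluster IDs to cluster_list unchanged (it does not reproduce the "Cluster_" rename and strip step). The function name write_cluster_files and the out_dir parameter are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_cluster_files(tsv_path, out_dir="."):
    """Write one <ClusterID>.txt per cluster plus a cluster_list file.

    Assumption: the TSV has a header row containing 'ClusterID' and
    'Sequence' columns; any other columns are ignored.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect the sequences of each cluster, preserving input order.
    sequences = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            sequences[row["ClusterID"]].append(row["Sequence"])

    # One file per cluster: one sequence per line, no header.
    for cluster_id, seqs in sequences.items():
        (out_dir / f"{cluster_id}.txt").write_text("\n".join(seqs) + "\n")

    # cluster_list holds the sorted cluster IDs, one per line.
    cluster_ids = sorted(sequences)
    (out_dir / "cluster_list").write_text("\n".join(cluster_ids) + "\n")
    return cluster_ids
```

Under these assumptions, calling write_cluster_files("clusters_min_9_seq_2_samp.tsv") from the working directory would produce the per-cluster .txt files and cluster_list in one pass, without the intermediate rename and sed steps.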
Run started at 2020-02-16:15:45:00
The components still missing at this point are to be addressed outside of Snakemake, as downstream analyses:
unannotated clusters
Todo: