Mass23 / NOMIS_ENSEMBLE


EUCI: Unannotated clusters #10

Closed susheelbhanu closed 3 years ago

susheelbhanu commented 3 years ago

Todo:

susheelbhanu commented 3 years ago

Working directory: /mnt/md1200/epfl_sber/massimo/EUCI_MG/selection_inference/sbusi/clusters

susheelbhanu commented 3 years ago

Generating cluster_list from the file clusters_min_9_seq_2_samp.tsv as follows:

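# Shell: move to the working directory and start an interactive Python session
cd /mnt/md1200/epfl_sber/massimo/EUCI_MG/selection_inference/sbusi/clusters
python
# Python: write the 'Sequence' column of each cluster to <ClusterID>.txt
import pandas as pd
df = pd.read_csv("clusters_min_9_seq_2_samp.tsv", sep='\t')
df.drop(df.columns[0], axis=1, inplace=True)  # drop the first column
gp = df.groupby('ClusterID')
gp_edited = gp[['Sequence']]
gp_edited.apply(lambda x: x.to_csv(str(x.name) + '.txt', sep='\t', header=False, index=False))
exit()
# Shell: rename the per-cluster files, then list the cluster IDs without prefix and extension
rename "Cluster" "Cluster_" *.txt
ls -1 *.txt | sed 's/Cluster_//g' | sed 's/\.txt$//' > cluster_list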
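
For reference, the steps above could probably be done in a single Python pass, without the shell `rename`/`sed` step. This is only a minimal sketch, not the method used in this run: the function name `write_cluster_files` and the `out_dir` parameter are hypothetical, and it assumes the `ClusterID` values look like `Cluster<something>` (which is what the `rename "Cluster" "Cluster_"` step suggests).

```python
from pathlib import Path

import pandas as pd


def write_cluster_files(tsv_path, out_dir="."):
    """Write one Cluster_<id>.txt per cluster plus a cluster_list file.

    Hypothetical helper; assumes columns 'ClusterID' and 'Sequence' exist
    and that the first column can be dropped, as in the session above.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(tsv_path, sep="\t")
    df = df.drop(columns=df.columns[0])  # drop the first column

    ids = []
    for cluster_id, group in df.groupby("ClusterID"):
        name = str(cluster_id)
        if name.startswith("Cluster"):
            # Assumption: IDs look like "Cluster123" -> file "Cluster_123.txt"
            suffix = name[len("Cluster"):]
            filename = f"Cluster_{suffix}.txt"
        else:
            # IDs without the prefix are written unchanged
            suffix = name
            filename = f"{name}.txt"
        # One sequence per line, no header and no index
        group[["Sequence"]].to_csv(
            out_dir / filename, sep="\t", header=False, index=False
        )
        ids.append(suffix)

    # One cluster ID per line, like the output of the ls | sed pipeline
    (out_dir / "cluster_list").write_text("\n".join(ids) + "\n")
    return ids
```

Calling `write_cluster_files("clusters_min_9_seq_2_samp.tsv")` from the working directory should then produce the per-cluster files and `cluster_list` in one go. The ordering of `cluster_list` follows the pandas group order, which may differ from the `ls -1` ordering depending on locale.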
susheelbhanu commented 3 years ago

Run started at 2020-02-16:15:45:00

susheelbhanu commented 3 years ago

The components still missing at this point will be addressed outside of Snakemake, as downstream analyses.