deevdevil88 commented 4 years ago

Hi, this is a continuation of #202, where you suggested how to extract regulons from a csv file. Thank you, but what if I have a pickled file, which I get after running df2regulons on the motif file generated by the CLI version? The df2regulons output is saved directly as a pickled file and not as a csv. So how would I extract the regulons and regulon genes from the pickled file, or is there a way to save the df2regulons output as a csv?

Thanks Devika

cflerin commented 4 years ago

Hi @deevdevil88 ,

It should be the same solution as #202 , you can use the same function to load the regulon file in any format:

sig = load_signatures('signatures.dat')

Hope that helps. I'll close this but feel free to ask further if something isn't clear.

akramdi commented 4 years ago

Hi @deevdevil88 , @cflerin,

load_signatures does not seem to work for pickled files:

In [85]: sig = load_signatures('regulons.p')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-85-ff376c949042> in <module>
----> 1 sig = load_signatures('regulons.p')

~/ENV/env3/lib/python3.6/site-packages/pyscenic/cli/utils.py in load_signatures(fname)
    185             return pickle.load(f)
    186     else:
--> 187         raise ValueError("Unknown file format \"{}\".".format(fname))
    188 
    189
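Judging from the traceback, load_signatures picks a loader from the file extension and raises ValueError for extensions it does not recognise, so a file named regulons.p never reaches the pickle.load branch. A minimal sketch of loading the file directly, whatever its name, follows. The helper name load_pickled_regulons and the demo data are hypothetical; a real regulons file holds pySCENIC Regulon objects, so pySCENIC must be importable when unpickling it.

```python
import os
import pickle
import tempfile


def load_pickled_regulons(path):
    """Hypothetical helper: unpickle `path` regardless of its file extension."""
    # Only unpickle files you created yourself or trust; pickle can run code.
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in data so the sketch runs without pySCENIC installed.
demo_path = os.path.join(tempfile.mkdtemp(), "regulons.p")
with open(demo_path, "wb") as f:
    pickle.dump([{"name": "ARID3A(+)", "genes": ["MCM3", "CASP7"]}], f)

loaded = load_pickled_regulons(demo_path)
```

Renaming the file to the extension used in the example above (signatures.dat) may also let load_signatures read it, but that is an assumption based on that example rather than something confirmed in this thread.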

Here's what I did to export regulons and target genes into a csv file after running df2regulons (inspired by #16):

import csv
import pickle

# Load the pickled list of regulons
with open("regulons.p", "rb") as f:
    regulons = pickle.load(f)

# Convert regulons to a dict: regulon name -> list of target genes
rdict = {}
for reg in regulons:
    rdict[reg.name] = list(reg.gene2weight)

# Optional: join each list of target genes into one string
for key in rdict:
    rdict[key] = ";".join(rdict[key])

# Write to csv file (newline="" avoids blank lines between rows on Windows)
with open("regulons.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Regulons", "Target_genes"])
    w.writerows(rdict.items())

The output looks like this:

Regulons    Target_genes
ARID3A(+)   LINC01140;ATP6V1D;PCDHB5;NLE1;HIF1AN;SLC4A8;FAF2;SF3A3;IP6K1;RFFL;LRIG2;SUFU;ROR1;DCTPP1;PCDHB8;NR3C1;SEC24B;RSAD1;RGS4;ZW10;FAM171A1;FOXJ3;ZNF398;IDH3G;STARD7;HAX1;ZNF814;ADPRHL2;RPL39L;MCM3;ADAMTS7;DLX6;MTERF1;CASP7;CTSA;C1orf220;GRK2;MRPL39;TBC1D9B;NPDC1;IGIP;CD63;SYNM;VEZT;RALGDS;SH2B3;RABAC1;LRWD1;RAB5C;TBCB;MRPS16;PAN2;HS1BP3;ZFAND4;NTMT1;CPD;GTF2I;DEXI;RNF123;POLR2H;DCAF15;COPZ1;CATSPER2;FAM89B;RALY;CARNMT1;PALM;POR;MFSD2B;TGFB1;GATB;TAGLN2;SKIDA1;BCL11A;ZBTB20;CHST7;FRAT2;TSHZ1;ARL8A;AK9;ACTR3B;TMEM14A;FARSB;C12orf65;FAHD2B;RGS17;EEF2K;ZNF593;PSMB1;CHRNB4;CFAP65;WDR4;ARID3A;GMEB2;PMEPA1;ZBTB14;TMEM60;MICAL1;TRIP4;PDLIM3
ARNT(+) SYNRG;RRAGB;HSF2;TSKU;BCAS3;TMEM131;LTA4H;CFL2;STAT3;ZDHHC9;CTBP2;RRAGC;ANKRD37;ALDH3A2;MBTD1;DEPDC7;NFATC3;YTHDF1;E2F1;TOR3A;TNRC18;MYL12A;AFF4;DIAPH1;IPO13;SDE2;ANAPC13;CDKN1B;UBE2H;WDR75;ZFP62;RNFT2;SMIM7;PPIL4;POLR2M;PKDCC;PPT2;KIAA0753;CKAP5;PUSL1;RADIL;SORBS1;PANX1;CDC16;PDE4DIP;LIPE-AS1;TWF2;ZNF708;INCENP;ZNF429;NEURL2;TRAPPC6A;ZBTB25;AP5Z1;STAG3;ZNF296;RREB1;FAM76A;SPPL3;ATP6V1H;BAX;GTPBP1;PKN1;VPS26A;ARMCX6;CBX6;FBXO10;ABRACL;PLD3;CYTH3;ARMCX4;SLC38A2;DLEU2;DDX5;UTP14A;ZNF512;COQ10A;FMR1;RABGAP1;TMEM42;PRDM4;ATF7IP;SPG21;KDM6B;FARSA;RCOR2;BROX;GPR176;ACAP3;NUDCD3;TCEAL9;JUN;ANO6;LAMTOR3;SUPT3H;POMGNT2;CBL;ELMSAN1;RNF185;TNRC6B;PLS1;CPNE2;APBA1;RRBP1;FRMD6;CYP2R1;MSANTD3;TP53BP2

This is the best I could come up with to have regulons and target genes in a readable file. I hope it helps!

Amira

deevdevil88 commented 4 years ago

Hi Amira @akramdi , this is the code we used to get a csv file:

# Transform reg.csv output from pipeline to regulons (in reg.csv, each TF can
# be listed multiple times)

# Create regulons from a dataframe of enriched features
df_motifs = load_motifs(args.ctx_output)
regulons = df2regulons(df_motifs)

# Pickle these regulons
with open("pyscenic_results.dir/regulons.P", 'wb') as f:
    pickle.dump(regulons, f)

# Output regulons as a csv file
regulon_df = pd.DataFrame(columns = ["regulon_name", "transcription_factor", "genes", "weights", "score", "context"])
for i in range(len(regulons)):
    regulon = regulons[i]
    regulon_dict = dict({"regulon_name": regulon.name,
                         "transcription_factor": regulon.transcription_factor,
                         "genes": list(regulon.genes),
                         "weights": list(regulon.weights),
                         "score": regulon.score,
                         "context": list(regulon.context)})
    regulon_df = regulon_df.append(regulon_dict, ignore_index = True)

regulon_df.genes = regulon_df.genes.apply(lambda x: ", ".join(x))
regulon_df.weights = regulon_df.weights.apply(lambda x: str(x))
regulon_df.weights = regulon_df.weights.apply(lambda x: x.replace('[', ''))
regulon_df.context = regulon_df.context.apply(lambda x: ", ".join(x))

regulon_df.to_csv("pyscenic_results.dir/regulons.csv", index = False)

hope this helps. Devika

akramdi commented 4 years ago

Great! Thanks for sharing!

Best,

wangjiawen2013 commented 2 years ago

Sometimes the strings are too long to save as csv and must be saved as xlsx. @deevdevil88
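One possible way to sidestep very long cells is a "long" layout with one row per regulon and target gene, instead of one joined string per regulon. The sketch below is only an illustration: DemoRegulon, write_long_csv, and the weights are made up, and it assumes gene2weight behaves like a mapping from gene name to weight, as the loop over reg.gene2weight earlier in this thread suggests.

```python
import csv
import io
from collections import namedtuple

# Stand-in for a pySCENIC regulon; only `name` and `gene2weight` are used,
# the two attributes the export code earlier in this thread relies on.
DemoRegulon = namedtuple("DemoRegulon", ["name", "gene2weight"])


def write_long_csv(regulons, handle):
    """Hypothetical helper: write one (regulon, gene, weight) row per target gene."""
    writer = csv.writer(handle)
    writer.writerow(["regulon", "gene", "weight"])
    for reg in regulons:
        for gene, weight in reg.gene2weight.items():
            writer.writerow([reg.name, gene, weight])


# Made-up weights, for illustration only.
demo = [
    DemoRegulon("ARID3A(+)", {"MCM3": 1.5, "CASP7": 0.7}),
    DemoRegulon("ARNT(+)", {"STAT3": 2.0}),
]
buffer = io.StringIO()
write_long_csv(demo, buffer)
rows = buffer.getvalue().splitlines()
```

To write to disk instead of an in-memory buffer, pass a file opened with open("regulons_long.csv", "w", newline="") as the handle.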

aertslab / pySCENIC

Extract regulons from a pickled file[results] #206
