Thank you for developing and promoting Promotech.
I was able to run Promotech and got the following output:
CREATING OUTPUT FOLDER: results
PRINTING CONTENT
GENOME: AP009180.1 - LENGTH: 159662
JOINING ALL CHROMS AND SEQS INTO A SINGLE FOR TETRA-NUCLEOTIDE SLIDING WINDOW
JOINED GENOME: AP009180.1 - LENGTH: 159,662
GENERATING PROMOTER SEQUENCES WITH WINDOW-SIZE: 40 AND STEP: 1. EXPECTED SAMPLES: 159,621
100% (159621 of 159621) |################| Elapsed Time: 0:00:00 Time: 0:00:00
CUTTED 40 NT SEQUENCES GENERATED SUCCESSFULLY. # OF SAMPLES: 159,621 = (159621,).
SAMPLE #1: ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATG
SAMPLE #2: TGAATACTATATTTTCAAGAATAACACCATTAGGAAATGG
CONVERTING 159621 CUTTED 40 NT SEQUENCES TO RF-HOT SEQUENCES USING MAPPING VALUES
CONVERTING DATA 99% (159075 of 159621) |############### | Elapsed Time: 0:00:08 ETA: 0:00:00
HOT ENCODED SEQUENCES GENERATED SUCCESSFULLY.
   A  G  C  T  A  ...  T  A  G  C  T
0  1  0  0  0  0  ...  1  0  1  0  0
1  0  0  0  1  0  ...  0  0  1  0  0
2  0  1  0  0  1  ...  0  0  0  0  1
3  1  0  0  0  1  ...  1  1  0  0  0
4  1  0  0  0  0  ...  0  0  0  1  0
[5 rows x 160 columns]
RF-HOT SEQUENCES GENERATED SUCCESSFULLY. OUTPUT DATAFRAME SHAPE: (159621, 160)
SAMPLE:
   A  G  C  T  A  ...  T  A  G  C  T
0  1  0  0  0  0  ...  1  0  1  0  0
1  0  0  0  1  0  ...  0  0  1  0  0
2  0  1  0  0  1  ...  0  0  0  0  1
3  1  0  0  0  1  ...  1  1  0  0  0
4  1  0  0  0  0  ...  0  0  0  1  0
[5 rows x 160 columns]
SAVING FORWARD STRAND HOT-ENCODED SEQUENCES TO BINARY FILE USING JOBLIB TO: results/RF-HOT.data
FILE SAVED SUCCESSFULLY AT: results/RF-HOT.data
GENERATING INVERSE STRAND SEQUENCES.
100% (159621 of 159621) |################| Elapsed Time: 0:00:00 Time: 0:00:00
INVERSE STRAND SEQUENCES GENERATED SUCCESSFULLY. # OF SAMPLES: 159,621. SAMPLE:
ORIGINAL : ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATG
INVERSE  : CATTTCCTAATGGTGTTATTCTTGAAAATATAGTATTCAT
CONVERTING 159621 INVERSE STRAND 40 NT SEQUENCES TO RF-HOT SEQUENCES USING MAPPING VALUES
CONVERTING INVERSE DATA 99% (158962 of 159621) |############### | Elapsed Time: 0:00:08 ETA: 0:00:00
HOT ENCODED SEQUENCES GENERATED SUCCESSFULLY.
   A  G  C  T  A  ...  T  A  G  C  T
0  0  0  1  0  1  ...  0  0  0  0  1
1  0  0  1  0  0  ...  0  1  0  0  0
2  1  0  0  0  0  ...  1  0  0  1  0
3  0  0  0  1  1  ...  1  0  0  0  1
4  0  1  0  0  0  ...  0  0  0  0  1
[5 rows x 160 columns]
RF-HOT SEQUENCES GENERATED SUCCESSFULLY. OUTPUT DATAFRAME SHAPE: (159621, 160)
SAMPLE:
   A  G  C  T  A  ...  T  A  G  C  T
0  0  0  1  0  1  ...  0  0  0  0  1
1  0  0  1  0  0  ...  0  1  0  0  0
2  1  0  0  0  0  ...  1  0  0  1  0
3  0  0  0  1  1  ...  1  0  0  0  1
4  0  1  0  0  0  ...  0  0  0  0  1
[5 rows x 160 columns]
SAVING INVERSE STRAND SEQUENCES TO BINARY FILE USING JOBLIB TO: results/RF-HOT-INV.data
FILE SAVED SUCCESSFULLY AT: results/RF-HOT-INV.data
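To check that I am reading the log correctly, here is a minimal sketch of how I understand the encoding: each 40 nt window becomes 160 values, four per base, in the column order A, G, C, T shown in the log header, and the inverse strand is the reverse complement. The function names are my own and this is not Promotech's actual code.

```python
# Column order within each base position, as printed in the log header.
ALPHABET = "AGCT"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def one_hot(seq):
    """Encode a sequence as a flat list with 4 values (A, G, C, T) per base."""
    row = []
    for base in seq:
        row.extend(1 if base == letter else 0 for letter in ALPHABET)
    return row


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def windows(genome, size=40, step=1):
    """Yield (0-based start, window) pairs for a plain sliding window."""
    for start in range(0, len(genome) - size + 1, step):
        yield start, genome[start:start + size]


sample = "ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATG"  # SAMPLE #1 from the log
encoded = one_hot(sample)
print(len(encoded))                # 160
print(encoded[:5], encoded[-5:])   # [1, 0, 0, 0, 0] [1, 0, 1, 0, 0]
print(reverse_complement(sample))  # CATTTCCTAATGGTGTTATTCTTGAAAATATAGTATTCAT
```

The first and last five values match row 0 of the forward-strand sample above, and the reverse complement matches the INVERSE line. One thing I am unsure about: a plain sliding window over 159,662 nt would give 159,662 - 40 + 1 = 159,623 windows, while the log reports 159,621, so the tool's exact start and stop indexing may differ from this sketch.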
The program itself runs properly, and I was able to locate RF-HOT.data and RF-HOT-INV.data in my results folder. However, both files are binary and encoded. How do I open RF-HOT.data and RF-HOT-INV.data and interpret the results? Ideally, I would like to recover the promoter sequences and their locations and store the results in a Pandas DataFrame for further analysis.
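For context, this is roughly what I have in mind, as a sketch only. It assumes the files are joblib-pickled tables of shape (n, 160), as the log suggests, that the column order per base is A, G, C, T, and that row i corresponds to the window starting at 0-based position i. `decode_row` and `load_windows` are hypothetical helpers of my own, not part of Promotech. As far as I can tell these files hold the encoded input windows rather than prediction scores, so I may be looking at the wrong output.

```python
ALPHABET = "AGCT"  # assumed column order, taken from the log header


def decode_row(row):
    """Turn a flat one-hot row (4 values per base) back into a sequence string."""
    values = list(row)
    bases = []
    for i in range(0, len(values), 4):
        # The position of the 1 within each group of four picks the base.
        bases.append(ALPHABET[values[i:i + 4].index(1)])
    return "".join(bases)


def load_windows(path):
    """Hypothetical helper: load an RF-HOT file and decode every row.

    Assumes row i is the window starting at 0-based genome position i.
    """
    import joblib        # third-party, imported lazily
    import pandas as pd  # third-party, imported lazily

    hot = pd.DataFrame(joblib.load(path))
    seqs = [decode_row(row) for row in hot.to_numpy()]
    return pd.DataFrame({"start": range(len(seqs)), "sequence": seqs})


print(decode_row([1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0]))  # ATG
```

I would then call something like `load_windows("results/RF-HOT.data")`, but I do not know whether the start positions inferred this way are correct or where the actual promoter predictions are stored.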
Thanks in advance!