OstfriesenBI / PredmiRNA

A set of scripts and tools to train a classifier for pre-miRNA Recognition

Feature calculation: Dinucleotide frequencies #11

Closed by Finesim97 5 years ago

Finesim97 commented 5 years ago

R function. Input: a CSV file with the sequences:

"comment","sequence","realmiRNA"
"mmu-mir-380 MI0000797 Mus musculus miR-380 stem-loop","AAGAUG",1
"mmu-mir-381 MI0000798 Mus musculus miR-381 stem-loop","AAUUC",1

Output: a CSV file with the sequence identifier and the frequencies of the 16 possible dinucleotides:

"comment",aa,ac,ag,at,ca, ... 
"mmu-mir-380 MI0000797 Mus musculus miR-380 stem-loop", ...
"mmu-mir-381 MI0000798 Mus musculus miR-381 stem-loop", ...

The following function from the seqinr package already does part of the work; only seqinr needs to be installed.

count(seq, 2, freq = TRUE)
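A minimal base-R sketch of the per-sequence calculation, for illustration only. It swaps in plain `substring` and `table` instead of seqinr so that it runs without extra packages; `dinuc_freq` and `dinucs` are hypothetical names, not part of the repository. It assumes that U should be mapped to T (the output header above uses `at`, not `au`) and that the windows overlap with a step of 1. With seqinr, the rough equivalent should be `count(s2c(x), 2, freq = TRUE)` on the lowercased, U-to-T-converted sequence, since `count` works on a vector of single characters and its default alphabet is lowercase a, c, g, t.

```r
# All 16 dinucleotides, ordered aa, ac, ag, at, ca, ... as in the output header.
bases  <- c("a", "c", "g", "t")
dinucs <- as.vector(t(outer(bases, bases, paste0)))

# Hypothetical helper: relative dinucleotide frequencies of one sequence.
# Assumption: RNA input, so U is mapped to T and case is ignored.
dinuc_freq <- function(x) {
  s <- chartr("u", "t", tolower(x))
  n <- nchar(s)
  # A sequence shorter than 2 has no dinucleotides at all.
  if (n < 2) return(setNames(rep(NA_real_, length(dinucs)), dinucs))
  # Overlapping windows of width 2: positions 1-2, 2-3, ..., (n-1)-n.
  words  <- substring(s, 1:(n - 1), 2:n)
  # Words containing other letters (e.g. n) fall outside the levels and are dropped.
  counts <- table(factor(words, levels = dinucs))
  setNames(as.vector(counts) / sum(counts), dinucs)
}

freqs <- dinuc_freq("AAGAUG")
print(round(freqs, 3))
```

For the first example sequence, `AAGAUG`, the five overlapping windows are aa, ag, ga, at and tg, so each of those columns gets 0.2 and the remaining eleven are 0.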

Source Paper: De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures

mariusrueve commented 5 years ago

There is still a problem in c5269bc, but I can't figure out how to solve it right now.

mariusrueve commented 5 years ago

Solved the last problem. Now trying to find an elegant way to write the results to a .csv file.
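One possible way to get from the per-sequence results to the CSV layout described at the top of the issue, sketched in base R. This is not the code from the repository: `freq_row`, `dinucs` and `out_file` are hypothetical names, the input is read from inline text rather than a real file, and `freq_row` is only a compact stand-in for whatever function produces the 16 values per sequence.

```r
# Input in the format from the issue, read from inline text for the example.
input <- read.csv(text = '"comment","sequence","realmiRNA"
"mmu-mir-380 MI0000797 Mus musculus miR-380 stem-loop","AAGAUG",1
"mmu-mir-381 MI0000798 Mus musculus miR-381 stem-loop","AAUUC",1',
  stringsAsFactors = FALSE)

bases  <- c("a", "c", "g", "t")
dinucs <- as.vector(t(outer(bases, bases, paste0)))  # aa, ac, ag, at, ca, ...

# Stand-in frequency function (assumes U -> T and sequences of length >= 2).
freq_row <- function(x) {
  s <- chartr("u", "t", tolower(x))
  starts <- seq_len(nchar(s) - 1)
  words  <- substring(s, starts, starts + 1)
  as.vector(table(factor(words, levels = dinucs))) / length(words)
}

# vapply returns a 16 x n matrix; transpose it to one row per sequence.
freqs <- t(vapply(input$sequence, freq_row, numeric(16), USE.NAMES = FALSE))
colnames(freqs) <- dinucs

# Put the identifier first, then the 16 frequency columns.
result <- data.frame(comment = input$comment, freqs, check.names = FALSE)

out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.csv(result, out_file, row.names = FALSE)
cat(readLines(out_file), sep = "\n")
```

`row.names = FALSE` keeps R from adding an unnamed index column. Note that `write.csv` quotes all header names by default, so the header comes out as `"comment","aa","ac",...` rather than with bare column names.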