OstfriesenBI / PredmiRNA

A set of scripts and tools to train a classifier for pre-miRNA Recognition

R Script: Convert FASTA file to CSV file #3

Closed: Finesim97 closed this issue 5 years ago

Finesim97 commented 5 years ago

This code may be useful: https://stackoverflow.com/a/32770831 It needs Biostrings, which can be installed with conda: https://bioconda.github.io/recipes/bioconductor-biostrings/README.html The column names of a data frame can be changed with names(). The code should be in a function that takes an input file, an output file, and a parameter that is either 0 or 1. The value of that parameter is written to the column "realmiRNA" for every entry.

transferFastaTo <- function(in_path, out_path, real) {
    # code here
    write.csv(dataframe,out_path,row.names=FALSE)
}
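
One possible way to fill in the stub, as a minimal sketch and not the version that was eventually committed. It assumes the input is RNA (the example sequences contain U), so it uses Biostrings' readRNAStringSet(); the check on `real` is an addition that the issue does not ask for.

```r
library(Biostrings)

transferFastaTo <- function(in_path, out_path, real) {
    # Assumption: only 0 and 1 are valid labels, so reject anything else.
    stopifnot(length(real) == 1, real %in% c(0, 1))

    # Assumption: RNA input. readRNAStringSet() joins sequences that are
    # wrapped over several lines and keeps the full header line
    # (without the leading ">") as the name of each record.
    seqs <- readRNAStringSet(in_path)

    dataframe <- data.frame(
        names(seqs),                  # header line of each record
        unname(as.character(seqs)),   # sequence as a plain string
        rep(real, length(seqs)),      # same label for every entry
        stringsAsFactors = FALSE
    )
    # Rename the columns with names(), as suggested in the issue.
    names(dataframe) <- c("comment", "sequence", "realmiRNA")

    write.csv(dataframe, out_path, row.names = FALSE)
    invisible(dataframe)
}
```

A call could look like transferFastaTo("hairpin.fa", "hairpin.csv", 1), where both file names are placeholders. With the example input from this issue, the written file should have the three columns comment, sequence and realmiRNA and one row per FASTA record.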

Example input file:

>mmu-mir-380 MI0000797 Mus musculus miR-380 stem-loop
AAGAUGGUUGACCAUAGAACAUGCGCUACUUCUGUGUCGUAUGUAGUAUGGUCCACAUCU
U
>mmu-mir-381 MI0000798 Mus musculus miR-381 stem-loop
UACUUAAAGCGAGGUUGCCCUUUGUAUAUUCGGUUUAUUGACAUGGAAUAUACAAGGGCA
AGCUCUCUGUGAGUA

When the function is called with real=1, the output file is:

"comment","sequence","realmiRNA"
"mmu-mir-380 MI0000797 Mus musculus miR-380 stem-loop","AAGAUGGUUGACCAUAGAACAUGCGCUACUUCUGUGUCGUAUGUAGUAUGGUCCACAUCUU",1
"mmu-mir-381 MI0000798 Mus musculus miR-381 stem-loop","UACUUAAAGCGAGGUUGCCCUUUGUAUAUUCGGUUUAUUGACAUGGAAUAUACAAGGGCAAGCUCUCUGUGAGUA",1

mariusrueve commented 5 years ago

Committed my first result in 6154ddc0d9820efe888fd93269496599549f5f46, but I still need to test it further and may change some lines.

mariusrueve commented 5 years ago

Final result: 99b6946.