Open Rohit-Satyam opened 2 years ago
I wrote a short R script to do that:
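library(dplyr)
library(stringr)
# Read the tab-separated predictions table
preds <- read.csv("file.tsv", sep = "\t")
# Overly generic terms to drop before collapsing
generic_terms <- c("cellular_component","biological_process","localization","cell part","molecular_function","intracellular part","cellular process","binding","biological regulation","cytoplasmic part","organelle part","regulation of biological process","chromosomal part")
preds <- preds[!preds$description %in% generic_terms, ]
# Append the confidence to each description, e.g. "term (0.98)"
preds$description <- paste0(preds$description, " (", preds$confidence, ")")
# Drop the third column (assumed to be confidence), now folded into the description
preds <- preds[, -3]
# Collapse to one row per sequence, joining labels and descriptions
collapsed <- preds %>%
  group_by(sequence_name) %>%
  dplyr::summarize(label = str_c(predicted_label, collapse = ","),
                   desc = str_c(description, collapse = ", "))
# row.names = FALSE keeps the row index out of the output file
write.table(collapsed, "filenew.tsv", sep = "\t", row.names = FALSE)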
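To show what the collapse step of the script above produces, here is a minimal sketch that runs the same logic on a small made-up table instead of a file. The sequence names and confidence values are invented for illustration, the generic-term list is shortened, and the filter/mutate/select pipeline is an equivalent dplyr formulation rather than the exact script.

```r
library(dplyr)
library(stringr)

# Made-up rows in the four-column layout the script assumes
# (sequence_name, predicted_label, confidence, description)
preds <- data.frame(
  sequence_name   = c("protA", "protA", "protA", "protB"),
  predicted_label = c("GO:0005515", "GO:0003824", "GO:0008150", "GO:0016301"),
  confidence      = c(0.98, 0.91, 1.00, 0.87),
  description     = c("protein binding", "catalytic activity",
                      "biological_process", "kinase activity"),
  stringsAsFactors = FALSE
)

# Shortened list of generic terms, for illustration only
generic_terms <- c("biological_process", "molecular_function", "cellular_component")

collapsed <- preds %>%
  filter(!description %in% generic_terms) %>%                       # drop generic terms
  mutate(description = paste0(description, " (", confidence, ")")) %>%
  select(-confidence) %>%                                           # confidence now lives in description
  group_by(sequence_name) %>%
  summarize(label = str_c(predicted_label, collapse = ","),
            desc  = str_c(description, collapse = ", "))

print(collapsed)
```

Each sequence ends up on one row, with its labels comma-joined and each description carrying its confidence in parentheses.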
Hi!! Thanks for this easy-to-install and easy-to-use CLI tool for functional identification. I have a few queries:
I ran proteinfer with --num_ensemble_elements 5. I tested this parameter on a few well-annotated proteins from my organism and saw that using 5 rather than the default decreased the probability of the actual function. When do you recommend using the ensemble parameter, and when does the default work just fine?
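For reference, this is roughly how the two runs can be compared label by label. It is only a sketch: the two tables below are made-up stand-ins for the default and ensemble outputs (hypothetical sequence name and confidence values, not real results), and it assumes both outputs share the sequence_name, predicted_label and confidence columns.

```r
# Made-up confidences for one protein from two runs (hypothetical values)
default_run <- data.frame(
  sequence_name   = "protA",
  predicted_label = c("GO:0005515", "GO:0003824"),
  confidence      = c(0.95, 0.80),
  stringsAsFactors = FALSE
)
ensemble_run <- data.frame(
  sequence_name   = "protA",
  predicted_label = c("GO:0005515", "GO:0003824"),
  confidence      = c(0.90, 0.85),
  stringsAsFactors = FALSE
)

# Join on sequence and label; all = TRUE keeps labels predicted in only one run
cmp <- merge(default_run, ensemble_run,
             by = c("sequence_name", "predicted_label"),
             suffixes = c("_default", "_ensemble"), all = TRUE)

# Negative delta means the ensemble run gave the label a lower confidence
cmp$delta <- cmp$confidence_ensemble - cmp$confidence_default
print(cmp)
```

With real outputs, the two data frames would be read from the respective TSV files instead of being typed in.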