Open abu034004 opened 6 years ago
Hi Abu,
Thank you for your interest in using our package and for reporting this issue.
The problem you are facing comes from how R reads the file, not from the package itself. Because you read the CSV file with read.csv, R infers the class of each column, and since your output column has no decimals (only 1 and -1), it was given the class "integer", which caused the error. Changing that column's class to numeric should do the trick. We will also try to update the package to check for this and perhaps do the conversion implicitly.
In short, just add the transform line after the read.csv line, as in the full script here:
library(mRMRe)
df <- read.csv("gene.csv", header = TRUE)
sapply(df, class) # this will show you the classes of all columns in df
df <- transform(df, X.Output. = as.numeric(X.Output.)) # this will change the output column class into "numeric"
sapply(df, class) # to check that the change is in effect
f_data <- mRMR.data(data = data.frame(df))
featureData(f_data)
mRMR.ensemble(data = f_data, target_indices = 7,
feature_count = 2, solution_count = 1)
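If more than one column ends up as "integer", converting them one by one gets tedious. Here is a minimal base R sketch that converts every integer column at once. The inline sample data and its column names are made up for illustration and only stand in for gene.csv:

```r
# Hypothetical three-row sample standing in for gene.csv (values are made up).
csv_text <- "G1,G2,Output\n0.12,1.5,1\n0.34,2.5,-1\n0.56,3.5,1"
df <- read.csv(text = csv_text, header = TRUE)

# read.csv infers "integer" for a column that holds only whole numbers such as 1 and -1.
classes_before <- sapply(df, class)

# Find the integer columns and convert all of them to numeric in one pass.
int_cols <- sapply(df, is.integer)
df[int_cols] <- lapply(df[int_cols], as.numeric)

classes_after <- sapply(df, class)
print(classes_before)
print(classes_after)
```

After this step the data frame can be passed to mRMR.data in the same way as in the script above.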
As for the value of target_indices: if you have a single target, it is a single value, namely the index of the target column in your df.
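Rather than counting columns by hand, you could look the index up by name. Here is a small base R sketch. The inline sample is made up and has fewer columns than gene.csv. It relies on read.csv's default check.names = TRUE, which is why the header "[Output]" appears as X.Output. in the data frame:

```r
# Hypothetical sample with bracketed headers like those in gene.csv (values are made up).
csv_text <- "[G1.1.1.1],[G1.1.1.2],[Output]\n0.12,1.5,1\n0.34,2.5,-1"
df <- read.csv(text = csv_text, header = TRUE)

# Show the column names as R stores them after name checking.
print(names(df))

# Look up the position of the target column by its (converted) name.
target_index <- match("X.Output.", names(df))
stopifnot(!is.na(target_index))  # fail early if the column is not found
print(target_index)
```

The resulting target_index is the value to pass as target_indices to mRMR.ensemble.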
Please let us know if this solves your problem or if you face any other problem.
Best, Wail
Hi, As suggested by Dr. Benjamin Haibe-Kains in his email reply, I am creating an issue for my query. Please excuse me if the question is simple, as I am new to R. The details are below.
Suppose I have a CSV file, gene.csv (attached as a zip file, gene.zip), with a feature set of 6 attributes ([G1.1.1.1], [G1.1.1.2], [G1.1.1.3], [G1.1.1.4], [G1.1.1.5], [G1.1.1.6]) and a target class variable [Output] ('1' indicates the positive class and '-1' the negative class). Here's the sample gene.csv file (see attached zip file):
I am trying to get the best feature subset of 2 attributes (out of the 6 attributes above) and wrote the following R code.
When I run this code, I get the following error for the statement f_data <- mRMR.data(data = data.frame(df)):
However, the data in each column of my CSV file are real numbers. So, how can I change the R code to fix this problem? Also, I am not sure what the value of target_indices should be in the statement mRMR.ensemble(data = f_data, target_indices = 7, feature_count = 2, solution_count = 1), as my target class variable is named "[Output]" in the gene.csv file.
I would much appreciate it if you could kindly help me obtain the best feature subset from the gene.csv file using your mRMRe R package.
Thank you very much.
Sincerely, Abu