Hi there, I opened an issue in the Rcpi package; however, I think it's more appropriate to open an issue here because Rcpi relies on ChemmineOB.
I have ~50k molecules for which I want to calculate fingerprints. I use a function from the Rcpi package (see code below), which creates an appropriately sized matrix to store the fingerprints, then iterates over the molecules and calculates each fingerprint using ChemmineOB::fingerprint_OB. However, RAM usage increases with each loop iteration, even though the size of the matrix is constant. I noticed that it is the memory of the R session that grows, not the size of the object.
I also calculated the fingerprints using the Open Babel CLI, and it runs smoothly.
Thanks, and sorry about my English.
function (molecules, type = c("smile", "sdf"))
{
    check_ob()
    if (type == "smile") {
        if (length(molecules) == 1L) {
            molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', molecules, identity)"))
            fp = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
        }
        else if (length(molecules) > 1L) {
            fp = matrix(0L, nrow = length(molecules), ncol = 512L)
            for (i in 1:length(molecules)) {
                molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', molecules[i], identity)"))
                ###########################################################
                ####### This is the step which increases RAM usage in each loop step
                fp[i, ] = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
                ###########################################################
            }
        }
    }
    else if (type == "sdf") {
        smi = eval(parse(text = "ChemmineOB::convertFormat(from = 'SDF', to = 'SMILES', source = molecules)"))
        smiclean = strsplit(smi, "\\t.*?\\n")[[1]]
        if (length(smiclean) == 1L) {
            molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', smiclean, identity)"))
            fp = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
        }
        else if (length(smiclean) > 1L) {
            fp = matrix(0L, nrow = length(smiclean), ncol = 512L)
            for (i in 1:length(smiclean)) {
                molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', smiclean[i], identity)"))
                fp[i, ] = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
            }
        }
    }
    else {
        stop("Molecule type must be \"smile\" or \"sdf\"")
    }
    return(fp)
}
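To make the growth easier to reproduce outside Rcpi, here is a minimal sketch that repeats the two ChemmineOB calls from the loop above and records the process memory at regular intervals. The helpers `rss_kb()` and `track_rss()` are hypothetical names written for this report and are not part of Rcpi or ChemmineOB; the sketch assumes a Unix-like system where `ps` is available, and the SMILES string `"CCO"` and the iteration count are arbitrary placeholders. Memory is read from the operating system rather than from `object.size()` because the growth shows up in the R session, not in the R objects.

```r
# Resident set size of the current R process in kB.
# Assumption: a Unix-like system where `ps -o rss= -p <pid>` works.
rss_kb <- function() {
  out <- system2("ps", c("-o", "rss=", "-p", Sys.getpid()), stdout = TRUE)
  as.numeric(trimws(out[1]))
}

# Call f(i) for i = 1..n and record the process memory every `every` iterations.
track_rss <- function(f, n, every = 100L) {
  checkpoints <- which(seq_len(n) %% every == 0L)
  rss <- numeric(length(checkpoints))
  k <- 0L
  for (i in seq_len(n)) {
    f(i)
    if (i %% every == 0L) {
      invisible(gc())  # rule out garbage that R itself could still collect
      k <- k + 1L
      rss[k] <- rss_kb()
    }
  }
  data.frame(iteration = checkpoints, rss_kb = rss)
}

# Only runs when ChemmineOB is installed.
if (requireNamespace("ChemmineOB", quietly = TRUE)) {
  smi <- "CCO"  # placeholder molecule (ethanol)
  res <- track_rss(function(i) {
    molRefs <- ChemmineOB::forEachMol("SMILES", smi, identity)
    ChemmineOB::fingerprint_OB(molRefs, "FP4")
  }, n = 5000L)
  print(res)
}
```

If `rss_kb` keeps climbing across checkpoints even after `gc()`, the memory is most likely being held outside R's garbage-collected heap, which would match what I see with the Rcpi function.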