cambDI / camb

GNU General Public License v2.0

StandardiseMolecules C++ error #5

Open allaway opened 6 years ago

allaway commented 6 years ago

Hello,

I have an SDF with about 300k molecules, and I'd like to standardize all of them. When I run

StandardiseMolecules("Data/db_prelim.sdf", "Data/db_standardized.sdf", removed.file = "Data/db_removed.sdf")

the function starts fine but aborts (see below) about nine-tenths of the way through the SDF. Is this a known error, and can I do anything to resolve it?


[1] "Standardising Structures: Reading SDF (R)"
....................................................................................................5000+
....................................................................................................5000+
removed for brevity
....................................................................................................5000+
....................................................................................................5000+
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
.................................................................................Aborted (core dumped)

isidroc commented 6 years ago

Hello,

Thanks for your interest in camb. We have not tested that function with that many molecules. It would help to know what happens if you apply it to a subset, e.g. the last 100K records. That would tell us whether the problem lies in the function itself or in unusual SDF records in your file.

Thanks a lot for your help,
Isidro
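One way to run the subset test suggested above is to split the SDF into fixed-size chunks and standardise each chunk separately. The following is a minimal sketch in plain Python, not part of camb: the function names `iter_sdf_records` and `split_sdf`, the chunk file naming, and the default chunk size are all illustrative assumptions. It relies only on the SDF convention that each record ends with a `$$$$` line.

```python
from itertools import islice
from pathlib import Path


def iter_sdf_records(path):
    """Yield each SDF record as one string, including its '$$$$' terminator line."""
    lines = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lines.append(line)
            if line.rstrip("\r\n") == "$$$$":
                yield "".join(lines)
                lines = []
    # Keep a trailing record that lacks a terminator, but skip pure whitespace.
    if any(item.strip() for item in lines):
        yield "".join(lines)


def split_sdf(path, out_dir, chunk_size=50000):
    """Write consecutive chunks of `chunk_size` records; return the chunk paths.

    The file names (chunk_0000.sdf, ...) are a hypothetical convention.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = iter_sdf_records(path)
    chunk_paths = []
    while True:
        # Pull at most chunk_size records so only one chunk is held in memory.
        chunk = list(islice(records, chunk_size))
        if not chunk:
            break
        chunk_path = out_dir / f"chunk_{len(chunk_paths):04d}.sdf"
        with open(chunk_path, "w", encoding="utf-8") as out:
            out.writelines(chunk)
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Each resulting chunk could then be passed to `StandardiseMolecules` on its own. If one particular chunk reproduces the abort, the contents of that chunk are the likely cause; if every chunk succeeds, the size of the input becomes the more plausible explanation.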

allaway commented 6 years ago

Thanks for the quick reply!

I tried a couple of things. I applied the function to the last 60k of my SDF and it ran without any issues.

I also tried a different approach using molvs. That standardization function choked on a few molecules (30 or so of >300k). I removed all of those molecules and ran both the molvs and camb standardization functions: molvs worked fine, but camb hung up and spat out the same error described above.

So I suspect it has more to do with the size of the structure file than with its contents. Let me know if there are any other details that I can provide. I am doing this all on an m4.2xlarge EC2 instance (8 vCPU, 32GB memory), so I hope it's not just an out-of-memory issue.

Cheers, Robert
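To help separate the "size" explanation from the "contents" explanation, a quick structural scan of the SDF can flag records that look unusual before any standardisation is attempted. The sketch below is an assumption-laden illustration, not a diagnosis: `check_record` and `scan_sdf` are hypothetical names, and the checks (missing counts line, unreadable or zero atom count in a V2000 counts line, empty title line) are guesses at what might be worth inspecting. The `basic_string::_M_construct null not valid` message itself only indicates that a C++ string was built from a null pointer; nothing in this thread confirms which field, if any, triggers it.

```python
def check_record(record):
    """Return a list of structural oddities in one SDF record (empty if none)."""
    lines = record.splitlines()
    if len(lines) < 4:
        return ["fewer than 4 header lines (no counts line)"]
    problems = []
    if not lines[0].strip():
        # Legal in the format, flagged only as something worth a look.
        problems.append("empty title line")
    counts = lines[3]
    if "V3000" not in counts:
        # V2000 counts line: the first three characters hold the atom count.
        try:
            n_atoms = int(counts[0:3])
        except ValueError:
            problems.append("unreadable atom count")
        else:
            if n_atoms == 0:
                problems.append("zero atoms")
    return problems


def scan_sdf(path):
    """Return (record_index, problems) for every record with at least one oddity."""
    flagged = []
    lines = []
    index = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lines.append(line)
            if line.rstrip("\r\n") == "$$$$":
                problems = check_record("".join(lines))
                if problems:
                    flagged.append((index, problems))
                index += 1
                lines = []
    return flagged
```

Calling `scan_sdf("Data/db_prelim.sdf")` would return the zero-based positions of any flagged records, which could then be compared with the point at which the run aborts.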

isidroc commented 6 years ago

Thanks, Robert. We usually applied the function to much smaller compound sets. You might be correct in your assumptions about the underlying issue.