ghg296 / orthagogue

Automatically exported from code.google.com/p/orthagogue

std::bad_alloc #4

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. running orthAgogue on a large (20 GB) blast output file

What is the expected output? What do you see instead?
the expected output is the default orthAgogue output (i.e. *.abc *.mci files)
what I see instead:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

What version of the product are you using? On what operating system?
version = orthAgogue-1.0.2 (MPI, debian)
OS = ubuntu 12.04

Please provide any additional information below.
I only encounter issues when using very large datasets.
For instance, I have a set of 267 bacterial genomes, each genome containing
~2700 proteins (totalling ~728000 proteins). When I blast these proteins
against themselves to build an all-vs-all blast file, I get a final blast
output file of roughly 20 GB. And when trying to run orthAgogue on this 20 GB
file I get the above error. Any idea what's going on? Am I running out of
memory? Thanks!

Original issue reported on code.google.com by markdeb...@gmail.com on 3 Feb 2014 at 4:19

GoogleCodeExporter commented 9 years ago
Hi,

Thanks for your report!
-- Your error seems interesting, i.e. I'm looking forward to fixing it!

There are several possible sources of this error:
(1) memory leaks, i.e. reserved/allocated memory that was not properly freed;
(2) a row in your blast-file which did not contain what orthAgogue expected;
(3) that you've passed the memory threshold.

From what you are reporting, I hope that the error is not found in (3). I do
not expect the memory threshold to be passed, as:
-- the proteins' 'static' memory consumption will be less than 728,000*100B ~
73MB
-- the proteins' 'dynamic' memory consumption (containing the set of blast
pairs) will be written into files when a threshold is passed. This threshold is
defined by the memory size of each computer.

From this I suspect that your error is found in a 'difficult-to-spot' location
of the source code, so it would be great if you would help me. To help
investigate such problems, we have test-rich code which is activated when the
"install_debug.bash" script is used for installing the software. Therefore, as
a first step to fix the problem, could you install the tool using the
"install_debug.bash" script?
-- If you would, I hope we will be able to identify where in the source code
the error actually happens, so that fixing the bug becomes an easy task.

Looking forward to hearing more from you!

Ole Kristian Ekseth, 
developer of orthAgogue

Original comment by oeks...@gmail.com on 3 Feb 2014 at 5:24
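
The static-memory estimate in the comment above can be checked with a few lines of arithmetic. The sketch below assumes the quoted figure of roughly 100 bytes per protein; the constant and function names are hypothetical and not taken from orthAgogue. For 728,000 proteins the product is 72,800,000 bytes, i.e. about 73 MB, which is still far below the memory of a typical machine and so supports the argument that the static part is not the culprit.

```cpp
#include <cstdint>

// Assumed per-protein bookkeeping cost, taken from the ~100 B figure quoted
// in the thread. Hypothetical name, not an orthAgogue constant.
constexpr std::uint64_t kBytesPerProtein = 100;

// Static footprint in bytes for a given number of proteins.
constexpr std::uint64_t static_footprint_bytes(std::uint64_t protein_count) {
    return protein_count * kBytesPerProtein;
}

// Convert bytes to megabytes (10^6 bytes), rounded to the nearest integer.
constexpr std::uint64_t to_megabytes(std::uint64_t bytes) {
    return (bytes + 500000) / 1000000;
}
```

Calling `to_megabytes(static_footprint_bytes(728000))` gives the footprint for the dataset described in this issue.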

GoogleCodeExporter commented 9 years ago
Hi Ole,
I'm a bit puzzled.
I downloaded all the source code and ran the install_debug.bash script.
After that, I tried again to run orthagogue on the large dataset and again got 
the same error report:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

I don't see any more detailed error report. Is it perhaps dumped somewhere and 
am I looking in the wrong place for the report?

Best,

Mark

Original comment by markdeb...@gmail.com on 6 Feb 2014 at 9:46

GoogleCodeExporter commented 9 years ago
Hi,

Thanks for your answer. It was not my intention to puzzle you: I apologize!
-- The lack of reported errors is interesting, as it shows that you have
discovered an unknown case, i.e. a case I had not anticipated. (If I had, you
would have seen a detailed explanation of the case in your error report.)
-- I am most thankful for your willingness to help us track down the bug:
thanks!

From your answer, I managed to reproduce your error:
-- In the estimation of available memory (on your computer), a try-catch block
is not properly handled. I'm now working through the code to smoke out other
bugs (if any).
-- When the error is fixed, I will give you an update. I hope to have a new
release before the end of next week, though if you are close to a deadline,
let me know, and I will (try to) complete the update (i.e. bug-fix) during this
weekend instead: you have already been too nice to me, which I'm thankful for.

Again, thanks for your help!

Ole Kristian

Original comment by oeks...@gmail.com on 6 Feb 2014 at 12:01
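
The comment above attributes the crash to an allocation failure during memory estimation that escapes its try-catch block, which is what produces the bare "terminate called after throwing an instance of 'std::bad_alloc'" message. The thread does not show the actual code, so the following is only a sketch of the general pattern: probe for the largest request that can be satisfied, catching `std::bad_alloc` and retrying with half the size. The function names and the halving policy are assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <new>

// Hypothetical probe: starting at `wanted` bytes and halving on each failure,
// return the first size that `try_alloc` can satisfy, or 0 if none can.
// `try_alloc` is expected to throw std::bad_alloc when it cannot allocate.
std::size_t largest_satisfiable(
    std::size_t wanted,
    const std::function<void(std::size_t)>& try_alloc) {
    for (std::size_t size = wanted; size > 0; size /= 2) {
        try {
            try_alloc(size);
            return size;
        } catch (const std::bad_alloc&) {
            // Handle the failure here and retry with a smaller request,
            // instead of letting the exception reach std::terminate.
        }
    }
    return 0;
}

// Probe backed by operator new; the block is freed again immediately.
inline void probe_with_new(std::size_t bytes) {
    char* block = new char[bytes];  // throws std::bad_alloc on failure
    delete[] block;
}
```

Note that on systems that overcommit memory, `operator new` can succeed even when the memory is not physically available, so a probe like this gives an upper bound rather than a guarantee.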

GoogleCodeExporter commented 9 years ago
Hi Ole,
Thanks for your swift reply.
Yes, please let me know once you've fixed this. I am very eager to run your 
program on my dataset :-)
Best,
Mark

Original comment by markdeb...@gmail.com on 6 Feb 2014 at 2:11

GoogleCodeExporter commented 9 years ago
Dear Ole,
Just curious: did you have any luck finding the supposed bug?
FYI: I checked one of the potential reasons you indicated that might cause the 
error report, i.e. "(2) a row in your blast-file which did not contain what 
orthAgogue expected", but it seems that my Blast output file is correct as it 
has the right nr of columns in every single line and protein IDs in the 1st and 
2nd column.
Looking forward to your reply,
cheers,
Mark

Original comment by markdeb...@gmail.com on 25 Feb 2014 at 11:40

GoogleCodeExporter commented 9 years ago
Hi,

Many thanks for your inquiry: the bug is identified, though due to the flu I've
been prevented (until now) from fixing it.
-- I hope to be fully back on track in the next few days, though the life of
viruses is sometimes hard to predict.

Before the flu caught me, the main problem/challenge was identified (which was
more complex than the initial symptom observed in your version of orthAgogue).

Summary:
To improve performance, orthAgogue delays file-writing until it is strictly
necessary. "Strictly necessary" is a vague term, and this is where the
challenge lies, i.e. identifying the maximum number of blastp-rows that can be
kept in memory before the performance (of orthAgogue) decreases. (In brief, we
are here concerned with how physical memory relates to virtual swap-memory on
the computer.)

Steps taken:
-- What complicated the issue was that my test-platform (a Dell Latitude
E6510) overheated when executing orthAgogue in debug-mode on the 24GB file.
Therefore, I've been using a server (with 120GB physical memory). For this
"tweak" to reproduce the error, temporary modifications have been made to the
memory scheduling of orthAgogue. The challenge was (and is) to make these
temporary modifications reflect the (approx.) 4GB physical memory limitations
of laptops (e.g. my Dell Latitude).

Road ahead:
If the above assumptions are correct, the 'bug fix' should be completed in a
matter of days, i.e. assuming that the virus concludes that I'm not worthy of
its company.
-- As you have already waited too long (I apologize!), I may 'execute' the file
for you. If you find this (backup) option of interest, could you upload your
file to a publicly accessible location (e.g. a web-server)?

I hope this (i.e. the delay caused by the flu and my laptop's overheating) has
not been a burden in your work: thanks for your help!

Ole Kristian

Original comment by oeks...@gmail.com on 25 Feb 2014 at 12:37

GoogleCodeExporter commented 9 years ago
Dear Ole,
Thank you very much for this update. Viruses and overheating laptops: that does 
not sound good ;-)
No worries, this has not been a burden to my work at all (lots of other things
to do :-) but I might consider your backup option if you don't mind. Would it
be an option for you if I send you the protein-db, which needs to be BLASTed
against itself to generate the 20 GB file for orthAgogue? This would be easier
for me than transferring the 20 GB file over the internet, for which I
currently don't see a straightforward possibility (we have some ftp-servers
here, but I do not have 20 GB of storage capacity on them, unfortunately).
Let me know what you think of this.
Hope you get well soon.
Best regards,
Mark

Original comment by markdeb...@gmail.com on 26 Feb 2014 at 9:45

GoogleCodeExporter commented 9 years ago
Hi,

Thanks for your willingness to try the 'alternative path', which makes it a lot
easier for me: then I'll be able to focus on quality, and not only on 'debug
fix time', which I'm thankful for!
-- Regarding your suggestion, constructing a blastp-file of 20GB size takes
(to my knowledge) considerably longer than executing orthAgogue. I will
therefore send you by email the location of a temporary storage; let me know if
you do not receive it (or have any trouble accessing it).
-- If you do not receive an email from me within the next hour, let me know (as
I then might have misspelled your gmail-address).

PS: I will try to keep this post updated weekly until the error is fixed, i.e.
until laptops with 4GB of physical memory can run orthAgogue on blastp-files
exceeding 20GB.

Ole Kristian

Original comment by oeks...@gmail.com on 26 Feb 2014 at 10:23

GoogleCodeExporter commented 9 years ago
A short sitrep describing the newest commit (new binaries have not yet been
compiled):
-- increased the degree of dynamic file-writing (which reduces memory
consumption).
-- to prevent orthAgogue from using too much of your resources, define the
memory threshold using "ulimit <arguments>".
-- to detect where (and why) memory usage passes the memory threshold, we've
added extra try-catch blocks (to detect memory overflows).

Given the high number of possible blastp-permutations, memory overflows may
still occur. If such a case is encountered, please add a new post (either on
this issue, or as a new issue).

Best,

Ole Kristian

Original comment by oeks...@gmail.com on 11 Mar 2014 at 9:15
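
The sitrep above suggests capping memory with "ulimit <arguments>" but does not say which limit orthAgogue reads. As a sketch only, assuming the cap is placed on the address space (the limit that `ulimit -v` sets in bash on Linux), a program can query it through the POSIX `getrlimit` call and derive a working budget from it. The function names and the 80 percent policy are hypothetical.

```cpp
#include <sys/resource.h>

#include <cstdint>

// Read the soft address-space limit of the current process.
// Returns 0 when the limit is unlimited or cannot be read.
std::uint64_t address_space_limit_bytes() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) != 0) return 0;
    if (lim.rlim_cur == RLIM_INFINITY) return 0;
    return static_cast<std::uint64_t>(lim.rlim_cur);
}

// Hypothetical policy: use `percent` of the limit as the in-memory budget,
// or `fallback` bytes when no limit is set (limit == 0).
std::uint64_t memory_budget_bytes(std::uint64_t limit,
                                  std::uint64_t fallback,
                                  unsigned percent = 80) {
    if (limit == 0) return fallback;
    return limit / 100 * percent;
}
```

With a limit in place, an allocation beyond it fails with `std::bad_alloc` inside the process instead of pushing the whole machine into swap, which is what makes the extra try-catch blocks mentioned above useful for locating the overflow.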