Closed johnlees closed 5 years ago
Hi John,
I'm a developer of GATB, and a colleague of mine reported this bug in pySeer to me. Seems that you're trying to open a .txt file as a HDF5 file, please let me know if you'd like help in debugging it.
Best, Rayan
Hi Rayan,
Thanks for the offer of help. It looks like this is being triggered on exit, perhaps a H5Tclose being called from a destructor on a deleted/incorrect object. Any idea where this might happen in GATB?
Thanks a lot, John
Looks like I am getting this on clean up of the graph pointer:
#0 H5Fclose (file_id=16777216)
at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c:781
at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/thirdparty/hdf5/src/H5[20/9745]
#1 0x0000000000484a3b in gatb::core::tools::storage::impl::StorageHDF5Factory::StorageHDF5::~StorageHDF5 (this=0x112cf70, _
_in_chrg=<optimised out>,
__vtt_parm=<optimised out>)
at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/tools/storage/impl/StorageHDF5.hpp:222
#2 0x0000000000484ac9 in gatb::core::tools::storage::impl::StorageHDF5Factory::StorageHDF5::~StorageHDF5 (this=0x112cf70, __in_chrg=<optimised out>,
__vtt_parm=<optimised out>)
at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/tools/storage/impl/StorageHDF5.hpp:223
#3 0x0000000000423dac in gatb::core::system::SmartPointer::forget (
this=0x112cf70)
at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/system/api/ISmartPointer.hpp:113
#4 0x0000000000490110 in gatb::core::debruijn::impl::GraphTemplate<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > >, gatb::core::debruijn::impl::Edge_t<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > > >, boost::variant<boost::detail::variant::over_sequence<boost::mpl::l_item<mpl_::long_<4l>, gatb::core::debruijn::impl::GraphData<32ul>, boost::mpl::l_item<mpl_::long_<3l>, gatb::core::debruijn::impl::GraphData<64ul>, boost::mpl::l_item<mpl_::long_<2l>, gatb::core::debruijn::impl::GraphData<96ul>, boost::mpl::l_item<mpl_::long_<1l>, gatb::core::debruijn::impl::GraphData<128ul>, boost::mpl::l_end> > > > >> >::setStorage(gatb::core::tools::storage::impl::Storage*) (this=0x110aa80 <graph>, storage=0x0)
at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/debruijn/impl/Graph.hpp:1038
#5 0x000000000048a013 in gatb::core::debruijn::impl::GraphTemplate<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > >, gatb::core::debruijn::impl::Edge_t<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > > >, boost::variant<boost::detail::variant::over_sequence<boost::mpl::l_item<mpl_::long_<4l>, gatb::core::debruijn::impl::GraphData<32ul>, boost::mpl::l_item<mpl_::long_<3l>, gatb::core::debruijn::impl::GraphData<64ul>, boost::mpl::l_item<mpl_::long_<2l>, gatb::core::debruijn::impl::GraphData<96ul>, boost::mpl::l_item<mpl_::long_<1l>, gatb::core::debruijn::impl::GraphData<128ul>, boost::mpl::l_end> > > > >> >::~GraphTemplate() (
this=0x110aa80 <graph>, __in_chrg=<optimised out>)
at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/debruijn/impl/Graph.cpp:992
#6 0x00002aaaab849511 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00002aaaab849595 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x00002aaaab82f7f4 in __libc_start_main ()
from /lib/x86_64-linux-gnu/libc.so.6
#9 0x000000000041f111 in _start ()
In a previous version I was calling the destructor manually which didn't cause this issue, but led to a segfault at the end when the destructor was called again. I guess this a problem with the scope of the pointer? As you can probably tell my C++ is not the best, so any advice is welcome!
graph
is a global variable initialised with:
graph = gatb::core::debruijn::impl::Graph::create("-in %s -kmer-size %d -abundance-min 0 -out %s/graph -nb-cores %d",
readsFile.c_str(), kmerSize, outputFolder.c_str(), nbCores);
Would it be better to have this as a return property from the tool definition? Could you advise me on how best to do that?
hello,
This indeed seems to be a problem with destructor of the Graph class when the scope of the object is global. I've been able to reproduce this with this MWE, if it helps:
#include <gatb/gatb_core.hpp>
Graph graph;
int main (int argc, char* argv[])
{
IBank* bank = new BankStrings ("ATCGTACGACGCTAGCTAGCA",NULL);
graph = Graph::create (bank, "-kmer-size 5 -abundance-min 1 -out mygraph");
return 0;
}
Errors seems to be the same:
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
#000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 781 in H5Fclose(): invalid file identifier
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
#000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 812 in H5Gclose(): not a group
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
#000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 812 in H5Gclose(): not a group
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
#000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5T.c line 1716 in H5Tclose(): not a datatype
major: Invalid arguments to routine
minor: Inappropriate type
HDF5 error (H5Tclose), status -1
I think the best solution for this though would be to not leave Graph graph
as a global variable.
I have my fault on this, since this code is adapted from https://gitlab.com/leoisl/dbgwas , and in there the graph object is a global variable. Global variables are usually bad practices, and I don't see a reason to keep the graph as a global variable here (really sorry about coding like this, I indeed planned a refactoring of DBGWAS, but never got time...).
I had the same issues in DBGWAS, and I worked around it by declaring a constructor (in fact, a static method that creates a graph, like the other creators in GATB) in the Graph class that returned a pointer to a Graph. As such, the global variable would be a pointer, instead of an object, and I could control when the destructor was called by deleting the pointer. IIRC, if the graph is destroyed before main exits, then everything is fine. If it is destroyed after main (like the destructor of a global object), then we have this bug.
Very sorry that I forgot to fill an issue about this after verifying that the workaround worked...
Ah, I ran into this when changing the code too, I didn't release the returnAsPointer method had been added!
So yes, sounds like it would be best to remove the global variable. Do you know how I can return the created Graph graph
from the Tool class (build_dbg in this case) so that it can be passed to the map_reads tool? I had a look in the setOutput()
method but this only seems to be for simple properties.
Do both of these need to be implemented as members of the Tool class, if we can't pass variables in and out of them except via globals?
Do you think I should:
Hi all,
Seems like a bug in Graph's destructor. Let me see if I can fix it.
Rayan
Hi @johnless, (cc: @leoisl, also cc @rizkg @edrezen @piezoid just for your curiosity)
Upon investigation, my understand of the error is that there are two Graph objects that co-exist, both are pointing to the same HDF5 file. Thus the Graph destructor is called twice on the same HDF5 file, which causes the error, because the second time the HDF5 is already closed.
Graph::create
, which is freed at the end of the current scope (that happens to be the end of the program). operator=()
method).GATB shouldn't allow two copies of the same graph to co-exist, yet it does. This is a bug. Until we find a way to neatly forbid that behavior (maybe by removing copy-assignment), I suggest the following simple fix for you: instead of a global variable Graph graph
, please do a pointer as follows:
#include "gatb/gatb_core.hpp"
Graph *graph;
int main (int argc, char* argv[])
{
graph = new Graph(gatb::core::debruijn::impl::Graph::create("-in test_graph.fa -kmer-size 21 -abundance-min 1 -out test_graph -nb-cores 2"));
// do stuff
delete graph;
return 0;
}
This is more or less what Leandro did with his createAsPointer
function in DBGWAS, but that code above doesn't require custom additions to the gatb-core code.
Then one might ask, why does Graph::create
sometimes does not create two Graph copies? I believe this is because it is called at the same time the variable is declared (Graph graph = Graph::create(..)
), and no temporary object gets created.
That wasn't easy to debug..
Rayan
note to self, the above example code can be compiled from gatb-core as follows:
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$PWD
make -j8 install
g++ -std=c++0x -O3 -L./lib/ -Iinclude/ -Ithirdparty ./test.cpp -o test -lgatbcore -lpthread -lz -lhdf5 -ldl
Hello everyone,
@johnlees : really sorry for the delay on responding to https://github.com/johnlees/unitig-counter/issues/2#issuecomment-484993791 and https://github.com/johnlees/unitig-counter/issues/2#issuecomment-484995953 . This week was my PhD defence week, so I had to focus on it! Anyway, I think the solution Rayan posted would be the easiest one for you to implement. But combining both execute()
methods into a single one and getting rid of the global var is also fine. Your other suggestion is also fine. So, many options! Pick the one you think it is the easiest for you to implement!
@rchikhi : thanks for the quick debug of such cryptic bug! I am wondering if I can try to contribute to find an easy solution. I'd propose:
add move-assignment op to Graph
class (this in Graph.hpp
):
GraphTemplate& operator= (GraphTemplate&& graph) {
*this = graph;
graph._storage = 0;
graph._variant = 0;
return *this;
}
And change https://github.com/GATB/gatb-core/blob/master/gatb-core/src/gatb/debruijn/impl/Graph.cpp#L992 to:
if (_storage) setStorage (0);
Probably a move ctor would also be needed to cover all such cases.
I tested in a very very small world example, and seems to work, but more tests are surely needed.
Also, there might be better ways to solve this issue... there might be bugs also, etc...
Thanks again!!
Regards
Thanks both @rchikhi and @leoisl for all your help with this – I really appreciate it! (Also, I hope your PhD defence went well!)
The pointer solution suggested was easy enough, and is now working in my tests. I'm looking forward to more use of GATB in the future
Thanks for the quick commit john. @leoisl, thanks too for the move assignments, I agree that it could be a potentially nice way to handle it. But I won't rush into implementing it as it's low priority for now for me.
Hello, I've just installed the software via conda and am getting the same HDF5 error. Could you please advise how to get it working? Thanks! Jenny
Looks like I forgot to update the version on conda! I've got the fixed version pending upload now – I will let you know when it is ready to be updated
If you run:
conda update unitig-counter
Your install should now work
That works for me now, thank you!