bacpop / unitig-counter

Uses cDBG to count unitigs in bacterial populations
GNU Affero General Public License v3.0
13 stars 2 forks source link

HDF5 error just before exit #2

Closed johnlees closed 5 years ago

johnlees commented 5 years ago
[FATAL ERROR] Error opening file unitigs/unitigs.unique_rows.txt
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /opt/conda/conda-bld/unitig-counter_1553808433058/work/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 781 in H5Fclose(): invalid file identifier                                                      
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /opt/conda/conda-bld/unitig-counter_1553808433058/work/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 812 in H5Gclose(): not a group                                                                  
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /opt/conda/conda-bld/unitig-counter_1553808433058/work/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 812 in H5Gclose(): not a group                                                                  
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /opt/conda/conda-bld/unitig-counter_1553808433058/work/gatb-core/gatb-core/thirdparty/hdf5/src/H5T.c line 1716 in H5Tclose(): not a datatype                                                              
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5 error (H5Tclose), status -1
rchikhi commented 5 years ago

Hi John,

I'm a developer of GATB, and a colleague of mine reported this bug in pySeer to me. Seems that you're trying to open a .txt file as a HDF5 file, please let me know if you'd like help in debugging it.

Best, Rayan

johnlees commented 5 years ago

Hi Rayan,

Thanks for the offer of help. It looks like this is being triggered on exit, perhaps a H5Tclose being called from a destructor on a deleted/incorrect object. Any idea where this might happen in GATB?

Thanks a lot, John

johnlees commented 5 years ago

Looks like I am getting this on clean up of the graph pointer:

#0  H5Fclose (file_id=16777216)
    at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c:781
    at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/thirdparty/hdf5/src/H5[20/9745]
#1  0x0000000000484a3b in gatb::core::tools::storage::impl::StorageHDF5Factory::StorageHDF5::~StorageHDF5 (this=0x112cf70, _
_in_chrg=<optimised out>,
    __vtt_parm=<optimised out>)
    at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/tools/storage/impl/StorageHDF5.hpp:222
#2  0x0000000000484ac9 in gatb::core::tools::storage::impl::StorageHDF5Factory::StorageHDF5::~StorageHDF5 (this=0x112cf70, __in_chrg=<optimised out>,
    __vtt_parm=<optimised out>)
    at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/tools/storage/impl/StorageHDF5.hpp:223
#3  0x0000000000423dac in gatb::core::system::SmartPointer::forget (
    this=0x112cf70)
    at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/system/api/ISmartPointer.hpp:113
#4  0x0000000000490110 in gatb::core::debruijn::impl::GraphTemplate<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > >, gatb::core::debruijn::impl::Edge_t<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > > >, boost::variant<boost::detail::variant::over_sequence<boost::mpl::l_item<mpl_::long_<4l>, gatb::core::debruijn::impl::GraphData<32ul>, boost::mpl::l_item<mpl_::long_<3l>, gatb::core::debruijn::impl::GraphData<64ul>, boost::mpl::l_item<mpl_::long_<2l>, gatb::core::debruijn::impl::GraphData<96ul>, boost::mpl::l_item<mpl_::long_<1l>, gatb::core::debruijn::impl::GraphData<128ul>, boost::mpl::l_end> > > > >> >::setStorage(gatb::core::tools::storage::impl::Storage*) (this=0x110aa80 <graph>, storage=0x0)
    at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/debruijn/impl/Graph.hpp:1038
#5  0x000000000048a013 in gatb::core::debruijn::impl::GraphTemplate<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > >, gatb::core::debruijn::impl::Edge_t<gatb::core::debruijn::impl::Node_t<gatb::core::tools::math::IntegerTemplate<boost::mpl::vector4<mpl_::int_<32>, mpl_::int_<64>, mpl_::int_<96>, mpl_::int_<128> > > > >, boost::variant<boost::detail::variant::over_sequence<boost::mpl::l_item<mpl_::long_<4l>, gatb::core::debruijn::impl::GraphData<32ul>, boost::mpl::l_item<mpl_::long_<3l>, gatb::core::debruijn::impl::GraphData<64ul>, boost::mpl::l_item<mpl_::long_<2l>, gatb::core::debruijn::impl::GraphData<96ul>, boost::mpl::l_item<mpl_::long_<1l>, gatb::core::debruijn::impl::GraphData<128ul>, boost::mpl::l_end> > > > >> >::~GraphTemplate() (
    this=0x110aa80 <graph>, __in_chrg=<optimised out>)
    at /lustre/scratch118/infgen/team81/jl11/pdoc/unitigs/unitig-counter/gatb-core/gatb-core/src/gatb/debruijn/impl/Graph.cpp:992
#6  0x00002aaaab849511 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00002aaaab849595 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00002aaaab82f7f4 in __libc_start_main ()
   from /lib/x86_64-linux-gnu/libc.so.6
#9  0x000000000041f111 in _start ()

In a previous version I was calling the destructor manually which didn't cause this issue, but led to a segfault at the end when the destructor was called again. I guess this a problem with the scope of the pointer? As you can probably tell my C++ is not the best, so any advice is welcome!

johnlees commented 5 years ago

graph is a global variable initialised with:

    graph = gatb::core::debruijn::impl::Graph::create("-in %s -kmer-size %d -abundance-min 0 -out %s/graph -nb-cores %d",
                                                          readsFile.c_str(), kmerSize, outputFolder.c_str(), nbCores);

Would it be better to have this as a return property from the tool definition? Could you advise me on how best to do that?

leoisl commented 5 years ago

hello,

This indeed seems to be a problem with destructor of the Graph class when the scope of the object is global. I've been able to reproduce this with this MWE, if it helps:

#include <gatb/gatb_core.hpp>
Graph graph;
int main (int argc, char* argv[])
{
    IBank* bank = new BankStrings ("ATCGTACGACGCTAGCTAGCA",NULL);
    graph = Graph::create (bank, "-kmer-size 5  -abundance-min 1  -out mygraph");
    return 0;
}

Errors seems to be the same:

HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 781 in H5Fclose(): invalid file identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 812 in H5Gclose(): not a group
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 812 in H5Gclose(): not a group
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: /Users/ishi/Desktop/git-apps/gatb-core/gatb-core/thirdparty/hdf5/src/H5T.c line 1716 in H5Tclose(): not a datatype
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5 error (H5Tclose), status -1
leoisl commented 5 years ago

I think the best solution for this though would be to not leave Graph graph as a global variable.

I have my fault on this, since this code is adapted from https://gitlab.com/leoisl/dbgwas , and in there the graph object is a global variable. Global variables are usually bad practices, and I don't see a reason to keep the graph as a global variable here (really sorry about coding like this, I indeed planned a refactoring of DBGWAS, but never got time...).

I had the same issues in DBGWAS, and I worked around it by declaring a constructor (in fact, a static method that creates a graph, like the other creators in GATB) in the Graph class that returned a pointer to a Graph. As such, the global variable would be a pointer, instead of an object, and I could control when the destructor was called by deleting the pointer. IIRC, if the graph is destroyed before main exits, then everything is fine. If it is destroyed after main (like the destructor of a global object), then we have this bug.

Very sorry that I forgot to fill an issue about this after verifying that the workaround worked...

johnlees commented 5 years ago

Ah, I ran into this when changing the code too, I didn't release the returnAsPointer method had been added!

So yes, sounds like it would be best to remove the global variable. Do you know how I can return the created Graph graph from the Tool class (build_dbg in this case) so that it can be passed to the map_reads tool? I had a look in the setOutput() method but this only seems to be for simple properties.

johnlees commented 5 years ago

Do both of these need to be implemented as members of the Tool class, if we can't pass variables in and out of them except via globals?

Do you think I should:

rchikhi commented 5 years ago

Hi all,

Seems like a bug in Graph's destructor. Let me see if I can fix it.

Rayan

rchikhi commented 5 years ago

Hi @johnless, (cc: @leoisl, also cc @rizkg @edrezen @piezoid just for your curiosity)

Upon investigation, my understand of the error is that there are two Graph objects that co-exist, both are pointing to the same HDF5 file. Thus the Graph destructor is called twice on the same HDF5 file, which causes the error, because the second time the HDF5 is already closed.

GATB shouldn't allow two copies of the same graph to co-exist, yet it does. This is a bug. Until we find a way to neatly forbid that behavior (maybe by removing copy-assignment), I suggest the following simple fix for you: instead of a global variable Graph graph, please do a pointer as follows:

#include "gatb/gatb_core.hpp"
Graph *graph;
int main (int argc, char* argv[])
{
    graph = new Graph(gatb::core::debruijn::impl::Graph::create("-in test_graph.fa -kmer-size 21 -abundance-min 1 -out test_graph -nb-cores 2"));

    // do stuff

    delete graph;
    return 0;
}

This is more or less what Leandro did with his createAsPointer function in DBGWAS, but that code above doesn't require custom additions to the gatb-core code.

Then one might ask, why does Graph::create sometimes does not create two Graph copies? I believe this is because it is called at the same time the variable is declared (Graph graph = Graph::create(..)), and no temporary object gets created.

That wasn't easy to debug..

Rayan


note to self, the above example code can be compiled from gatb-core as follows:

mkdir build && cd build 
cmake .. -DCMAKE_INSTALL_PREFIX=$PWD
make -j8 install 
g++   -std=c++0x -O3 -L./lib/ -Iinclude/ -Ithirdparty ./test.cpp -o test -lgatbcore  -lpthread -lz -lhdf5 -ldl
leoisl commented 5 years ago

Hello everyone,

@johnlees : really sorry for the delay on responding to https://github.com/johnlees/unitig-counter/issues/2#issuecomment-484993791 and https://github.com/johnlees/unitig-counter/issues/2#issuecomment-484995953 . This week was my PhD defence week, so I had to focus on it! Anyway, I think the solution Rayan posted would be the easiest one for you to implement. But combining both execute() methods into a single one and getting rid of the global var is also fine. Your other suggestion is also fine. So, many options! Pick the one you think it is the easiest for you to implement!

@rchikhi : thanks for the quick debug of such cryptic bug! I am wondering if I can try to contribute to find an easy solution. I'd propose:

add move-assignment op to Graph class (this in Graph.hpp):

    GraphTemplate& operator= (GraphTemplate&& graph) {
        *this = graph;
        graph._storage = 0;
        graph._variant = 0;
        return *this;
    }

And change https://github.com/GATB/gatb-core/blob/master/gatb-core/src/gatb/debruijn/impl/Graph.cpp#L992 to: if (_storage) setStorage (0);

Probably a move ctor would also be needed to cover all such cases.

I tested in a very very small world example, and seems to work, but more tests are surely needed.

Also, there might be better ways to solve this issue... there might be bugs also, etc...

Thanks again!!

Regards

johnlees commented 5 years ago

Thanks both @rchikhi and @leoisl for all your help with this – I really appreciate it! (Also, I hope your PhD defence went well!)

The pointer solution suggested was easy enough, and is now working in my tests. I'm looking forward to more use of GATB in the future

rchikhi commented 5 years ago

Thanks for the quick commit john. @leoisl, thanks too for the move assignments, I agree that it could be a potentially nice way to handle it. But I won't rush into implementing it as it's low priority for now for me.

jmattock commented 5 years ago

Hello, I've just installed the software via conda and am getting the same HDF5 error. Could you please advise how to get it working? Thanks! Jenny

johnlees commented 5 years ago

Looks like I forgot to update the version on conda! I've got the fixed version pending upload now – I will let you know when it is ready to be updated

johnlees commented 5 years ago

If you run:

conda update unitig-counter

Your install should now work

jmattock commented 5 years ago

That works for me now, thank you!