drolbr / Overpass-API

A database engine to query the OpenStreetMap data.
http://overpass-api.de
GNU Affero General Public License v3.0

init_osm3s.sh should check for disk space available #252

Open srcspider opened 8 years ago

srcspider commented 8 years ago

I had the unfortunate experience of running out of disk space while trying to install on a VM. The issue was primarily caused by the VM receiving the requested space but not actually formatting it on initialization (it ended up with only 10 gigabytes), probably some provisioning optimization I wasn't aware of at the time.

Upon reaching the limit, the init script does not report that it has run out of disk space. Instead it hits either a memory corruption error (which, when I saw it, I assumed was a RAM error rather than a disk error) or some kind of "Random_File" failure (I believe that is the actual name) while trying to create an idx file. In hindsight the word "File" should have been a good hint, but there was so much other noise in the error message that it just didn't register as a signal.

I assume it can fail at different points depending on the disk size, and potentially for different causes, so the errors I received are probably just a sample and you could get almost anything.

It would help to have a simple check. The script could make a generous (in its favor) estimate of the chunk it is about to flush to disk and check whether the disk has enough space; if not, it could report a clear "insufficient disk space" error, ideally including how much free space it sees. That would save a lot of debugging time. Checking before initialization even starts would be ideal, but I assume the final database size is not easy to estimate.

mmd-osm commented 8 years ago

If you're able to reproduce the exact error message, I would suggest adding it to the Installation troubleshooting section in the documentation. That way others might be able to quickly identify the root cause of such an issue via a web search.

Determining the disk space update_database will need (that's the process triggered by init_osm3s.sh which does all the work and throws the error messages you've seen) is anything but trivial, as it can depend on file system characteristics, compression being one example. Honestly, I don't think it is really worth the effort.

BTW: if you're short on disk space and just want to load a small extract into your db, there are some additional compression options available. Unfortunately they haven't been merged into the master branch yet; check the Overpass Dev mailing list archive for details.

drolbr commented 8 years ago

Well, if we can do some disk space estimation from the bash script, then we could issue a warning when the available disk space is less than roughly four times the input file size. This won't cover all corner cases, but it would give the hint that would have saved srcspider a lot of time.
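For illustration, here is a minimal C++ sketch of such a check (it could just as well live in update_database rather than the bash script) using statvfs(). The function name check_free_space, its parameters and the factor of four are assumptions taken from the comment above, not existing code:

#include <sys/statvfs.h>
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical helper: warn when the free space on the target file system
// is smaller than a generous multiple of the input file size.
bool check_free_space(const std::string& db_dir, uint64_t input_size)
{
  struct statvfs vfs;
  if (statvfs(db_dir.c_str(), &vfs) != 0)
    return true;  // free space unknown; don't block the import

  uint64_t free_bytes = static_cast<uint64_t>(vfs.f_bavail) * vfs.f_frsize;
  uint64_t needed = 4 * input_size;  // rough factor suggested above

  if (free_bytes < needed)
  {
    std::cerr << "Warning: only " << free_bytes << " bytes free in " << db_dir
              << ", but roughly " << needed << " bytes may be required.\n";
    return false;
  }
  return true;
}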

mmd-osm commented 7 years ago

I ran a few test cases simulating a disk-full situation. All in all, I would say there is a real need to fix a few bugs in the code.


Current situation:

Reading XML file ... elapsed node 361497565. Flushing to database ..terminate called after throwing an instance of 'File_Error'
Abgebrochen (Speicherabzug geschrieben)   [German locale for "Aborted (core dumped)"]
(gdb) bt
#0  0x00007fb4b9a6e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fb4b9a7002a in __GI_abort () at abort.c:89
#2  0x00007fb4ba0a784d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fb4ba0a56b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fb4ba0a46a9 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fb4ba0a5005 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fb4b9e11f83 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7  0x00007fb4b9e12487 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8  0x00000000004404a2 in Random_File<Uint64, Uint31_Index>::~Random_File() ()
#9  0x00000000004407b3 in void update_map_positions<Uint64>(std::vector<std::pair<Uint64, Uint31_Index>, std::allocator<std::pair<Uint64, Uint31_Index> > >, Transaction&, File_Properties const&) ()
#10 0x000000000042308b in Node_Updater::update(Osm_Backend_Callback*, bool) ()
#11 0x00000000004f7e0b in (anonymous namespace)::node_end() ()
#12 0x00000000004fa34d in end(char const*) ()
#13 0x00007fb4ba3a23cb in ?? () from /lib/x86_64-linux-gnu/libexpat.so.1
#14 0x00007fb4ba3a338c in ?? () from /lib/x86_64-linux-gnu/libexpat.so.1
#15 0x00007fb4ba3a770b in XML_ParseBuffer () from /lib/x86_64-linux-gnu/libexpat.so.1
#16 0x000000000051c84e in parse(_IO_FILE*, void (*)(char const*, char const**), void (*)(char const*), void (*)(void*, char const*, int)) ()
#17 0x00000000004fab09 in Osm_Updater::parse_file_completely(_IO_FILE*) ()
#18 0x000000000040494d in main ()

move_cache_window throws an exception inside the Random_File destructor, which is not permitted: an exception escaping a destructor makes the C++ runtime call std::terminate, which is exactly the abort shown in the backtrace above.

template< typename Key, typename Value >
Random_File< Key, Value >::~Random_File()
{
  // move_cache_window() can throw File_Error (e.g. when the disk is full);
  // letting that exception escape the destructor terminates the program.
  move_cache_window(index->npos);
  //delete index;
}
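A minimal sketch of one way to keep the exception from escaping the destructor; this is only an illustration of the constraint, not necessarily the fix that should be applied (a better fix might flush explicitly before destruction so the error can still be reported to the caller):

template< typename Key, typename Value >
Random_File< Key, Value >::~Random_File()
{
  try
  {
    move_cache_window(index->npos);
  }
  catch (...)
  {
    // A destructor must not let exceptions escape; swallowing (or merely
    // logging) the error here avoids the std::terminate seen above.
  }
  //delete index;
}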

TODO: prevent exceptions from escaping ~Random_File (catch the File_Error, or flush explicitly before destruction so the error can still be reported).


Another issue is that the "disk full" situation is not properly handled in Raw_File::write.

Quoting from the write(2) man page:

It is not an error if this number is smaller than the number of bytes requested; this may happen for example because the disk device was filled. See also NOTES. On error, -1 is returned, and errno is set appropriately.

In strace, we can clearly see that the result is not -1, but the actual number of bytes written, in this example 8192 bytes instead of 196608.

write(3, "l\237\2\0\363\27w\361\7\0\234\1\0\0\340&\33b\347\271\35\26\0\0\0\0\4\0\0\0\323\350"..., 196608) = 8192
close(3)

errno does not have a defined value here; it may only be consulted when the return value (foo below) is -1:

inline void Raw_File::write(uint8* buf, uint64 size, const std::string& caller_id) const
{
  // ::write may legitimately write fewer bytes than requested (short write),
  // and errno is only meaningful when it returns -1.
  uint64 foo = ::write(fd_, buf, size);
  if (foo != size)
    throw File_Error(errno, name, caller_id);   // errno may be stale here
}

For this reason, the user sees the following misleading error message, "2 No such file or directory", instead of anything hinting at a full disk:

Reading XML file ... elapsed node 361497565. Flushing to database ...... done.
Reading XML file ... elapsed node 727685024. Flushing to database ....File error caught: 2 No such file or directory /dummy/nodes_meta.bin File_Blocks::insert_block::2
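For comparison, a minimal sketch of a more robust write, assuming the same surrounding declarations (fd_, name, File_Error) as the snippet above; partial writes are retried and errno is only consulted when ::write actually returns -1. This is an illustration, not the actual patch:

inline void Raw_File::write(uint8* buf, uint64 size, const std::string& caller_id) const
{
  uint64 total = 0;
  while (total < size)
  {
    ssize_t written = ::write(fd_, buf + total, size - total);
    if (written == -1)
    {
      if (errno == EINTR)
        continue;                                  // interrupted by a signal: retry
      throw File_Error(errno, name, caller_id);    // errno is meaningful here
    }
    if (written == 0)
      throw File_Error(ENOSPC, name, caller_id);   // no progress: most likely disk full
    total += written;
  }
}

With this loop, a full disk first produces a short write, the retry then gets -1 with errno set to ENOSPC, and the user sees a "No space left on device" message instead of the stale "No such file or directory".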

TODOs: handle partial writes in Raw_File::write, evaluate errno only when ::write returns -1, and report a clear "disk full" (ENOSPC) error to the user instead of the misleading "No such file or directory".