antonmks / Alenka

GPU database engine

Loading error #101

Open vasu-cherlopalle opened 8 years ago

vasu-cherlopalle commented 8 years ago

Hi Anton,

I am trying to load data with the -SSD and -l 5000 options on a K40 GPU.

Alenka works fine for a few loads and then fails with the following error. At that point NVIDIA-SMI also stops working (it doesn't respond).

terminate called after throwing an instance of 'thrust::system::system_error'
what(): an illegal memory access was encountered
/bin/sh: line 1: 33128 Aborted

After a long time, NVIDIA-SMI responds again with the following output, and at that point the load starts working again too. It looks like the GPU gets into a state where it can no longer respond. Please let me know if there is a workaround for this.

+------------------------------------------------------+
| NVIDIA-SMI 352.93     Driver Version: 352.93         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:42:00.0     Off |                    0 |
| N/A   51C    P0    63W / 235W |    227MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     37612    C   /home/dmatic/dataMatic/alenka/bin/alenka         3MiB |
|    0     37622    C   /home/dmatic/dataMatic/alenka/bin/alenka         3MiB |
|    0     37639    C   /home/dmatic/dataMatic/alenka/bin/alenka         3MiB |
|    0     37647    C   /home/dmatic/dataMatic/alenka/bin/alenka         3MiB |
|    0     37657    C   /home/dmatic/dataMatic/alenka/bin/alenka         3MiB |
|    0     37677    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37688    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37696    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37700    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37704    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37716    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37729    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37733    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37766    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37768    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37779    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37788    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37792    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37798    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37804    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37809    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37815    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37822    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37829    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37833    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37840    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37844    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37846    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37848    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37855    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37860    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37870    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37872    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37878    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37882    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37887    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37893    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37897    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37901    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37905    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37912    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37920    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37925    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
|    0     37930    C   /home/dmatic/dataMatic/alenka/bin/alenka         2MiB |
+-----------------------------------------------------------------------------+

antonmks commented 8 years ago

See if you can monitor memory while loading the data. It is possible that there is a memory leak somewhere. Can you check both host and GPU memory?
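One way to watch both while a load runs is a small polling script. The sketch below is not part of Alenka. It assumes a Linux host with `/proc/meminfo` and an `nvidia-smi` that supports the `--query-gpu` CSV interface. The function names and the 5-second interval are illustrative choices.

```python
import subprocess
import time


def parse_gpu_memory(csv_text):
    """Parse 'used, total' rows (MiB, one row per GPU) from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def parse_meminfo(text):
    """Return (MemTotal, MemAvailable) in KiB from /proc/meminfo-style text.

    Falls back to MemFree on older kernels that do not report MemAvailable.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key] = int(parts[0])
    return values["MemTotal"], values.get("MemAvailable", values.get("MemFree"))


def sample(timeout_s=30):
    """Take one reading; the GPU part is None if nvidia-smi does not answer in time."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
            timeout=timeout_s,
        )
        gpu = parse_gpu_memory(out)
    except subprocess.TimeoutExpired:
        gpu = None  # matches the "nvidia-smi doesn't respond" symptom
    with open("/proc/meminfo") as f:
        host = parse_meminfo(f.read())
    return gpu, host


def monitor(interval_s=5, samples=None):
    """Print one line per reading; runs until interrupted when samples is None."""
    taken = 0
    while samples is None or taken < samples:
        gpu, (total_kib, avail_kib) = sample()
        print(time.strftime("%H:%M:%S"),
              "gpu (used, total) MiB:", gpu,
              "| host available MiB:", avail_kib // 1024,
              "of", total_kib // 1024)
        taken += 1
        time.sleep(interval_s)
```

Calling `monitor()` in a second terminal during the load and redirecting its output to a file gives a timeline to compare against the moment of the crash. Steadily growing GPU or host usage across successive loads would be consistent with a leak. A `None` GPU reading marks the point where `nvidia-smi` stopped answering.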

marklit commented 8 years ago

A commit was pushed today (302db7083e9046bb8769c8c0da0443fe23a73900) that addresses an issue with imports. It might be worth recompiling Alenka and trying the import again. I've been looking at a similar issue, and this patch fixed it for me.