No error shows when local asterix instance loads data with its disk space used up

GoogleCodeExporter commented 8 years ago

Here is more information: I am now running the same experiment on my desktop, 
loading a 22GB dataset.

=========================>
[jarodwen@jainux data]$ ls -Alh /home/jarodwen/Datasets/AggBenchs/v20121127/z0_1000000000_1000000000_sorted.dat.shuffled
-rw-r--r--. 1 jarodwen jarodwen 22G Jan 14 11:17 /home/jarodwen/Datasets/AggBenchs/v20121127/z0_1000000000_1000000000_sorted.dat.shuffled
<=========================

Here is the AQL:

=========================>
drop dataverse AggBench if exists;
create dataverse AggBench;
use dataverse AggBench;

create type UserVisitType as closed {
    ip: string,
    revenue: double
}

create dataset UserVisit(UserVisitType)
primary key ip;

load dataset UserVisit using localfs
(("path"="127.0.0.1:///home/jarodwen/Datasets/AggBenchs/v20121127/z0_1000000000_1000000000_sorted.dat.shuffled"),
 ("format"="delimited-text"),
 ("delimiter"="|"));
<=========================

The asterix instance is described as:

=========================>
[jarodwen@jainux data]$ managix describe -n local -admin
INFO: Name:local
Created:Fri Jun 14 15:22:46 PDT 2013
Web-Url:http://127.0.0.1:19001
State:ACTIVE
Master node:master:127.0.0.1
node1:127.0.0.1

Asterix version:0.8.1-SNAPSHOT
Metadata Node:node1
Processes
NC at node1 [ 18372 ]
CC at master [ 18272 ]

Asterix Configuration
nc.java.opts                             :-Xmx1024m
cc.java.opts                             :-Xmx1024m
storage.buffercache.pagesize             :32768
storage.buffercache.size                 :33554432
storage.buffercache.maxopenfiles         :214748364
storage.memorycomponent.pagesize         :32768
storage.memorycomponent.numpages         :1024
storage.memorycomponent.globalbudget     :536870192
storage.lsm.mergethreshold               :3
storage.lsm.bloomfilter.falsepositiverate:0.01
txn.log.buffer.numpages                  :8
txn.log.buffer.pagesize                  :131072
txn.log.partitionsize                    :2147483648
txn.log.disksectorsize                   :4096
txn.log.groupcommitinterval              :1
txn.log.checkpoint.lsnthreshold          :67108864
txn.log.checkpoint.pollfrequency         :120
txn.log.checkpoint.history               :0
txn.lock.escalationthreshold             :1000
txn.lock.shrinktimer                     :5000
txn.lock.timeout.waitthreshold           :60000
txn.lock.timeout.sweepthreshold          :10000
compiler.sortmemory                      :33554432
compiler.joinmemory                      :33554432
compiler.framesize                       :32768
web.port                                 :19001
api.port                                 :19002
log.level                                :INFO
<=========================

I started the load this morning at 7:25am, at which point the disk usage was:

=========================>
[jarodwen@jainux hyracks]$ df -h
df: `/run/user/preston/gvfs': Permission denied
df: `/run/media/preston/DOSBLUE': Permission denied
Filesystem                     Size  Used Avail Use% Mounted on
devtmpfs                       3.9G     0  3.9G   0% /dev
tmpfs                          3.9G  296K  3.9G   1% /dev/shm
tmpfs                          3.9G   35M  3.9G   1% /run
/dev/mapper/vg_jainux-lv_root   50G   48G  1.3G  98% /
tmpfs                          3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                          3.9G     0  3.9G   0% /media
/dev/sda5                      477M   85M  367M  19% /boot
/dev/mapper/vg_jainux-lv_home  394G  355G   20G  95% /home
<=========================

In my configuration, the only I/O device is mounted on /home. It is now 8:20am, 
and the disk utilization is:

=========================>
[jarodwen@jainux hyracks]$ df -h
df: `/run/user/preston/gvfs': Permission denied
df: `/run/media/preston/DOSBLUE': Permission denied
Filesystem                     Size  Used Avail Use% Mounted on
devtmpfs                       3.9G     0  3.9G   0% /dev
tmpfs                          3.9G  296K  3.9G   1% /dev/shm
tmpfs                          3.9G   35M  3.9G   1% /run
/dev/mapper/vg_jainux-lv_root   50G   48G  1.3G  98% /
tmpfs                          3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                          3.9G     0  3.9G   0% /media
/dev/sda5                      477M   85M  367M  19% /boot
/dev/mapper/vg_jainux-lv_home  394G  374G  2.2M 100% /home
<=========================

Note that all space on /home has been used up, and has been for more than 20 
minutes. However, the load is still running without any error. I attached the 
cc and nc logs with this email; they show no error so far.
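
For illustration, the numbers above already predict the outcome: df reported 
20G available on /home for a 22G input file, and usage on /home then grew from 
355G to 374G before the disk filled. A pre-load guard along the following lines 
would have failed fast instead. This is only a sketch in Python, not Asterix 
code; the function names and the headroom factor are hypothetical. A real load 
probably needs more space than the raw input size (sort runs plus the index 
itself), which is what the headroom factor stands in for.

```python
import errno
import shutil


def has_room_for(path, required_bytes, headroom=1.0):
    """Return True if the filesystem holding `path` has enough free space.

    `headroom` is a hypothetical safety factor: the load is assumed to need
    required_bytes * headroom bytes in total.
    """
    # disk_usage().free is the space available to an unprivileged user,
    # i.e. the same figure df shows in its "Avail" column.
    free = shutil.disk_usage(path).free
    return free >= required_bytes * headroom


def require_room_for(path, required_bytes, headroom=1.0):
    """Raise OSError(ENOSPC) up front instead of filling the disk silently."""
    if not has_room_for(path, required_bytes, headroom):
        raise OSError(errno.ENOSPC, "No space left on device", path)
```

With the figures from this report, require_room_for("/home", 22 * 2**30) would 
raise before any data is written. A check like this is only a first line of 
defence: the space can still run out mid-load, so the write path has to report 
the IOException as well.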

@sattam feel free to assign to anyone who would be familiar with this.

Original issue reported on code.google.com by jarod...@gmail.com on 19 Jun 2013 at 6:44

GoogleCodeExporter commented 8 years ago

I was able to reproduce this issue, but it is not related to the bulk load. The 
problem seems to occur even before that, specifically in the external sort 
operator.

When I pass the pre-sorted hint in the load statement, the system returns a 
proper message saying:
No space left on device [IOException]

But when the hint is omitted, the system hangs. Judging by the log, the last 
operator to be initialized is the external sort operator:

INFO: Initializing TAID:TID:ANID:ODID:3:0:0:0 -> [edu.uci.ics.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$SortActivity@39fc58f3]
Jun 22, 2013 11:40:40 AM edu.uci.ics.hyracks.control.nc.work.StartTasksWork run
INFO: input: 0: CDID:1
Jun 22, 2013 11:40:40 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: edu.uci.ics.hyracks.control.nc.work.ReportPartitionAvailabilityWork@5092eefe
Jun 22, 2013 11:40:48 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: edu.uci.ics.hyracks.control.nc.work.NotifyTaskFailureWork@1991d886
Jun 22, 2013 11:40:48 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: edu.uci.ics.hyracks.control.nc.work.AbortTasksWork@15aff300
Jun 22, 2013 11:40:48 AM edu.uci.ics.hyracks.control.nc.work.AbortTasksWork run
INFO: Aborting Tasks: JID:1:[TAID:TID:ANID:ODID:0:0:0:0]
Jun 22, 2013 11:40:48 AM edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: edu.uci.ics.hyracks.control.nc.work.CleanupJobletWork@6c8c0d86
Jun 22, 2013 11:40:48 AM edu.uci.ics.hyracks.control.nc.work.CleanupJobletWork run
INFO: Cleaning up after job: JID:1
Jun 22, 2013 11:41:15 AM edu.uci.ics.hyracks.control.common.dataset.ResultStateSweeper sweep
INFO: Result state cleanup instance successfully compl
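
One possible reading of this log (an assumption, not verified against the 
Hyracks source) is that a task does fail and is aborted, but the failure never 
reaches whoever is waiting on the job, so the load appears to hang. The general 
pattern that avoids this is to forward the producer's exception to the consumer 
rather than only logging it. The Python sketch below illustrates that pattern 
with a thread and a queue; run_pipeline and the helper names are hypothetical 
and unrelated to the actual Hyracks classes.

```python
import queue
import threading

_DONE = object()  # sentinel: the producer finished normally


class _Failure:
    """Wrapper that carries a producer-side exception across the queue."""

    def __init__(self, exc):
        self.exc = exc


def run_pipeline(produce, consume):
    """Run produce() in a worker thread and pass each item to consume().

    An exception raised by the producer is forwarded through the queue and
    re-raised in the caller, so the consumer never waits forever for input
    that will not arrive. (Consumer-side failures are not handled here.)
    """
    q = queue.Queue(maxsize=8)

    def worker():
        try:
            for item in produce():
                q.put(item)
            q.put(_DONE)
        except BaseException as exc:  # e.g. OSError: No space left on device
            q.put(_Failure(exc))

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        if isinstance(item, _Failure):
            t.join()
            raise item.exc
        consume(item)
    t.join()
```

If the except branch in worker() only logged the error and returned, the 
q.get() call in the caller would block indefinitely, which is the same symptom 
as the hang described above.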

Re-assigning this to Pouria.

Original comment by salsuba...@gmail.com on 22 Jun 2013 at 6:51

GoogleCodeExporter commented 8 years ago

On a related or unrelated note (not sure which) - do we make sure to not barf 
in some horrible way if someone passes us unsorted data but a hint saying it's 
sorted, for loading?
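
To make the question concrete, the kind of guard I have in mind looks roughly 
like this. It is a Python sketch and not the Asterix loader; checked_sorted and 
its parameters are hypothetical, and rejecting duplicate keys by default is an 
assumption based on the load target being a primary key.

```python
def checked_sorted(records, key=lambda r: r, strict=True):
    """Yield records unchanged, failing at the first out-of-order key.

    With strict=True (an assumption suited to a unique primary key), a key
    equal to its predecessor is rejected as well.
    """
    previous = None
    first = True
    for position, record in enumerate(records):
        current = key(record)
        if not first:
            out_of_order = current <= previous if strict else current < previous
            if out_of_order:
                raise ValueError(
                    "input is not sorted at record %d: %r follows %r"
                    % (position, current, previous)
                )
        previous, first = current, False
        yield record
```

The check costs one comparison per record and keeps only the previous key in 
memory, so it can wrap the input stream even for a 22GB file.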

Original comment by dtab...@gmail.com on 22 Jun 2013 at 7:19

GoogleCodeExporter commented 8 years ago

Original comment by ima...@uci.edu on 14 Oct 2014 at 9:34

Added labels: Weekly

br1ghtyang / asterixdb

No error shows when local asterix instance loads data with its disk space used up #535