Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce, Spark, Flink, Storm, etc. I am also working on another similar pure Go system, https://github.com/chrislusf/gleam , which is more flexible and more performant.
I get the error "Failed to create a queue on disk: Failed to open ... no such file or directory" when a program is executed repeatedly in distributed mode. The problem seems to be related to the size of the cluster: it happens often in a cluster of 20 computers, but a cluster of 10 computers works fine.
The problem is caused by RotatingFileStore#init, which fails to open the old log files, and therefore, CreateNamedDatasetShard returns nil.
I think that RotatingFileStore#init should not be opening the old log files at all, because they should already have been removed by the previous statement in CreateNamedDatasetShard (done by m.doDelete). Could the problem be caused by ioutil.ReadDir(l.dir()) returning a stale view of the file system?
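The suspected sequence (a directory listing is taken, a file from that listing is removed, then the open fails) can be reproduced outside Glow. The sketch below is only an illustration: openListed and demo are hypothetical helpers, not part of Glow, and it uses os.ReadDir, the current standard-library counterpart of ioutil.ReadDir. It shows one possible way to tolerate stale entries by skipping names that no longer exist instead of aborting on the first one.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// openListed is a hypothetical helper (not Glow's API). It opens each name
// from a previously taken directory listing and skips names that no longer
// exist, instead of failing on the first stale entry.
func openListed(dir string, names []string) ([]*os.File, error) {
	var files []*os.File
	for _, name := range names {
		f, err := os.Open(filepath.Join(dir, name))
		if errors.Is(err, fs.ErrNotExist) {
			continue // removed after the listing was taken
		}
		if err != nil {
			for _, open := range files {
				open.Close()
			}
			return nil, fmt.Errorf("failed to open %s: %w", name, err)
		}
		files = append(files, f)
	}
	return files, nil
}

// demo replays the suspected sequence in a temporary directory:
// list the files, delete one of them, then open what was listed.
// It returns the number of listed names and the number actually opened.
func demo() (int, int, error) {
	dir, err := os.MkdirTemp("", "shard-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	for _, name := range []string{"a.dat", "b.dat", "c.dat"} {
		path := filepath.Join(dir, name)
		if err := os.WriteFile(path, []byte("x"), 0o644); err != nil {
			return 0, 0, err
		}
	}

	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, 0, err
	}
	var names []string
	for _, e := range entries {
		names = append(names, e.Name())
	}

	// The listing is now stale: b.dat disappears before it is opened.
	if err := os.Remove(filepath.Join(dir, "b.dat")); err != nil {
		return 0, 0, err
	}

	files, err := openListed(dir, names)
	if err != nil {
		return 0, 0, err
	}
	for _, f := range files {
		f.Close()
	}
	return len(names), len(files), nil
}

func main() {
	listed, opened, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Printf("opened %d of %d listed files\n", opened, listed)
}
```

Whether skipping missing files is the right behaviour for RotatingFileStore#init is an open question; the sketch only shows that a listing can outlive the files it names.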
Top of the stack trace:
2016/04/29 12:33:51 Failed to create a queue on disk: Failed to open bbaf2eac-ct-0-ds-2-shard-1111-8816-2016-04-29T12-01-44.112.dat: open bbaf2eac-ct-0-ds-2-shard-1111-8816-2016-04-29T12-01-44.112.dat: no such file or directory
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4ca2f8]
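The nil pointer dereference in the trace is consistent with a caller using the nil result of a failed creation. A minimal sketch of the usual Go alternative, returning an error alongside the value and checking it before use, is shown below. The names shard, createShard and describe are hypothetical stand-ins and do not reflect Glow's actual types or signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// shard is a hypothetical stand-in for a dataset shard, not Glow's type.
type shard struct{ name string }

func (s *shard) Name() string { return s.name }

var errQueue = errors.New("failed to create a queue on disk")

// createShard returns an error instead of a bare nil, so the caller
// cannot silently continue with an unusable shard.
func createShard(name string, fail bool) (*shard, error) {
	if fail {
		return nil, fmt.Errorf("shard %s: %w", name, errQueue)
	}
	return &shard{name: name}, nil
}

// describe checks the error before touching the shard.
func describe(name string, fail bool) string {
	s, err := createShard(name, fail)
	if err != nil {
		return "error: " + err.Error()
	}
	return "created " + s.Name()
}

func main() {
	fmt.Println(describe("ds-2-shard-1", false))
	fmt.Println(describe("ds-2-shard-1", true))
}
```

With this shape, a failure to open the queue files surfaces as a reported error at the call site rather than as a panic later on.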