Closed finalclass closed 2 years ago
Thanks for the report. I have tested it on arm64 only. Will take a look
Thanks for your response. Indeed, it compiles fine for arm64. However, when I try to run it on an Asus Tinker Edge T, it hangs, and after a while the board shuts off. To reproduce:
cd ergo/examples/simple
GOARCH=arm64 go build simple.go
// scp to arm64 device
on the arm64 device:
./simple
The result: it hangs. I have to restart the device. Occasionally I get this error log:
Message from syslogd@vexing-eft at Apr 22 08:13:22 ...
kernel:[ 64.830251] Internal error: undefined instruction: 0 [#1] PREEMPT SMP
Message from syslogd@vexing-eft at Apr 22 08:13:22 ...
kernel:[ 64.964868] Process simple (pid: 4048, stack limit = 0xffff000013498000)
Message from syslogd@vexing-eft at Apr 22 08:13:22 ...
kernel:[ 65.199004] Code: d2800014 54000be1 b9404004 f9401c03 (23232323)
However usually there is no log at all.
I debugged it a little and found that it hangs on this line: https://github.com/ergo-services/ergo/blob/79bebaa/proto/dist/resolver.go#L225
However, the value of the dsn variable seems to be correct: localhost:4369
On my local PC it works fine (though I have Erlang installed there), but on the ARM machine it does not. Could you give me some clues? I would really appreciate it.
To be specific, I have tested it on aarch64 and all tests passed. I will try to find the same hardware. It's a bit difficult to resolve this without access to a similar environment.
Sorry for the delay. I still don't have anything similar to your device.
May I ask you to try these fixes? In /ergo-services/ergo@v1.999.210/lib/tools.go:166, replace:
limit = 4294967000
with
limit = math.MaxInt
This also requires adding the "math" package to the import section.
Also update the ResourceUsage function in ergo-services/ergo@v1.999.210/lib/osdep/linux.go with this code:
var usage syscall.Rusage
var utime, time int64
if err := syscall.Getrusage(syscall.RUSAGE_SELF, &usage); err == nil {
	utime = int64(usage.Utime.Sec)*1000000000 + usage.Utime.Nano()
	stime = int64(usage.Stime.Sec)*1000000000 + usage.Stime.Nano()
}
return utime, stime
Or you can try this branch https://github.com/ergo-services/ergo/tree/fixarm
Sure I will try it out. However I'm OOO currently and will be able to check it early next week.
Hi,
Unfortunately, this does not seem to help. I've added a go.mod to ergo/examples/simple:
module simpl.com/simple
go 1.18
replace github.com/ergo-services/ergo => ../../
replace github.com/ergo-services/ergo/etf => ../../etf
replace github.com/ergo-services/ergo/gen => ../../gen
replace github.com/ergo-services/ergo/node => ../../node
require github.com/ergo-services/ergo v1.999.211 // indirect
and then I made the fixes you mentioned. There was one mistake on this line:
var utime, time int64
I assume it should be:
var utime, stime int64
I've built everything and ran it again, but it hung the same way it did last time.
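With the declaration corrected to stime, the suggested change can be tried in isolation. The following is a minimal standalone sketch, not the library's code: resourceUsage is a hypothetical name, and it assumes a Linux (or other Unix) target where syscall.Getrusage is available. Note that syscall.Timeval.Nano() already includes the seconds component, so the sketch does not add Sec a second time.

```go
package main

import (
	"fmt"
	"syscall"
)

// resourceUsage returns the user and system CPU time consumed by the
// current process, in nanoseconds. It is a hypothetical stand-in for the
// ResourceUsage function discussed above, not the ergo implementation.
func resourceUsage() (utime, stime int64) {
	var usage syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &usage); err == nil {
		// Timeval.Nano combines seconds and microseconds into nanoseconds.
		utime = usage.Utime.Nano()
		stime = usage.Stime.Nano()
	}
	// On error both values stay zero.
	return utime, stime
}

func main() {
	u, s := resourceUsage()
	fmt.Printf("user: %dns, system: %dns\n", u, s)
}
```

Building this with GOARCH=arm64 and running it on the board would show whether the Getrusage call on its own behaves on the device.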
so there were a few issues
It seems I need to get the same HW (or a similar one) or find a way to run it in a VM (I have no clue how to do this so far).
Yes the compilation issue has been fixed (actually I was compiling to the wrong target: it should be arm64, not just arm). The running part appears to still be a problem. Let me know if you wish me to test anything else on the device.
I finally bought this device. Waiting for the shipment.
I've just tested the simple example on my Tinker board. No issues (master branch).
Could you please check on your board the same way? You may also want to add -ergo.trace to see extra debug info, like this:
mendel@tinker:~$ ./simple -ergo.trace
2022/07/27 10:09:15 Start node with name "node@localhost" and cookie "cookies"
2022/07/27 10:09:15 Node listening range: 15000...65000
2022/07/27 10:09:15 Started embedded EMPD service and listen port: 4369
2022/07/27 10:09:15 EPMD accepted new connection from [::1]:46830
2022/07/27 10:09:15 Request from EPMD client: [0 22 120 58 152 77 0 0 6 0 5 0 4 110 111 100 101 0 4 17 59 1 0 0]
2022/07/27 10:09:15 [node@localhost] EPMD client: node registered
2022/07/27 10:09:15 [node@localhost] CORE registering behavior "erlang" in group "ergo:applications"
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1001>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1001> (registered name: "")
2022/07/27 10:09:15 [node@localhost] CORE registering name (<2C323E75.0.1002>): net_kernel_sup
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1002>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1002> (registered name: "net_kernel_sup")
2022/07/27 10:09:15 [node@localhost] SUPERVISOR "net_kernel_sup" with restart strategy: one_for_one[permanent]
2022/07/27 10:09:15 [node@localhost] CORE registering name (<2C323E75.0.1003>): net_kernel
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1003>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1003> (registered name: "net_kernel")
2022/07/27 10:09:15 NET_KERNEL: Init: []etf.Term(nil)
2022/07/27 10:09:15 [node@localhost] LINK process: <2C323E75.0.1002> => <2C323E75.0.1003>
2022/07/27 10:09:15 [node@localhost] CORE registering name (<2C323E75.0.1004>): global_name_server
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1004>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1004> (registered name: "global_name_server")
2022/07/27 10:09:15 [node@localhost] LINK process: <2C323E75.0.1002> => <2C323E75.0.1004>
2022/07/27 10:09:15 [node@localhost] CORE registering name (<2C323E75.0.1005>): rex
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1005>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1005> (registered name: "rex")
2022/07/27 10:09:15 REX: Init: []etf.Term(nil)
2022/07/27 10:09:15 [node@localhost] CORE registering behavior "erpc" in group "ergo:remote"
2022/07/27 10:09:15 [node@localhost] LINK process: <2C323E75.0.1002> => <2C323E75.0.1005>
2022/07/27 10:09:15 [node@localhost] CORE registering name (<2C323E75.0.1006>): observer_backend
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1006>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1006> (registered name: "observer_backend")
2022/07/27 10:09:15 OBSERVER: Init: []etf.Term(nil)
2022/07/27 10:09:15 [node@localhost] RPC provide: proc_lib:translate_initial_call (gen.RPC)(0x1fc500)
2022/07/27 10:09:15 [node@localhost] RPC provide: appmon_info:start_link2 (gen.RPC)(0x1fc740)
2022/07/27 10:09:15 [node@localhost] LINK process: <2C323E75.0.1002> => <2C323E75.0.1006>
2022/07/27 10:09:15 [node@localhost] CORE registering name (<2C323E75.0.1007>): erlang
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1007>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1007> (registered name: "erlang")
2022/07/27 10:09:15 ERLANG: Init: []etf.Term(nil)
2022/07/27 10:09:15 [node@localhost] LINK process: <2C323E75.0.1002> => <2C323E75.0.1007>
2022/07/27 10:09:15 [node@localhost] LINK process: <2C323E75.0.1001> => <2C323E75.0.1002>
2022/07/27 10:09:15 [node@localhost] CORE registering name (<2C323E75.0.1008>): gs1
2022/07/27 10:09:15 [node@localhost] CORE registering process: <2C323E75.0.1008>
2022/07/27 10:09:15 [node@localhost] CORE spawn a new process <2C323E75.0.1008> (registered name: "gs1")
2022/07/27 10:09:15 [node@localhost] CORE route message by pid (local) <2C323E75.0.1008>
2022/07/27 10:09:15 [node@localhost] GEN_SERVER <2C323E75.0.1008> got message from <2C323E75.0.1008>
2022/07/27 10:09:15 m: 100
HandleInfo: 100
2022/07/27 10:09:16 [node@localhost] CORE route message by pid (local) <2C323E75.0.1008>
2022/07/27 10:09:16 [node@localhost] GEN_SERVER <2C323E75.0.1008> got message from <2C323E75.0.1008>
2022/07/27 10:09:16 m: 101
HandleInfo: 101
2022/07/27 10:09:17 [node@localhost] CORE route message by pid (local) <2C323E75.0.1008>
2022/07/27 10:09:17 [node@localhost] GEN_SERVER <2C323E75.0.1008> got message from <2C323E75.0.1008>
2022/07/27 10:09:17 m: 102
HandleInfo: 102
2022/07/27 10:09:18 [node@localhost] CORE route message by pid (local) <2C323E75.0.1008>
2022/07/27 10:09:18 [node@localhost] GEN_SERVER <2C323E75.0.1008> got message from <2C323E75.0.1008>
2022/07/27 10:09:18 m: 103
HandleInfo: 103
2022/07/27 10:09:19 [node@localhost] CORE route message by pid (local) <2C323E75.0.1008>
2022/07/27 10:09:19 [node@localhost] GEN_SERVER <2C323E75.0.1008> got message from <2C323E75.0.1008>
2022/07/27 10:09:19 m: 104
HandleInfo: 104
2022/07/27 10:09:20 [node@localhost] CORE route message by pid (local) <2C323E75.0.1008>
2022/07/27 10:09:20 [node@localhost] GEN_SERVER <2C323E75.0.1008> got message from <2C323E75.0.1008>
2022/07/27 10:09:20 m: 105
HandleInfo: 105
2022/07/27 10:09:20 [node@localhost] CORE unregistering process: <2C323E75.0.1008>
2022/07/27 10:09:20 [node@localhost] CORE unregistering name (<2C323E75.0.1008>): gs1
2022/07/27 10:09:20 [node@localhost] MONITOR process terminated: <2C323E75.0.1008>
exited
2022/07/27 10:09:20 accept tcp 127.0.0.1:15000: use of closed network connection
mendel@tinker:~$
Sorry for the late answer. That's quite strange, because for me it still does not work. I don't have any customizations on the device, just a bare Tinker OS installation.
$ uname -a
Linux vexing-eft 4.14.98-imx #1 SMP PREEMPT Wed Jun 9 15:32:53 UTC 2021 aarch64 GNU/Linux
It seems that we even have the same kernel version (only the compilation time is different).
What version of golang are you using? Which OS and distro?
PS: forgot to mention. I've updated simple.go by adding flag.Parse() in order to use -ergo.trace
PPS: you may contact me via telegram (halturin) for instant messaging.
Just to make sure, I've tested it one more time (I recently updated my OS environment).
I'm compiling it on:
$ go version
go version go1.19 linux/amd64
$ uname -a
Linux rog 5.15.59-1-MANJARO #1 SMP PREEMPT Wed Aug 3 11:20:04 UTC 2022 x86_64 GNU/Linux
May I ask you to build it using golang 1.18?
Unfortunately it's the same.
Could you please add flag.Parse() into the main function of simple.go and run it with -ergo.trace on this board?
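A flag such as -ergo.trace only takes effect once the command line has been parsed. Assuming the library registers it through Go's standard flag package (which the need for flag.Parse() suggests), the mechanism can be sketched with a stand-in flag. The names example.trace and parseTrace are hypothetical and are not part of ergo.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseTrace registers a stand-in boolean flag and parses args.
// "example.trace" is a hypothetical name used for illustration only.
func parseTrace(args []string) (bool, error) {
	fs := flag.NewFlagSet("simple", flag.ContinueOnError)
	trace := fs.Bool("example.trace", false, "enable verbose trace output")
	// Until Parse runs, *trace keeps its default value (false).
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *trace, nil
}

func main() {
	on, err := parseTrace(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("trace enabled:", on)
}
```

In simple.go itself, the equivalent step would be a single flag.Parse() call at the top of main, before the node is started.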
Btw, is this hostname (or localhost) present in /etc/hosts?
Thanks for the quick reply. I'll take a look at why the node couldn't connect to itself on port 4369. PS: do you have any firewall settings on this board?
And the last thing I would like to ask you: could you please try the v220 branch? It's got some changes in the network module.
This got me thinking. If it hangs on trying to connect to itself then a simple http server should also be broken:
So I guess you can close this issue because it's clearly not related to the ergo library.
There is one difference: with the HTTP server, the server itself does not hang, and I can stop it with Ctrl+C. It is curl that hangs.
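One way to check loopback connectivity without ergo or curl is a small Go program that listens on a loopback port and connects to itself. This is a minimal sketch, not code from this thread: selfConnect and defaultTimeout are hypothetical names, and it uses an ephemeral port on 127.0.0.1 rather than 4369.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// defaultTimeout bounds the dial and read steps so the check fails fast
// instead of hanging indefinitely.
const defaultTimeout = 2 * time.Second

// selfConnect listens on an ephemeral loopback port, dials that port from
// the same process, and reads a short greeting back.
func selfConnect(timeout time.Duration) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("listen: %w", err)
	}
	defer ln.Close()

	// Server side: accept one connection and send a greeting.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		conn.Write([]byte("ok"))
	}()

	// Client side: connect to our own listener with a timeout.
	conn, err := net.DialTimeout("tcp", ln.Addr().String(), timeout)
	if err != nil {
		return fmt.Errorf("dial: %w", err)
	}
	defer conn.Close()

	conn.SetReadDeadline(time.Now().Add(timeout))
	buf := make([]byte, 2)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return fmt.Errorf("read: %w", err)
	}
	return nil
}

func main() {
	if err := selfConnect(defaultTimeout); err != nil {
		fmt.Println("loopback check failed:", err)
		return
	}
	fmt.Println("loopback check passed")
}
```

If this fails or times out on the board while passing on another machine, that would point at the device's network setup rather than at the application code.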
I guess there must be a firewall setting that drops any incoming connections. You can list the rules with iptables -L and flush them with iptables -F. Otherwise, I have no clue what it could be. Looks weird.
It seems that it's not a firewall.
When I find some time I will try to reinstall the system.
Since this bug is not related to ergo I'm closing it.
Describe the bug Compiling to ARM does not work
To Reproduce
Expected behavior Compilation works fine
Actual behaviour This error is displayed:
Environment (please complete the following information):
Additional context Removing GOARCH fixes it; however, I need to run a few services on ARM.