crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.33k stars 1.61k forks source link

Concurrent invocations of puts seem to interfere with each other #7978

Open rvprasad opened 5 years ago

rvprasad commented 5 years ago

Since I am not sure if this is a Crystal issue or Kemal issue, I have reported this to the Kemal team as well (kemalcr/kemal#543).

Description

A simple server responds to get requests with random numbers in JSON format. When the server is running on a Raspberry Pi and ab tool is used to issue sequential requests to the server in quick succession, the output of the server on standard output is garbled.

I am reporting the issue here as I am not sure if this is a crystal issue or a kemal issue.

Steps to Reproduce

  1. Execute the server following the instructions in README.md
  2. Execute ab -n 10 -c 1 http://127.0.0.1:1234/random?num=30

Expected behavior: The following or something similar should be seen in the terminal executing the server.

8.108ms
8.051ms
8.403ms
8.242ms
8.209ms
8.437ms
8.095ms
7.988ms
8.126ms
8.087ms

Actual behavior: The following is seen in the terminal executing the server.

-159659774543452127413503ApacheBench/2.3127.0.0.1:1234/random?num=30��<�pyR�p��%�h2.3
                                                                                     /get/random34
                                                                                                  /get/rams
-1358916334152017894057263��7R뱵��Al`|��34
                                          /get/random096127.0.0.1:41128 127.0.0.1
                                                                                 /get/random2.3ms
321435393.761ms
-0.000ms
-0.000ms
4076574188583769414937274��&��5���Qg����2.3
                                           /get/random34
                                                        /get/random30
                                                                     /get/random127.0.0.1:41152 127.0ms
-902563470739474328704365�^��b�Z'�v�;�ق2ms
-108291519172398526629057��1m�EU]�K�����9�2.3
                                             /get/random34
                                                          /get/randomms
0.000ms
-0.000ms

Reproduces how often: Always

Versions

Crystal 0.29.0 (2019-07-20)

Kemal 0.25.2

LLVM: 6.0.1 Default target: arm-unknown-linux-gnueabihf

Linux master10 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux

Additional Information

The object file of the crystal compiler was compiled on Ubuntu/Laptop and then linked to create the executable on Hypriot/RaspberryPi.

rvprasad commented 5 years ago

I observed similar garbled output even after isolating puts into a dedicated fiber.

asterite commented 5 years ago

Totally offtopic, but there's no need to do Random.new every time. You can just use the top-level rand method.

asterite commented 5 years ago

I can't reproduce this on Mac OSX.

I don't think Crystal works fine on a Raspberry Pi. At least it's not something we officially support it so it'll be very hard for us to give help on this.

rvprasad commented 5 years ago

I haven't observed this issue on Linux. Just on Raspberry Pi 3. Since Crystal works on Linux and RaspPi runs Linux, I figured Crystal would work on RaspPi if Crystal could be cross compiled to run on RaspPi (even if this configuration is not supported). Clearly, that is not the case :)

girng commented 5 years ago

i feel like this is a unique and specific use-case. it also can't be reproduced on WSL or native Debian

rvprasad commented 5 years ago

I am not sure if this an issue with cross compilation (that is causing the compiled program to output garbage) or the Crystal compiler on Raspberry Pi. Any suggestions to verify this is welcome.

So far, I have tried building Crystal on Raspberry Pi 3B by 1) using the cross-compiled compiler and 2) bootstrapping the compiler (crystal-lang/boostrap-script). First approach failed due to insufficient memory while the second failed due to failure in building ruby in the first stage of bootstrapping.

That said, I am looking for a way to compile Crystal programs to run on Raspberry Pi 3B. Doing so on Raspberry Pi would be ideal.

asterite commented 5 years ago

@rvprasad Do yo know what architecture is Raspberry Pi running on? Or check if this gives true?

{{ puts flag?(:aarch64) }}

If so, #7983 might fix this issue.

rvprasad commented 5 years ago

Rasp Pi 3B uses Armv7 (v7l) processor. (The target triple is armv6-unknown-linux-gnueabihf.) So, the above test fails.

rvprasad commented 5 years ago

I did some digging with gdb. For this, I changed

puts "%.3fms" % (elapsed_time.total_nanoseconds / 1e6)

to

tmp1 = "%.3fms" % (elapsed_time.total_nanoseconds / 1e6)
puts tmp1

When a single request was submitted to the server, I found all of the strings and puts output were well-formed.

(gdb) info locals
ret = 0x245be8
elapsed_time = {seconds = 0, nanoseconds = 190677}
tmp1 = 0x260e60
(gdb) print &(tmp1->c)
$5 = (UInt8 *) 0x260e6c "-0.000ms"
(gdb) print &(ret->c)
$9 = (UInt8 *) 0x245bf4 "[680012]"

However, while elapsed_time.total_nanoseconds was 190677, the second string was "-0.000ms". This is wrong.

When two concurrent requests were submitted to the server, ret was well-formed but tmp1 was not.

(gdb) info locals
ret = 0x245990
elapsed_time = {seconds = 0, nanoseconds = 191874}
tmp1 = 0x23de10
(gdb) print &(tmp1->c)
$10 = (UInt8 *) 0x23de1c "-501870678723681322067296"
(gdb) print &(ret->c)
$11 = (UInt8 *) 0x24599c "[926334]"
rvprasad commented 5 years ago

Changing puts "%.3fms" % (elapsed_time.total_nanoseconds / 1e6) to puts "#{elapsed_time.total_nanoseconds / 1e6}ms" reduces the frequency of garbled output and the amount of garbled output in each instance.

ysbaddaden commented 5 years ago

There is an issue with hard/soft floats on ARM 32 targets. Specifying the target CPU while compiling may help LLVM backend to generate proper code (or not).

rvprasad commented 5 years ago

@ysbaddaden if you are suggesting floating point calculations (not data movements) involving floating point values) pose issues on ARM 32 targets, then I doubt that is the case as I was able to generate garbled output with the below minimal program.

require "kemal"

get "/random" do |env|
  tmp1 = "%fms" % 3.333
  puts tmp1
  tmp2 = "%dms" % 3.333
  puts tmp2
  ""
end

Kemal.config.logging = false
Kemal.config.port = 1234
puts "Serving at 0.0.0.0:1234"
Kemal.run

Submitting a single request to the above server produces the following output:

$ ./server 
Serving at 0.0.0.0:1234
-158972576233719173059H�/getget

get/random/getget

/ws/randomms
3ms

To cross-compile the compiler,

  1. Ran ./bin/crystal build src/compiler/crystal.cr --warnings all --cross-compile --target "armv6-unknown-linux-gnueabihf" -s -D without_openssl -D without_zlib on a Ubuntu box.
  2. Copied crystal.o onto a Raspberry Pi 3B running Raspbian 9.9 (from Hypriot).
  3. Ran sudo cc 'crystal.o' -o '/usr/lib/crystal/bin/crystal' -rdynamic /usr/share/crystal/src/llvm/ext/llvm_ext.o `/usr/bin/llvm-config-6.0 --libs --system-libs --ldflags 2> /dev/null` -lstdc++ -lpcre -lm -lgc /usr/share/crystal/src/ext/libcrystal.a -levent -lrt -ldl -lpthread -L/usr/lib -L/usr/local/lib after copying crystal/src into /usr/share/crystal/src (following the instructions here).

It seems the format specifier for float (%f) is broken.

straight-shoota commented 5 years ago

Well, the stringifier for floats uses float arithmetic and that might be broken as @ysbaddaden mentioned. Why do you doubt that this could be the issue? Your example does actually support that.

asterite commented 5 years ago

For %f we call sprintf, there might be an issue there. But it's hard if we can't reproduce it on Mac and Linux.

rvprasad commented 5 years ago

@straight-shoota TIL stringifying float values involves floating point calculations/arithmetic operations :) That said, it seems the floating point operations (e.g., the division operation worked fine) in the application code seems to work fine while the ones used to stringifying floats seem to be broken. Nevertheless, I will check this.

ysbaddaden commented 5 years ago

Don't use floats, at all. Use integers instead —and don't round a float!

The problem with hard/soft floats is that the caller/called ABI doesn't expect the value in the same registers (cpu integers vs fpu registers), which results in garbage.

rvprasad commented 5 years ago

The format_buf and float args to the invocation of LibC.snprintfat src/string/formatter.cr:265 are valid but the value filled into temp_buf is invalid. This is in line with @asterite's comment.

rvprasad commented 5 years ago

@asterite I was able to reproduce the issue in a Raspbian container running on a Linux host. The instructions to use the Docker image to create a compiler is available in docker-files repo. Once you have built the compiler and shards in the container, the issue can be repro-ed by building and executing the example. You will need to install curl in the container (via apt install curl) to repro the issue.

straight-shoota commented 5 years ago

Is the linux host running on armv6 as well or is it an x86 architecture? Raspbian has an x86 variant, so I'm not sure.

rvprasad commented 5 years ago

The Linux host is a x86 machine while the Raspbian is running on armv7l emulator.

ysbaddaden commented 5 years ago

Please read the following:

Likely a duplicate of #6954

All ARM isues are related to a hard/soft float issue. Specifying a CPU when compiling may fix the issue (it did for my qemu raspbian image) or not (fails on scaleway armv7 server, and reportedly rpi3).

Reproducing isn't an issue. It's quite easy to. The problem is understanding why LLVM won't enable hard floats when compiling. Maybe we miss some LLVM attributes on functions? I recently noticed that clang sets a bunch of them, related to FP and soft-float.

konovod commented 5 years ago

Maybe I'm saying something stupid, but for hard float you should specify cpu, fpu and float abi: -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16

rvprasad commented 5 years ago

@konovod I tried doing this today but I gave up as I could not figure out how to pass these flags to LLVM compiler from the command line interface of crystal compiler.

rvprasad commented 5 years ago

@ysbaddaden Thanks for the pointer. I read thru some of these while digging around. BTW, what is the best way to pass the compiler options mentioned by @konovod to LLVM via crystals CLI? That would allow more experimentation.

asterite commented 5 years ago

The format_buf and float args to the invocation of LibC.snprintfat src/string/formatter.cr:265 are valid but the value filled into temp_buf is invalid.

What do you mean?

ysbaddaden commented 5 years ago

Maybe we just miss the "use-soft-float"="false" attribute on every functions? It should always be false, unless the target's arch is arm and the ABI is *eabi.

Maybe we miss the ability to trigger thumb mode; when -mthumb is specified clang changes the target from armv7-linux-gnueabihf to thumbv7-linux-gnueabihf for example (i.e. replaces arm with thumb).

rvprasad commented 5 years ago

@asterite I wasn't sure at which program point were the memory addresses getting messed up. So, I dug around and found that format_buf (the format string) and float (the value to be printed) arguments passed to LibC.snprintf at src/string/formatter.cr:265 were indeed the values they should have been. Assuming the other input argument len was valid, the fault resulting in the garbled output is encountered sometime after the LibC.snprintf is invoked at src/string/formatter.cr:265. This kinda aligns with @ysbaddaden suggestion -- ABI mismatch between the app and libc.

rvprasad commented 5 years ago

The failure occurs only when the foo() is invoked within the context of a Kemal Handler (at call2 and not at call1). This seems to suggest the issue may be specific to Kemal and not related to ABI or hard/soft floats. [I have mentioned this in kemalcr/kemal/issues/543.]

require "kemal"

def foo() 
  tp1 = "%fms" % 3.333
  puts tp1
  tp2 = "%dms" % 3.333
  puts tp2
end 

foo()   # call1

get "/random" do |env|
  foo() # call2
  ""
end

Kemal.config.logging = false
Kemal.config.port = 1234
puts "Serving at 0.0.0.0:1234"
Kemal.run

Output upon requesting http://127.0.0.1:1234/random

root@6f9b4df16a4b:/t/t# ./t
3.333000ms
3ms
Serving at 0.0.0.0:1234
-158972575668831573782�{#/getget

get/random/getget

/ws/randomms
3ms
asterite commented 5 years ago

What happens if you just use http server from the standard library? I don't think kemal is involved here.

rvprasad commented 5 years ago

The below snippet fails as well.

require "http/server"

def foo() 
  tp1 = "%fms" % 3.333
  puts tp1
  tp2 = "%dms" % 3.333
  puts tp2
end 

foo()   # call1

server = HTTP::Server.new do |ctxt|
  foo() # call2
  ctxt.response.print "Boo"
end

address = server.bind_tcp 1234
puts "Serving at #{address}"
server.listen

Output

3.333000ms
3ms
Serving at 127.0.0.1:1234
-158972575660657402946.

keep-alive

Keep-Alive

ConnectionHoidentity*/*ms
3ms
asterite commented 5 years ago

Cool! Now you could try reducing it to using spawn and Fiber.yield in between calls, and spawning a lot of fibers and see if it still reproduces.

The less code we have that reproduces the problem, the easier to find the cause.

asterite commented 5 years ago

(I'll send such code later today)

asterite commented 5 years ago

@rvprasad Try this:

chan = Channel(Nil).new

1000.times do |i|
  spawn do
    tp1 = "%fms" % 3.333
    puts tp1
    Fiber.yield
    tp2 = "%dms" % 3.333
    puts tp2
    chan.send(nil)
  end
end

1000.times do
  chan.receive
end
rvprasad commented 5 years ago

@asterite Thanks. I was able to repro the issue in the Docker container.

spawn do
  tp1 = "%fms" % 3.333
  puts tp1
end

sleep(10)
asterite commented 5 years ago

What output you get with that program?

rvprasad commented 5 years ago

It produces the following output in a container.

root@4b026a742082:/t/t# crystal run -d s.cr 
Using compiled compiler at `.build/crystal'
-158972575643887689414`?� I�l@Z~?6�~΄/t/t//t/t@@ms
asterite commented 5 years ago

But it works fine if you don't do it inside spawn?

rvprasad commented 5 years ago

Yes.

Also, output is fine when spawn is replaced with 1.times.

rvprasad commented 5 years ago

@ysbaddaden In the below snippet, foo prints 1.000000 2.000000 when invoked in mydo but it prints 0.000000 0.000000 when invoked in the fiber. Since the same code fragment provides both "correct" and "incorrect" outcomes, I doubt if this is an ABI related issue but...

The float argument to corresponding calls to LibC.snprintf in string/formatter.c in the above invocations are identical, i.e., 1.0 and 2.0. So, it seems the context swapping is interfering with the call to an external library function.

def foo
  i = {1.0, 2.0}
  printf("%f %f\n", *i)
end

def mydo(&x)
  x.call()
end

mydo(&->foo)
spawn foo

sleep(2)

Any ideas on what/where might be the issue?

ysbaddaden commented 5 years ago

We save/restore FPU registers on ARM when switching fibers, and it leads to the same hard/soft float codegen bug.

rvprasad commented 4 years ago

I want to experiment with different LLVM compiler flags during code generation, e.g., related to FPU and NEON. Any ideas where and how these flags can be added/modified/specified?

HCLarsen commented 1 year ago

I know this issue has been dormant for a couple of years, but I wanted to bring up that I have also experienced this, although it was much earlier this year.