CentOS / sig-cloud-instance-images

CentOS cloud images

Same program but different behavior host/container : double free or corruption #119

Open alexisfrjp opened 6 years ago

alexisfrjp commented 6 years ago

Hi all,

I am trying to run a GUI application that I did not compile myself inside a Docker container. It works perfectly on a CentOS 7.5 host but not in a Docker container that also uses CentOS 7.5. I have no idea what is different; the glibc version is the same, 2.17-222.el7.

When I open the licensing window (which checks the NIC/MAC addresses), it crashes:

*** Error in `quartus': double free or corruption (out): 0x00007fcd04dcad10 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7fcd0aa01499]
/lib64/libudev.so.1(udev_device_unref+0x37)[0x7fcd0255d5f7]
/lib64/libudev.so.1(+0x8a31)[0x7fcd02560a31]
/lib64/libudev.so.1(+0x8c0e)[0x7fcd02560c0e]
/lib64/libudev.so.1(udev_enumerate_scan_devices+0xd0)[0x7fcd025614c0]

I don't know where this comes from, and I would like to understand why the container does not behave the same way as the host.

$ docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:20:16 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:23:58 2018
  OS/Arch:      linux/amd64
  Experimental: false

Description: CentOS Linux release 7.5.1804 (Core)

            "Labels": {
                "org.label-schema.schema-version": "= 1.0     org.label-schema.name=CentOS Base Image     org.label-schema.vendor=CentOS     org.label-schema.license=GPLv2     org.label-schema.build-date=20180531"
            }

What is different? How can I find the root cause? The application runs perfectly on the host with the same library versions.

It may be Docker's memory management, the CentOS image, or something else. The Docker container has a network interface, eth0, with a non-null MAC address.

Thank you!

ghost commented 6 years ago

Hi @alexisfrjp, which GUI program did this problem occur with? And how did you fix it?

alexisfrjp commented 6 years ago

The app is Quartus 16.1.

I don't understand how tcmalloc behaves differently depending on the CentOS version and the gperftools version... The fact that --net=host is required but it isn't using tcmalloc is mysterious. The problem is that I don't want to use --net=host; I'd like to keep the isolation.

I didn't expect that a package update would break everything, since it works perfectly on the host with the same version.

KuiWei004 commented 5 years ago

Hi @alexisfrjp, I ran into the same problem using a centos-7.4 Docker image. I have two questions about it:

1. I tried using LD_PRELOAD with tcmalloc, but my project still failed to build. Could you share more details on how you used LD_PRELOAD and tcmalloc?
2. The error message looks like a memory error, but the real cause seems to be network-related. Did you figure out why --net=host avoids the problem while --net=bridge triggers it? I am very confused about how the network mode can cause a double free.

I look forward to hearing from you in the future!! Thank you!!!! My E-mail: kuiwei1995@outlook.com
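For readers unfamiliar with the mechanism asked about here: LD_PRELOAD is an environment variable read by the dynamic linker, listing shared objects to load before any others, which is how an alternative allocator such as tcmalloc can be swapped in without recompiling a program. The thread does not say how it was actually used, so the following is only a minimal sketch. The helper names are hypothetical, and the library path is a placeholder that must be replaced with wherever libtcmalloc is installed on your system.

```python
import os
import subprocess


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # ld.so accepts a colon-separated list; keep anything already preloaded.
    env["LD_PRELOAD"] = library_path + (":" + existing if existing else "")
    return env


def run_with_preload(command, library_path):
    """Run command (a list of arguments) with the library preloaded."""
    return subprocess.run(command, env=preload_env(library_path)).returncode


# Hypothetical usage; both the path and the command are placeholders:
# run_with_preload(["quartus"], "/path/to/libtcmalloc.so")
```

Preloading only changes which allocator the process uses. It does not change what the application or libudev see under /sys, so it may hide or shift a crash rather than remove its cause.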

jperrin commented 5 years ago

Keep in mind that CentOS containers (even the minor versioned ones) don't pin to a specific release by default. They will pull in the latest versions of the packages available on an update or install command.

You likely don't have a true 7.5.1804 or 7.4.whatever image.
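One way to check whether the host and the container really carry the same package versions is to dump the installed packages on both sides and compare them. The sketch below assumes each dump was produced with `rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}.%{ARCH}\n'`; the function names and the input file names are hypothetical.

```python
import sys


def parse_rpm_list(text):
    """Parse 'name version' lines into a dict mapping name -> set of versions."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition(" ")
        # A package can be installed in several versions or architectures.
        packages.setdefault(name, set()).add(version)
    return packages


def diff_packages(host, container):
    """Return {name: (host_versions, container_versions)} for every mismatch."""
    diffs = {}
    for name in sorted(set(host) | set(container)):
        host_versions = host.get(name, set())
        container_versions = container.get(name, set())
        if host_versions != container_versions:
            diffs[name] = (sorted(host_versions), sorted(container_versions))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rpmdiff.py host.txt container.txt
    with open(sys.argv[1]) as f:
        host_packages = parse_rpm_list(f.read())
    with open(sys.argv[2]) as f:
        container_packages = parse_rpm_list(f.read())
    for name, (h, c) in diff_packages(host_packages, container_packages).items():
        print(f"{name}: host={h or 'absent'} container={c or 'absent'}")
```

Since the backtrace points at /lib64/libudev.so.1, running `rpm -qf /lib64/libudev.so.1` on both sides shows which package owns that library and is a reasonable first entry to look for in the diff.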

alexisfrjp commented 5 years ago

@KuiWei004 My app checks whether an interface is virtual by probing /sys/devices/virtual/net/eth.. or another file; I don't remember exactly. That makes my app crash, because the only network interface available inside a Docker container is a virtual one. If you use --net=host, the network interface is no longer virtual and the app can probe it without any problem. I haven't found a workaround so far. I tried to use a custom tool such as ptrace (to man-in-the-middle the syscall), but I had no time for that. The alternatives are:

1. Compile a new version of my program without this feature. I'll never do that; Docker is supposed to make my life easier, not harder.
2. Stop using Docker for that kind of program.

If you find a workaround for that, let me know.
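To see what such a check would observe, the sketch below lists each network interface and reports whether its sysfs entry resolves under /sys/devices/virtual. It does not reproduce what Quartus does (the exact file it probes is unknown, as noted above); the function name is hypothetical, and it only relies on the standard sysfs layout.

```python
import os


def describe_interfaces(sysfs_net="/sys/class/net"):
    """Return {iface: {"virtual": bool, "mac": str or None}} read from sysfs."""
    result = {}
    for name in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, name)
        if not os.path.isdir(link):
            continue  # skip plain files such as bonding_masters
        # Each entry is a symlink to the real device directory under /sys/devices.
        target = os.path.realpath(link)
        virtual = "/devices/virtual/" in target
        mac = None
        try:
            with open(os.path.join(link, "address")) as f:
                mac = f.read().strip()
        except OSError:
            pass  # some interfaces expose no readable address
        result[name] = {"virtual": virtual, "mac": mac}
    return result


if __name__ == "__main__" and os.path.isdir("/sys/class/net"):
    for iface, info in describe_interfaces().items():
        kind = "virtual" if info["virtual"] else "physical"
        print(f"{iface}: {kind}, MAC {info['mac']}")
```

Inside a container on the default bridge network, eth0 is one end of a veth pair, so its sysfs path resolves under /sys/devices/virtual/net. With --net=host the container sees the host's interfaces, including any physical ones, which matches the behavior described in this thread.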

@jperrin Yes, exactly, I found that out a long time ago. I couldn't reproduce the setup I wanted because commands like yum update don't give consistent results.

KuiWei004 commented 5 years ago

@alexisfrjp First, thank you! Your app is Quartus, right? The only app in my container is Quartus Pro 17.1. Each build of my Quartus project on the host machine (64 cores, 128 GB RAM) takes about 4 hours, yet it uses only 6 processors and 20 GB of RAM in total. So I want to run Quartus in containers, so that I can build more than one design version at the same time. As you pointed out, Quartus checks whether the network interface is virtual, but Quartus needs this virtual NIC to reach my license server. Anyway, --net=host solved my problem.

@jperrin Thank you for your attention. In fact, I noticed that too, but I have no idea how to fix it.

felixn commented 5 years ago

I have the same issue with Quartus 17.1 in a CentOS 7 Docker image. The workaround with "--net=host" works for me. I'd love to investigate whether this is a Docker or an Altera issue. Any hints on where to start?

KuiWei004 commented 5 years ago

@felixn My Quartus license is on a license server, so I guess Quartus checks whether my NIC is physical or virtual when compiling, and a virtual NIC seems to be unacceptable to Quartus.