crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

XML::Node#to_s truncates large files #4710

Closed pedantic-git closed 2 years ago

pedantic-git commented 7 years ago

Hi Crystal devs! Thanks for fixing my earlier XML issue so quickly. I'm afraid I have another one.

It seems like the XML::Node#to_s truncates very large files, in my case at around 1.2MB.

Try this:

require "xml"
f = File.open "/path/to/EnragedBull.svg"
xml = XML.parse(f)
puts xml.to_s
# or xml.to_s(STDOUT)

(you can get the EnragedBull.svg here: https://openclipart.org/download/282790/EnragedBull.svg )

The file, which is 2.2MB to begin with, is cut short around 1.2MB.

I had a cursory look at the code but since it's making calls into LibXML I'm afraid I'm at a loss to fix it myself.
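
For anyone trying to reproduce this, one quick check is to compare the byte counts of the source file and the serialized output rather than eyeballing the terminal. The following is only a minimal shell sketch: `size_report` is a hypothetical helper name, and `output.svg` assumes the result of `xml.to_s` was saved to a file of that name. Re-serialized XML need not match the input byte for byte, so only a large shortfall is suspicious.

```shell
# size_report INPUT OUTPUT
# Prints the byte size of both files so a large shortfall is easy to spot.
# (Hypothetical helper, not part of Crystal or libxml2.)
size_report() {
  # $(( ... )) strips the padding some wc implementations add.
  in_bytes=$(( $(wc -c < "$1") ))
  out_bytes=$(( $(wc -c < "$2") ))
  echo "input=${in_bytes} output=${out_bytes}"
}

# Example: size_report EnragedBull.svg output.svg
```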

bmmcginty commented 7 years ago

Not a dev, but figured I'd try to help. What Crystal version and hardware are you on? I can't reproduce this on 0.23.0 (or on my oldest available version, 0.20.5).

pedantic-git commented 7 years ago

Hi @bmmcginty - thanks!

crystal --version
Crystal 0.23.1 [e2a1389] (2017-07-13) LLVM 3.8.1

ldd ./test
    linux-vdso.so.1 =>  (0x00007ffe0cec3000)
    libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007fb76adb3000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb76ab95000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb76a98d000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb76a789000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb76a572000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb76a1a9000)
    /lib64/ld-linux-x86-64.so.2 (0x00005589afda1000)
    libicuuc.so.57 => /usr/lib/x86_64-linux-gnu/libicuuc.so.57 (0x00007fb769e01000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fb769be5000)
    liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fb7699bf000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb7696b6000)
    libicudata.so.57 => /usr/lib/x86_64-linux-gnu/libicudata.so.57 (0x00007fb767c39000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb7678af000)

dpkg -l libxml2
||/ Name                         Version             Architecture        Description
+++-============================-===================-===================-==============================================================
ii  libxml2:amd64                2.9.4+dfsg1-2.2     amd64               GNOME XML library
ii  libxml2:i386                 2.9.4+dfsg1-2.2     i386                GNOME XML library

I also noticed that the compiler bombs when I try to build the above test script with --release:

crystal build --release test.cr
crystal: /var/cache/omnibus/src/llvm/llvm-3.8.1.src/lib/CodeGen/LexicalScopes.cpp:160: llvm::LexicalScope* llvm::LexicalScopes::getOrCreateRegularScope(const llvm::DILocalScope*): Assertion `cast<DISubprogram>(Scope)->describes(MF->getFunction())' failed.
/usr/bin/crystal: line 102: 15114 Aborted                 (core dumped) "$INSTALL_DIR/embedded/bin/crystal" "$@"

Let me know if there's anything else I can help with!

bmmcginty commented 7 years ago

Just tried with that exact revision of Crystal, so I suspect it's your LLVM version. What distro are you running? Would it be possible to upgrade LLVM? If not, and you can give me the info on your distro, I can try to spin up a cloud VM and see what I can do to assist, though I'm not sure exactly what yet.

pedantic-git commented 7 years ago

Thanks! I'm away from my computer right now but it's Ubuntu 17.04 (x86_64), fully patched every day.

pedantic-git commented 7 years ago

@bmmcginty Just looking at it now - looks like my Crystal binary (from the official Crystal Ubuntu repo) is statically linked against LLVM 3.8.1, but my distro does have 4.0 installed.

asterite commented 7 years ago

@pedantic-git I can't reproduce this. Is there any chance you can show us the output of that puts xml.to_s?

pedantic-git commented 7 years ago

@asterite Huh - funny that you can't reproduce it with the EnragedBull.svg file. Since filing this I've got a new workstation and I've changed my distro from Ubuntu to Arch but the issue still manifests in the same way!

Here are some links:

asterite commented 7 years ago

I'm trying this on OSX, so it might be an issue only in linux. I'll try with docker.

pedantic-git commented 7 years ago

Thanks! I suspect it's somewhere in the interface with the underlying LibXML so it wouldn't surprise me if it was OS-specific.

asterite commented 7 years ago

@pedantic-git I just tried it in docker and it worked fine. What OS are you using?

pedantic-git commented 7 years ago

@asterite I'm using Arch Linux but the same thing happened on Ubuntu.

Try this Dockerfile:

FROM base/devel
RUN pacman -Sy --noconfirm crystal libxml2
WORKDIR /tmp
ADD EnragedBull.svg test.cr /tmp/
RUN bash -c "crystal test.cr > output.svg"
CMD ls -lh

For me when that's run it outputs:

total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 12:46 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 12:50 output.svg
-rw-r--r-- 1 root root   80 Sep 11 12:46 test.cr

asterite commented 7 years ago

$ docker build -t crystaltest:xml .
Sending build context to Docker daemon  4.391MB
Step 1/6 : FROM base/devel
 ---> e0972358566d
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Using cache
 ---> 04c52d86dd93
Step 3/6 : WORKDIR /tmp
 ---> Using cache
 ---> abeb5a7b35ed
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> b13c1ef3ed3f
Removing intermediate container 607e0e14a975
Step 5/6 : RUN bash -c "crystal test.cr > output.svg"
 ---> Running in cb9d1585927b
 ---> f9cb0f26c1c7
Removing intermediate container cb9d1585927b
Step 6/6 : CMD ls -lh
 ---> Running in a493a8f343fe
 ---> 3e4af00a31a6
Removing intermediate container a493a8f343fe
Successfully built 3e4af00a31a6
Successfully tagged crystaltest:xml

$ docker run crystaltest:xml
total 4.2M
-rw-r--r-- 1 root root 2.1M Sep 11 13:07 EnragedBull.svg
-rw-r--r-- 1 root root 2.1M Sep 11 13:16 output.svg
-rw-r--r-- 1 root root   79 Sep 11 13:16 test.cr

No idea why you are getting different results...

Could be https://github.com/crystal-lang/crystal/issues/2713 . What if you write that string to a file, from inside Crystal? Using > is known to not work very well in Crystal.

pedantic-git commented 7 years ago

Same problem for me (originally I experienced this in a Kemal app).

test.cr:

require "xml"
f = File.open "EnragedBull.svg"
xml = XML.parse(f)
File.write "output.svg", xml.to_s

Dockerfile:

FROM base/devel
RUN pacman -Sy --noconfirm crystal libxml2
WORKDIR /tmp
ADD EnragedBull.svg test.cr /tmp/
RUN crystal test.cr
CMD ls -lh

In a shell:

 ~/Desktop  docker build .
Sending build context to Docker daemon  2.197MB
Step 1/6 : FROM base/devel
 ---> 2e0e74301392
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Using cache
 ---> 08fb581d732a
Step 3/6 : WORKDIR /tmp
 ---> Using cache
 ---> fdde9a4562cf
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> 60dc1009df8c
Step 5/6 : RUN crystal test.cr
 ---> Running in 6da5c1e3d7c2
 ---> c7eb0ab0817e
Removing intermediate container 6da5c1e3d7c2
Step 6/6 : CMD ls -lh
 ---> Running in 71ee6aebcf8f
 ---> 35b8ddb17fe5
Removing intermediate container 71ee6aebcf8f
Successfully built 35b8ddb17fe5
 ~/Desktop  docker run 35b8ddb17fe5
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 12:46 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 13:22 output.svg
-rw-r--r-- 1 root root  100 Sep 11 13:21 test.cr

Could it be a difference between running Docker on a Linux kernel vs a Darwin kernel? Seems pretty unlikely! I would be inclined to blame my hardware but this is a new workstation since the bug was originally filed (they were both XPS13s with i7 processors, but 2 years apart).

RX14 commented 7 years ago

I can reproduce:

$ docker build .
Sending build context to Docker daemon  2.197MB
Step 1/6 : FROM base/devel
latest: Pulling from base/devel
3a32adc5d06e: Pull complete
3c005aad0569: Pull complete
fcd7db7c97c1: Pull complete
cc43857431eb: Pull complete
44d26cc3e206: Pull complete
Digest: sha256:07d592e4b3409b6436230a6db84aa6bc8f8550acf95ccd48e0a5023ba3d19523
Status: Downloaded newer image for base/devel:latest
 ---> e0972358566d
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Running in 9bc277a32d55
:: Synchronizing package databases...
downloading core.db...
downloading extra.db...
downloading community.db...
resolving dependencies...
looking for conflicting packages...

Packages (5) libedit-20170329_3.1-1  libevent-2.1.8-1  llvm-libs-4.0.1-5  crystal-0.23.1-1  libxml2-2.9.5+6+g07e227ed-1

Total Download Size:    17.45 MiB
Total Installed Size:  125.58 MiB

:: Proceed with installation? [Y/n]
:: Retrieving packages...
downloading libevent-2.1.8-1-x86_64.pkg.tar.xz...
downloading libedit-20170329_3.1-1-x86_64.pkg.tar.xz...
downloading llvm-libs-4.0.1-5-x86_64.pkg.tar.xz...
downloading libxml2-2.9.5+6+g07e227ed-1-x86_64.pkg.tar.xz...
downloading crystal-0.23.1-1-x86_64.pkg.tar.xz...
checking keyring...
checking package integrity...
loading package files...
checking for file conflicts...
checking available disk space...
:: Processing package changes...
installing libevent...
Optional dependencies for libevent
    python2: to use event_rpcgen.py
installing libedit...
installing llvm-libs...
installing crystal...
Optional dependencies for crystal
    shards: crystal language package manager
    libyaml: For YAML support
    gmp: For BigInt support [installed]
    libxml2: For XML support [pending]
installing libxml2...
:: Running post-transaction hooks...
(1/1) Arming ConditionNeedsUpdate...
 ---> 768a519f64a1
Removing intermediate container 9bc277a32d55
Step 3/6 : WORKDIR /tmp
 ---> 6b7c8d19733d
Removing intermediate container ca06e0f29d4c
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> 26333f691206
Step 5/6 : RUN bash -c "crystal test.cr > output.svg"
 ---> Running in 582a805c848f
 ---> 3f750795bdf3
Removing intermediate container 582a805c848f
Step 6/6 : RUN ls -lh
 ---> Running in f957b53a1589
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 10:36 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 13:25 output.svg
-rw-r--r-- 1 root root   80 Sep 11 10:36 test.cr
 ---> 8a37f1ec1e01
Removing intermediate container f957b53a1589
Successfully built 8a37f1ec1e01

RX14 commented 7 years ago

Even with writing in crystal!

require "xml"
f = File.open "EnragedBull.svg"
xml = XML.parse(f)
File.write("test.xml", xml.to_s)

Sending build context to Docker daemon  2.197MB
Step 1/6 : FROM base/devel
 ---> e0972358566d
Step 2/6 : RUN pacman -Sy --noconfirm crystal libxml2
 ---> Using cache
 ---> 768a519f64a1
Step 3/6 : WORKDIR /tmp
 ---> Using cache
 ---> 6b7c8d19733d
Step 4/6 : ADD EnragedBull.svg test.cr /tmp/
 ---> e0b2e5ed31b9
Step 5/6 : RUN bash -c "crystal test.cr > output.svg"
 ---> Running in ee6c73c5fb2d
 ---> f8e0dfdefd0e
Removing intermediate container ee6c73c5fb2d
Step 6/6 : RUN ls -lh
 ---> Running in 15fb33838858
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 10:36 EnragedBull.svg
-rw-r--r-- 1 root root    0 Sep 11 13:27 output.svg
-rw-r--r-- 1 root root   98 Sep 11 13:27 test.cr
-rw-r--r-- 1 root root 1.2M Sep 11 13:27 test.xml
 ---> 0e83cd04ae41
Removing intermediate container 15fb33838858
Successfully built 0e83cd04ae41

RX14 commented 7 years ago

Considering that @asterite and I have exactly the same base/devel hash, and Docker on Mac runs a real Linux kernel, this seems incredibly strange.

Just as a sanity check, here's the sha1 of my EnragedBull.svg: 6219109e159c3b6df38a86f640270a3c9277c896
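
To rule out a different input file, others can compare their copy against that hash. This is a rough shell sketch: `check_sha1` is a hypothetical helper name, and it assumes GNU coreutils' `sha1sum` is available (on macOS, `shasum` would be the usual substitute).

```shell
# check_sha1 FILE EXPECTED_HEX
# Succeeds (exit status 0) only if FILE's SHA-1 digest equals EXPECTED_HEX.
# (Hypothetical helper; assumes sha1sum from GNU coreutils.)
check_sha1() {
  # sha1sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha1sum "$1" | cut -d' ' -f1)
  [ "$actual" = "$2" ]
}

# Example, using the digest quoted above:
# check_sha1 EnragedBull.svg 6219109e159c3b6df38a86f640270a3c9277c896 && echo "same file"
```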

asterite commented 7 years ago

@RX14 But outside docker it works fine?

pedantic-git commented 7 years ago

I know Docker on Mac no longer needs VirtualBox to run so perhaps it uses some more advanced Mac virtualization rather than a Linux kernel these days.

I'm just trying it out on the official Docker VM using docker-machine - will report back shortly.

pedantic-git commented 7 years ago

Yep - same problem on the official Docker VM:

 ~/Desktop  docker-machine create --driver=virtualbox crystal-test
[...snip creation stuff...]
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env crystal-test
 ~/Desktop  eval $(docker-machine env crystal-test)
 ~/Desktop  docker build .
Sending build context to Docker daemon  2.197MB
[...snip building stuff...]
Step 5/6 : RUN crystal test.cr
 ---> Running in 0994fa80a6c1
 ---> 0cfd5dbc0c48
Removing intermediate container 0994fa80a6c1
Step 6/6 : CMD ls -lh
 ---> Running in 3b16115b199b
 ---> 92b67fe284f8
Removing intermediate container 3b16115b199b
Successfully built 92b67fe284f8
 ~/Desktop  docker run 92b67fe284f8
total 3.3M
-rw-r--r-- 1 root root 2.1M Sep 11 12:46 EnragedBull.svg
-rw-r--r-- 1 root root 1.2M Sep 11 13:34 output.svg
-rw-r--r-- 1 root root  100 Sep 11 13:21 test.cr

RX14 commented 7 years ago

@asterite no, it's reproducible outside the container.

asterite commented 7 years ago

Then you can try to debug it, if you want and have time. The code is here: https://github.com/crystal-lang/crystal/blob/master/src/xml/node.cr#L424-L453

Maybe inspecting the values of these will be helpful: https://github.com/crystal-lang/crystal/blob/master/src/xml/node.cr#L434
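
Before digging into the Crystal side, it may help to confirm from the shell that the written file really is cut off rather than just shorter. A truncated document would not end with the root element's closing tag. The sketch below is one way to check that; `ends_with_tag` is a hypothetical helper name, and `output.svg` is assumed to be the file produced by the test script.

```shell
# ends_with_tag FILE TAG
# Succeeds if the last bytes of FILE (ignoring line breaks) are TAG.
# (Hypothetical helper for spotting a cut-off document.)
ends_with_tag() {
  # Look only at the tail of the file, drop newlines, and anchor the match at the end.
  tail -c 64 "$1" | tr -d '\n\r' | grep -q "$2\$"
}

# Example:
# ends_with_tag output.svg '</svg>' || echo "output.svg looks truncated"
```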

rdp commented 2 years ago

Unable to repro with

Crystal 1.2.1 [4e6c0f26e] (2021-10-21)
LLVM: 10.0.0
Default target: x86_64-unknown-linux-gnu

Maybe it was some now-fixed flushing issue or something...

straight-shoota commented 2 years ago

I suppose we can close this then.