JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Unable to resolve DNS within Julia on Windows #5574

Closed mlubin closed 10 years ago

mlubin commented 10 years ago

I'm helping out a user who's experiencing a very strange issue (Windows 8, 64-bit):

julia> ;nslookup github.com
Server: UnKnown
Address: 18.71.[...].[...]

*** UnKnown can't find github.com: No response from server

But pinging github's IP address from within julia works.

The strange part is that if you run the same command in the Git bash that comes with Julia, it works fine:

$ nslookup github.com
Server: [...]
Address: 18.71.[...].[...]

Non-authoritative answer:
Name: github.com
Address: 192.30.252.131

This is an issue with all DNS lookups, not just github. There are no firewalls enabled (that I can find). The same occurs with julia 0.2 and 0.3. This happens on all networks, not just MIT wifi. The internet connection seems to work fine from all applications except julia. How can I debug this?
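One way to narrow down a symptom like this is to test name resolution and raw IP connectivity separately from the same process. The following is a minimal Python sketch of that idea, not something taken from the thread: the function name `diagnose`, its labels, and the injectable `resolver` and `connector` parameters are all hypothetical, chosen so the logic can be exercised without a network.

```python
import socket


def diagnose(hostname, known_ip, port=443,
             resolver=socket.getaddrinfo,
             connector=socket.create_connection,
             timeout=3.0):
    """Report whether DNS, raw IP connectivity, or both fail.

    `resolver` and `connector` are hypothetical injection points;
    by default they are the standard library calls.
    """
    # Step 1: can this process resolve the name at all?
    try:
        resolver(hostname, port)
        dns_ok = True
    except socket.gaierror:
        dns_ok = False

    # Step 2: can this process reach a known address without DNS?
    try:
        conn = connector((known_ip, port), timeout)
        conn.close()
        ip_ok = True
    except OSError:
        ip_ok = False

    if dns_ok and ip_ok:
        return "ok"
    if ip_ok:
        return "dns-only failure"
    if dns_ok:
        return "dns ok, ip unreachable"
    return "no connectivity"
```

Calling `diagnose("github.com", "192.30.252.131")` uses the address quoted in the nslookup output above, which may no longer be current. A result of "dns-only failure" would match the behaviour described here: the IP is reachable but lookups fail.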

vtjnash commented 10 years ago

it's only in a source compile, but google agrees that your processor is ivy bridge, so I don't think it is a haswell bug

vtjnash commented 10 years ago

According to a vague comment in the libuv source, I have discovered there are such things as "non-IFS LSPs", which seem to be defined as "LSPs which are not IFS" in most of my google results, and which could cause this failure.

Anyways, libuv aborts at startup if any of these are detected in the network stack, due to a firewall, local packet sniffer, virus / malware, or corruption (since they can cause lost or delayed data, in Vista onwards). The following article seems to describe how to detect these intruders: http://support.microsoft.com/kb/811259

Can anyone with the problem confirm that this is / is not a problem on their machine?

edit: link fixed

Keno commented 10 years ago

Dead link

drlivip commented 10 years ago

On Thu, Jun 12, 2014 at 11:02 PM, Jameson Nash notifications@github.com wrote:

"LSPs which are not IFS"

I don't know exactly what I'm doing here, but I did find this: http://support.microsoft.com/kb/2568167. It gives the command-line instruction "netsh Winsock Show Catalog". Running it in a command window lists "Winsock Catalog Provider Entry" entries, each of which contains a field labeled "Service Flags:". According to the referenced document, if the service flags have the 0x20000 bit set, the LSP is IFS; if that bit is cleared, it is non-IFS. Every entry I got when I ran the command had service flags containing 0x20000. I'm assuming this means the LSPs in my network stack play by the rules.
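The manual check described above can be automated. The following is a rough Python sketch that scans the text printed by "netsh Winsock Show Catalog" and lists entries whose service flags lack the 0x20000 bit. The function name `find_non_ifs` is hypothetical, and the parser assumes each field appears as a "Name: value" line with "Description" preceding "Service Flags" within an entry; that layout is an assumption about the command's output, not something the thread confirms.

```python
XP1_IFS_HANDLES = 0x20000  # bit the KB article associates with IFS providers


def find_non_ifs(catalog_text):
    """Return (description, flags) for entries without the IFS bit.

    Assumes "Description: ..." and "Service Flags: 0x..." lines,
    which is an assumed layout of the netsh output.
    """
    results = []
    description = None
    for line in catalog_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines and headings have no colon
        key, value = key.strip(), value.strip()
        if key == "Description":
            description = value
        elif key == "Service Flags":
            try:
                flags = int(value, 16)
            except ValueError:
                continue  # skip anything that is not a hex number
            if not flags & XP1_IFS_HANDLES:
                results.append((description, flags))
    return results
```

On a Windows machine the input text could be captured with `subprocess.run(["netsh", "winsock", "show", "catalog"], capture_output=True, text=True).stdout`. An empty result would agree with the observation that every entry contained 0x20000.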

David L. Livingston, Ph.D., P.E.

Design Engineer/Consultant Livingston Embedded Computing, LLC d.livingston@ieee.org, 540-520-1848

Professor of Electrical and Computer Engineering Virginia Military Institute livingstondl@vmi.edu, 540-464-7545

"100% of the shots not taken don't go in." The Great Gretzsky, ice hockey player "Without deviation from the norm, progress is not possible." Frank Zappa, musician, composer and social satirist "Complexity breeds fragility. Fragility breeds surprises. Surprises are bad." Bob Colwell, computer engineer and Pentium architect

vtjnash commented 10 years ago

Other possibility: http://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/3076a9cd-57a0-418d-8de1-07adc3b486bb/socket-fails-with-error-10022-when-application-is-run-from-certain-network-shares-on-vista-and?forum=wsk

vtjnash commented 10 years ago

Or this ancient, unsolved version, which sounds so nearly identical: http://www.itlisting.org/5-windows/228b667d9634aa62.aspx

kbauer commented 10 years ago

I updated to Windows 8.1 last weekend. Using the same installer as before, I can no longer reproduce the issue.

Besides upgrading, I had to create a user profile from scratch. I also had to copy over C:\Users\Default from a Windows 7 machine, because the upgrade corrupted that profile and made it impossible to create functional new users.

The issue also doesn't exist in a secondary profile I had migrated through the upgrade, but I cannot tell if the problem would have been reproducible there before.

Can anyone check whether the problem persists after

  1. Creating a new user.
  2. Installing julia with the installer under the new user?

vtjnash commented 10 years ago

I just realized that the environments shown above have a PATH variable near the Windows maximum length, which is somewhere between 1024 and 32768 (other likely limits include 1920, 2047, and 8191).
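To test this hypothesis on an affected machine, the length of PATH can be compared against the candidate limits listed above. A small Python sketch follows; the function name `path_length_report` is hypothetical, and the limits are only the guesses from this comment, not documented Windows constants.

```python
import os

# Candidate limits named in the comment above; guesses, not documented values.
CANDIDATE_LIMITS = (1024, 1920, 2047, 8191, 32768)


def path_length_report(path_value=None):
    """Return the PATH length and the candidate limits it meets or exceeds."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    length = len(path_value)
    reached = [limit for limit in CANDIDATE_LIMITS if length >= limit]
    return length, reached
```

Running `path_length_report()` with no argument inspects the current process environment. Passing a string makes it possible to check a PATH value copied from another user's report.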

JeffBezanson commented 10 years ago

Is there any hope of dealing with this soon?

vtjnash commented 10 years ago

actually, maybe.

The Windows documentation mentions that you shouldn't call getenv from DllMain. Allocating memory and spawning threads are also advised against. Mostly the side effect will be occasional deadlock, but it also mentions that some of the Windows APIs will attempt to access uninitialized memory (specifically Advapi32, for interacting with the registry, among other things). libopenblas does many of these things, and removing DllMain also fixes this bug.

edit: ref http://msdn.microsoft.com/en-us/library/windows/desktop/dn633971(v=vs.85).aspx

vtjnash commented 10 years ago

using the following gist, kakobrekla was able to prove that this bug is in openblas: https://gist.github.com/vtjnash/bfbdfe55915557f0d691

edit: link to results http://dpaste.com/26MM927.txt

ihnorton commented 10 years ago

What a nightmare. You and kakobrekla deserve a medal. I'm still curious what changed in Win8 to trigger this.

(I'm running builds with clean-openblas - should be sufficient to apply the patch, I think. So next binaries should include this)

JeffBezanson commented 10 years ago

Wow, amazing. One of the most inscrutable and heavily-discussed bugs of all.

tkelman commented 10 years ago

Something's not quite right yet with the fix, just tried the latest binaries posted half an hour ago:

Warning: error initializing module LinAlg:
ErrorException("ccall: could not find function gotoblas_init in library libopenblas")
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+3836 (2014-06-22 03:25 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 0d98451 (0 days old master)
|__/                   |  x86_64-w64-mingw32

vtjnash commented 10 years ago

Should have been distclean-openblas or rm -r openblas-v0.2.9 to apply the patch. @ihnorton

ViralBShah commented 10 years ago

This bugfix is totally crazy! Who would have thought that BLAS could screw up DNS?

tkelman commented 10 years ago

@vtjnash yeah seems okay on a local build, no warning and I do see gotoblas_init exported from libopenblas.dll. If you're going to clean-openblas, may as well distclean-openblas too. Download time is a lot quicker than rebuild time for openblas.

ihnorton commented 10 years ago

New binaries are up, rebuilt after removing the openblas directory. Message is gone.

tkelman commented 10 years ago

Thanks, but you forgot the LLVM patch!

ihnorton commented 10 years ago

I distcleaned llvm a couple days ago for that :/ Doing a from-scratch build now.

wheineman commented 10 years ago

Have these changes made it into the nightly builds? I downloaded the src and did a clean build and I'm still seeing the issue.

drlivip commented 10 years ago

Just downloaded and installed the 64-bit prerelease. Ran Pkg.init() (after deleting old history files) and no change in behavior. Still getting the "...could not resolve..." errors.

vtjnash commented 10 years ago

:(

Find me on IRC in the evening if you can help run more tests

vtjnash commented 10 years ago

http://webchat.freenode.net/?channels=julia

IainNZ commented 10 years ago

We need a "craziest bugs" discussion tomorrow!

wheineman commented 10 years ago

Julia 0.3 is now fully operational on my corporate issued Windows 8 laptop. With gratitude to all who helped solve this, thank you.

mlubin commented 10 years ago

Is this fix in the nightlies? I'll ask the original user who had this issue to try it out.

ihnorton commented 10 years ago

The one from yesterday is; the latest commit Jameson just pushed is not, but will be in an hour or so.

vtjnash commented 10 years ago

@kakobrekla @wheineman @drlivip @flashus please confirm this bug is also fixed for you (and not broken again by my latest commit -- it looks like ihnorton has updated the binaries on http://status.julialang.org/ as promised)

the-moliver commented 10 years ago

I've been following this and can confirm the bug is now fixed for me! Great work!

kakobrekla commented 10 years ago

@vtjnash confirmed. Please include it in 0.2.1.

wheineman commented 10 years ago

Still working, still happy....

julia> versioninfo()
Julia Version 0.3.0-prerelease+3911
Commit 19582f7* (2014-06-27 16:05 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4600M CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

drlivip commented 10 years ago

Yes, the bugs have been squashed. Thank you very much for your efforts.

Dave

mlubin commented 10 years ago

Just for some closure, the user who originally had this issue reports that it's resolved. Thanks to everyone who helped solve this!