MesserLab / SLiM

SLiM is a genetically explicit forward simulation software package for population genetics and evolutionary biology. It is highly flexible, with a built-in scripting language, and has a cross-platform graphical modeling environment called SLiMgui.
https://messerlab.org/slim/
GNU General Public License v3.0
161 stars 31 forks source link

port SLIM to Windows (incl. optional external GSL in CMake) #66

Closed jeromekelleher closed 2 years ago

jeromekelleher commented 4 years ago

It would simplify things considerably from the perspective of packaging SLiM in conda if there was an option to use an external GSL library in the CMake. Particularly for supporting Windows, where GSL has to be extensively patched to compile, it would make life a lot easier.

Unless GSL has been locally modified, I don't think this would affect SLiM itself? The default should stay as having GSL build locally so normal users aren't affected, but experts should have the option of linking to a system GSL.

rdinnager commented 3 years ago

Thanks for this detailed information. This should help a lot in getting this graphical stuff fixed up. Cheers!

bhaller commented 3 years ago

Hi @rdinnager. Looks like this has now gone through on pacman, so that's awesome. Is there anything else that needs to happen besides the writeup for the manual? Let me know if you need an assist with anything. This is so close now, it would be great if it made SLiM 3.7 (which still does not have a definite ship date set, and is just trundling along :->).

rdinnager commented 3 years ago

Hi @bhaller. Yes, I just tested the pacman to make sure it works. I even uninstalled qt5 to make sure that pacman installs it automatically as a dependency and it worked without a hitch. So yes, it is ready to go, and I just need to finish the writeup for the manual. the last couple days have been pretty full, but I should have some time tomorrow to work on that. Yes, let's aim to get it into the 3.7 release.

bhaller commented 2 years ago

Hi @rdinnager. Just a ping. :->

rdinnager commented 2 years ago

@bhaller Thanks, yes, I will finish up those last few things this week (tomorrow hopefully).

rdinnager commented 2 years ago

@bhaller Just to refresh my memory, windows_port has not been merged into master yet at any point, correct? Would you be able to merge changes to master into windows_port, so I can try to rebase against that again? I just fixed one last issue and would like to make a pull request but want to be sure everything is up to date first.

bhaller commented 2 years ago

@rdinnager Just did another merge, I believe. You are correct, the windows_port branch has not yet been merged back into master; I wasn't sure whether there would be little things you'd want to do before that happened. I'm happy to do it any time you say you're ready.

bhaller commented 2 years ago

@rdinnager Looks like three out of the four GUI builds failed, perhaps because some server didn't respond to the request to download Qt or something? I didn't look hard at it but that's what it looked like. Interestingly, the CLI builds succeeded, even though they are presently failing for master. I'm guessing you've got the tskit-based post-build tests disabled on Windows? That's what's failing for master, because we need some changes on the tskit side to happen that have not yet been rolled into a release. Anyhow, some messiness there.

rdinnager commented 2 years ago

Yes, looks like the GUI fails are to do with Qt failing to download. I have no idea what is going on with the tskit tests. I didn't disable any post build tests, and even if I did, I would have only disabled them for Windows, but these tests are running on linux and mac... weird..

bhaller commented 2 years ago

OK, well, I'm not worrying about CI too much right now as long as the problems are outside of SLiM's own code.

bhaller commented 2 years ago

Hi @rdinnager. As it happens, there is a CS student at Cornell who is used to working on Windows and is interested in finding some small contribution she could make to SLiM. Are there any remaining loose ends that were beyond your programming knowledge to address? If so, maybe she could take them on. But IIRC you figured out pretty much everything, right?

petrelharp commented 2 years ago

Where's the PR for this? I can take a look at the failing tskit tests.

bhaller commented 2 years ago

Where's the PR for this? I can take a look at the failing tskit tests.

It has been merged into master, since the failing tests seemed to be the only problem with it, and it seemed clear that they were failing because the new tskit changes (particularly removing the ordering requirement for the individuals table) hadn't been released in a new tskit. So now the tests on master fail; if there's anything that can be done about that before the next tskit gets released, that would be nice. :->

bhaller commented 2 years ago

Where's the PR for this? I can take a look at the failing tskit tests.

It has been merged into master, since the failing tests seemed to be the only problem with it, and it seemed clear that they were failing because the new tskit changes (particularly removing the ordering requirement for the individuals table) hadn't been released in a new tskit. So now the tests on master fail; if there's anything that can be done about that before the next tskit gets released, that would be nice. :->

Oh, and also tests failing because the Qt download server seemed to be down; that has probably now been fixed on their end.

rdinnager commented 2 years ago

Yes, I think I figured almost everything out. There is one or two things that still don't work, Eidos_GetMaxRSS() is currently disabled on Windows because I don't know how to get this information from the windows api. This means that memory tests are also disabled by default. It might be nice to get that working, for completeness. Also, I was never able to get the panel dividers to show up in SLiMgui, so that is another small thing that will require some a bit of trail-and-error fiddling.

rdinnager commented 2 years ago

Hey @bhaller, this is just a heads up that testing on master for the windows installation did not go well. The previous 'fix' I merged, which seemed to fix the my CLI build initially (locally), in fact broke all builds. I setup some initial windows tests on github actions, and they are currently all failing, and all for different reasons (https://github.com/rdinnager/SLiM/runs/4082356544). I have leads on what is wrong for most of them, but I will need a little time to sort this all out, so just letting you know in case you were considering making a new release soon.

bhaller commented 2 years ago

@rdinnager OK, thanks. So far no date for SLiM 3.7, I'm just surfing the waves of uncertainty. :-> I'll wait for a fix from you. Let me know if you need a hand with anything; thanks for following up.

rdinnager commented 2 years ago

So for tests I am running using the install from pacman, I am getting 7 failures in eidos similar to this:

strfind('foobarbazxyzzy', 'foo', pos=1); : FAILURE : mismatched values (4294967295), expected (-1)

They are all mismatches of 4294967295 with -1. My understanding is that 4294967295 is the unsigned int equivalent of -1, if I'm not mistaken? Do you have any idea why this might be happening on windows-latest on github actions? All tests still work fine on my local Windows 10 machine.

rdinnager commented 2 years ago

We are also getting three failures with functions of floats, when comparing with user defined functions:

// (numeric)cumProduct(numeric x)
function (numeric)cumProduct_func(numeric x)
{
    return sapply(seqAlong(x), 'product(x[0:applyValue]);');
}

for (iter in 1:100)
{
    x = sample(-100:100, 5, T); // integer
    xbuiltin = cumProduct(x);
    xuserdef = cumProduct_func(x);
    if (!identical(xbuiltin, xuserdef)) stop('Mismatch in test of cumProduct(i)');
}

for (iter in 1:100)
{
    x = rnorm(10);      // float
    xbuiltin = cumProduct(x);
    xuserdef = cumProduct_func(x);
    if (!identical(xbuiltin, xuserdef)) stop('Mismatch in test of cumProduct(f)');
}

return T;
 : FAILURE : raise during EvaluateInterpreterBlock(): ERROR (Eidos_ExecuteFunction_stop): stop("Mismatch in test of cumProduct(f)") called.

The same failure happens for cumSum and sum. It will be interesting to see if we get the same issues on the compiled version once I get that working.

bhaller commented 2 years ago

So for tests I am running using the install from pacman, I am getting 7 failures in eidos similar to this:

strfind('foobarbazxyzzy', 'foo', pos=1); : FAILURE : mismatched values (4294967295), expected (-1)

They are all mismatches of 4294967295 with -1. My understanding is that 4294967295 is the unsigned int equivalent of -1, if I'm not mistaken? Do you have any idea why this might be happening on windows-latest on github actions? All tests still work fine on my local Windows 10 machine.

This one was a bug; I have just committed a fix. No idea why it only manifested on Windows, but every compiler is different. In this case it seems like the incorrect behavior (i.e., manifesting the bug as a bug) is correct according to the C spec, so Windows seems right here and other platforms maybe seem wrong, but the C type-promotion rules are incredibly arcane, so it's hard to say for sure. :-> Anyway, fix committed but I can't test it on Windows, so please confirm.

bhaller commented 2 years ago

We are also getting three failures with functions of floats, when comparing with user defined functions:

...

The same failure happens for cumSum and sum. It will be interesting to see if we get the same issues on the compiled version once I get that working.

Huh. I'm not sure I understand; these failures indicate that the compiled version doesn't match the user-defined version, for some input. But then you write "will be interesting to see if we get the same issues on the compiled version once I get that working", which implies that you do not presently have the compiled version working. How are you comparing the compiled version to the user-defined version, then? Anyhow, I don't have a guess on this one so far. What I'd recommend is opening up an Eidos console in SLiMgui and trying out sum()/product()/cumSum()/cumProduct() with some random input, just as these comparison tests do, to see what is going on. Happy to zoom with you if that would be helpful; email me a zoom link if you want to.

rdinnager commented 2 years ago

We are also getting three failures with functions of floats, when comparing with user defined functions: ... The same failure happens for cumSum and sum. It will be interesting to see if we get the same issues on the compiled version once I get that working.

Huh. I'm not sure I understand; these failures indicate that the compiled version doesn't match the user-defined version, for some input. But then you write "will be interesting to see if we get the same issues on the compiled version once I get that working", which implies that you do not presently have the compiled version working. How are you comparing the compiled version to the user-defined version, then? Anyhow, I don't have a guess on this one so far. What I'd recommend is opening up an Eidos console in SLiMgui and trying out sum()/product()/cumSum()/cumProduct() with some random input, just as these comparison tests do, to see what is going on. Happy to zoom with you if that would be helpful; email me a zoom link if you want to.

Sorry about being unclear. These failures happen on the github actions tests for Windows, when testing the SLiM install from msys2 pacman. This installation is not compiled locally, it simply installs precompiled binaries from the msys2 package repository. I was just curious whether we will see the same when I get the tests where we compile SLiM from source to work. It looks like I fixed the problem and the compilation is running on github actions now. I should see soon what happens when I get to the tests part of the workflow.

rdinnager commented 2 years ago

While we are at it, I am getting a new warning during compilation on Windows:

D:/a/SLiM/SLiM/core/slim_functions.cpp: In function 'double DoSummarizeOperation(Individual*, SummarizeOperation)':
D:/a/SLiM/SLiM/core/slim_functions.cpp:1295:1: warning: control reaches end of non-void function [-Wreturn-type]
 1295 | }
      | ^

Thoughts?

rdinnager commented 2 years ago

We are also getting three failures with functions of floats, when comparing with user defined functions:

// (numeric)cumProduct(numeric x)
function (numeric)cumProduct_func(numeric x)
{
  return sapply(seqAlong(x), 'product(x[0:applyValue]);');
}

for (iter in 1:100)
{
  x = sample(-100:100, 5, T); // integer
  xbuiltin = cumProduct(x);
  xuserdef = cumProduct_func(x);
  if (!identical(xbuiltin, xuserdef)) stop('Mismatch in test of cumProduct(i)');
}

for (iter in 1:100)
{
  x = rnorm(10);      // float
  xbuiltin = cumProduct(x);
  xuserdef = cumProduct_func(x);
  if (!identical(xbuiltin, xuserdef)) stop('Mismatch in test of cumProduct(f)');
}

return T;
 : FAILURE : raise during EvaluateInterpreterBlock(): ERROR (Eidos_ExecuteFunction_stop): stop("Mismatch in test of cumProduct(f)") called.

Ah, I can now clarify that these failures happen on 32bit Windows only. I haven't really tested on 32bit Windows because I use 64bit Windows. One option is to just drop support for 32bit Windows (honestly I don't know how many people really still use it, and Microsoft will be abandoning it soon anyway). Otherwise I'm not sure what the consequences of this test failure is, and how to reproduce it without access to a 32bit machine I can play around with..

bhaller commented 2 years ago

Ah, I can now clarify that these failures happen on 32bit Windows only. I haven't really tested on 32bit Windows because I use 64bit Windows. One option is to just drop support for 32bit Windows (honestly I don't know how many people really still use it, and Microsoft will be abandoning it soon anyway). Otherwise I'm not sure what the consequences of this test failure is, and how to reproduce it without access to a 32bit machine I can play around with..

Aha. I would definitely drop support for 32-bit Windows. I have just added a compile-time check that generates an error message if a compile to 32-bit is attempted (on any platform, not just on Windows); nobody should be doing that. :->

rdinnager commented 2 years ago

Okay, all tests Windows tests are working except for the two that run the Eidos tests on 32bit Windows. Even after incorporating your latest changes I still get two -1 mismatches:

strfind(c('foobarbazxyzzy', 'barfoo', 'xyzzyfoobaz', 'xyzzy'), 'foo'); : FAILURE : mismatched values (0 3 5 4294967295), expected (0 3 5 -1)
strfind(c('foobarbazxyzzy', 'barfoo', 'xyzzyfoobaz', 'xyzzy'), 'foo', pos=1); : FAILURE : mismatched values (4294967295 3 5 4294967295), expected (-1 3 5 -1)

But that is down from 6 before, so that is good.

rdinnager commented 2 years ago

Ah, I can now clarify that these failures happen on 32bit Windows only. I haven't really tested on 32bit Windows because I use 64bit Windows. One option is to just drop support for 32bit Windows (honestly I don't know how many people really still use it, and Microsoft will be abandoning it soon anyway). Otherwise I'm not sure what the consequences of this test failure is, and how to reproduce it without access to a 32bit machine I can play around with..

Aha. I would definitely drop support for 32-bit Windows. I have just added a compile-time check that generates an error message if a compile to 32-bit is attempted (on any platform, not just on Windows); nobody should be doing that. :->

Okay, that is good to get clarification on that. I was not sure because I never use 32bit. I'm not surprised that it is wrong to do so. I should probably remove that build from pacman as well..

bhaller commented 2 years ago

While we are at it, I am getting a new warning during compilation on Windows:

D:/a/SLiM/SLiM/core/slim_functions.cpp: In function 'double DoSummarizeOperation(Individual*, SummarizeOperation)':
D:/a/SLiM/SLiM/core/slim_functions.cpp:1295:1: warning: control reaches end of non-void function [-Wreturn-type]
 1295 | }
      | ^

Thoughts?

This is new code, recently pushed. The warning is spurious; there's no problem. I've just pushed a fix that should get rid of the warning.

bhaller commented 2 years ago

...I should probably remove that build from pacman as well..

Definitely. Thanks.

bhaller commented 2 years ago

...Even after incorporating your latest changes I still get two -1 mismatches:

Yes, that's because my previous fix was wrong, dumb error. It's a pain when one can't test code for correctness locally! Fix for the fix now pushed.

bhaller commented 2 years ago

OK, with your new PR I guess all the above is resolved, right? Just need to (a) disable 32-bit in the pacman stuff, (b) add tskit tests for windows once they are working again, and (c) do the documentation work.

bhaller commented 2 years ago

@rdinnager I've just tagged you on https://github.com/MesserLab/SLiM/issues/232 regarding the tskit tests. I think everything else about the Windows port is now resolved (although maybe you'll have a few edits for me on the documentation write-up; that doesn't happen on GitHub anyway). So, I'm going to close this issue now! Yay! Of course feel free to create new issues if there are any remaining loose ends to be tracked. Thanks for all your work on this!