Running Monte Carlo 4-variable code

mankleh commented 5 years ago

The 4-variable Monte Carlo (MC) is written and tested up to 150 iterations, but I don't think I'll have access to the HPC system at OSU to run it all the way through. @zmorrell-sand would you be able to run RM3_MCopt.m on your end? That script and the associated function file are located in the RM3_MCopt folder.

I have the MC optimizing for Pow/SA right now, but could be changed to pow/vol or a different objective easily. The code stores each case in the output.mat file and the best solution will be in the output_best.mat file.

zmorrell-sand commented 5 years ago

@mankleh Hello,

I pulled the most recent commit, and I can't see where any parallelizations in the RM3_MCopt.m script. I can still run it, but there won't be much of a speedup. Am I missing something?

Thanks, Zach

mankleh commented 5 years ago

Hi @zmorrell-sand, No it isn't parallelized. I looked into doing it, but each iteration relies on the previous so using parfor isn't really an option. I tried to run it over the weekend, but my computer accidentally went to sleep. Any speed up will be nice as it will take a long time regardless.

Once a different form of optimization is used, like a Genetic algorithm, or other heuristic methods parallelizations will be possible.

zmorrell-sand commented 5 years ago

@mankleh Alright, I'll put it on one of the machines and I'll check the progress of it tomorrow morning so we can get a safe estimate of how long it will take to run.

mankleh commented 5 years ago

Thanks @zmorrell-sand. I was going to take ~3 days to run on my computer, so it might be a similar time frame.

zmorrell-sand commented 5 years ago

@mankleh It seems that this will take ~3 days to run. It is at about 2181 iterations currently. It should be noted that we are currently not entirely sure of the validity of 'PS' control, see issue SNL-WaterPower/WecOptTool#13 . Either way, even if it needs to be re-run, it is good to know the expected time frame.

mankleh commented 5 years ago

@zmorrell-sand Good to know! I'm going to start formulating the 7-variable MC, but I think I might parallelize the code to run all three controls methods so we don't have to run them one at a time.

zmorrell-sand commented 5 years ago

@mankleh Hello,

Let me first apologize for the lengthy comment, and implore you to read all of this.

It looks like the run somehow failed after 9323 RM3_getPow evaluations this morning around 7:00. I am not entirely sure why it stopped running, but I did notice a few things in the run directory that seem a little bit concerning especially for future parallelization efforts. If I had to immediately guess, I would say that Sandia might have had issues with me checking out the same MatLab licence for a few days and pulled my licence partway through the run. This makes a lot of sense to me, since the last files were written exactly at 7:00 A.M. - when a lot of people get to work. If this is the case, it might be better to run on your computer, as there are Windows system settings that allow the machine to not fall asleep and you won't have to worry about licence issues. Also, I might have a few ways to greatly speed up the runtime of the code.

First, the slightly concerning thing for parallelization:

I notice that in the run directory, a lot of the files have the same suffix, which is supposed to be randomly generated. This implies to me that the random number generator is being reset somewhere and needs to be reshuffled. I will look more into this. Example file names with this issue: RM3_190802_0627_114745973, RM3_190802_0628_114745973, RM3_190802_0646_114745973, and far more like them.

Second, ways to greatly speed up your runtime:

The vast majority of the execution time for this program is coming from Nemoh's generation of the mesh. This execution time is so significant, that all other calculations are having negligible effects on runtime. Everytime the RM3_getPow function is called, Nemoh is generating a new mesh, unless it is specified to read the mesh from a file. This means that you want to minimize your calls to RM3_getPow as much as possible.

I notice that at lines 33 and 39, and more importantly at lines 55 and 57, which are inside of the for loop in RM3_MCopt.m, you are calling RM3_getPow with the same parameters twice: once to calculate power relative to surface area, and once to calculate with volume. I don't see any Matlab optimization functions that require you to do this this way, so the following is my suggested way to write lines 55 through 58 effectively as follows, and do something similar for lines 33 and 39:

pow = RM3_getPow(S, 'PS', 'parametric', xx_new);
vol = RM3_getVol(xx_new);
surfArea = RM3_getSurfArea(xx_new);

new_obj = pow/surfArea;
output.pow_SA(k+1) = new_obj;
output.pow_v(k+1) = pow/vol;
output.dim(k+1,:) = xx_new(:);

This should cut your runtime in half, although I recommend of course testing this with a smaller iteration value before taking my word for it.

tl;dr: program failed, probably due to licences; there is something fishy in the Nemoh runs directory with randomized file suffixes - could cause problems with parallelization; you can probably halve your runtime for the Monty Carlo by being careful when you call RM3_getPow.

If you have any questions feel free to either respond over GitHub or set up a time for another phone meeting. Also, I will be taking the week of the 12th through the 16th off to rest my mind before school starts again.

Thank you and sorry again for the wall of text, Zach

mankleh commented 5 years ago

Hi @zmorrell-sand,

Thanks for all the feedback and information on your end. I had talked to Ryan last Tuesday about the ways to shorten up the code, which I've implemented on my end (I'll push through those changes as well). I ended up running the 'CC' control code after learning about the unsure validity of the 'PS' case.

As for the the random numbers I has set the random generate to 'shuffle' when I was testing out the code - so that might be where the issue is. I've taken that setting out, so hopefully that fixes things.

zmorrell-sand commented 5 years ago

Hi @mankleh

I'm looking through the Monte Carlo stuff, and I have a few more questions. I haven't seen the new code for the MCoptim.m yet, so I don't have any more to say on that. My questions lie with the the varGen function. It looks to me like you are trying to generate uniformly random numbers to three decimal places offset from x_o = [10, 15, 2, 42] by +- c = [5, 7.5, 1, 21] . It looks like the method that you are using could introduce some loss of uniformity due to drawing from numbers of the same scale in the randi([0,10])/1000 step as in the randi([0,10])/100 step. effectively, there will be two ways to make every number with a 0 in the thousandths place (except for the boundaries). For example, in the first if statement, lets say you drew x1 = 1, x2 = .2, x3 = .03, x4 = 0.000, you would get X = 1.230. Similarly, if you drew x1 = 1, x2 = .2, x3 = .02, x4 = .010, you would also get X = 1.230

For the same functionality with no loss of uniformity, I recommend writing the function this way:

rng shuffle
c = [5, 7.5, 1, 21];
x_o = [10, 15, 2, 42];

lower = -c(ii);
upper = c(ii);

X = randi([lower*1000, upper*1000])/1000;
xx = x_o(ii) + X;

or, if you are concerned that maybe there will be some floating point roundoff error that will give you issues for 7.5, you can pre-multiply in a 10:

rng shuffle
c = [50, 75, 10, 210]; % pre-multiplying in a 10 to represent 7.5 as int instead of float
x_o = [10, 15, 2, 42];

lower = -c(ii);
upper = c(ii);

X = randi([lower * 100, upper *100])/1000; % 100 inside brackets due to pre-multiplication
xx = x_o(ii) + X;

zmorrell-sand commented 5 years ago

Hi @mankleh

If at all possible, could we schedule a Skype call this week? I have a few questions about how the Monte Carlo Hill Climbing is intended to work. It would work to do up until Thursday, as I have Friday of this week off.

I hope all is going well, not trying to add any undue stress.

Thanks, Zach

mankleh commented 5 years ago

Hi @zmorrell-sand,
Yeah of course - what times works for you this week? I can work around your schedule. As for the code efficiency, I am all up to suggestions to make it more efficient! I was just trying to increase the randomness of the Monte Carlo by setting up the varGen.

I've also pushed the updates I made to RM3_MCopt.m and included the results of the 'CC' and 'P' runs from the code.

zmorrell-sand commented 5 years ago

Hi @mankleh , Would tomorrow around 11 A.M work? Otherwise, I can do any time from 2 to 5 tomorrow. I'll go ahead and pull the updated code and familiarize myself with it.

Thanks, Zach

mankleh commented 5 years ago

@zmorrell-sand - Yep that works! Just to clarify it would be 11 AM MT time?

zmorrell-sand commented 5 years ago

@mankleh , Yes, 11:00 A.M. Mountain Time. Sorry, forgot about the time zone difference.

ryancoe commented 5 years ago

@mankleh - I was looking through your updated code and had two questions:

You currently generate each set of design variables sequentially. Why not generate, e.g., a 1e6x4 random matrix (with proper bounds) and then evaluate the objective function(s) for each row? This would allow better parallelization.
Why are you using randi instead of rand?

Also, the pseudospectral implementation has been fixed (8af979e80a51044eeb95aa09a18173a82fc9e247), so you can add this to your runs.

ryancoe commented 5 years ago

@mankleh - Just wondering how you're coming along with this? We're here to help if you need us. It would be great to see the update results (after 8af979e) for the P, PS, and CC controllers next Tuesday, if possible.

BryonyDuPont commented 5 years ago

Ok, @mankleh is on tap to do the following: 1) revise the MC code to remove the hill climbing, given some unexplainable results we just noted in the version that does include hill climbing 2) initiate a matrix of random solutions for the four design variables, such that the code can be parallelized more efficiently. The evaluation step can instead pull from this matrix. 3) push multiple instances of the code with different objective functions so we can run it at SNL without Hannah (she is in Italy starting on Saturday).

Hope this makes sense. Holler if it doesn't

mankleh commented 5 years ago

@ryancoe - I finished updated the code, and pushed it through on GitHub and should be ready to run on the servers.

Currently the only part of the code that needs to change is the controller type on line 14.

I have the code outputting the best solutions for three different objective functions: power, power/volume, and power/surface area. If you want to only evaluate one objective, the others can be commented out. Everything is saved to a .mat file once the run is complete.

ryancoe commented 5 years ago

@mankleh - This is still only for a single sea state, is that intentional?

@zmorrell-sand - Can you take a look at this today?

zmorrell-sand commented 5 years ago

@ryancoe Just got done looking at it. It looks like a very good starting place. It should run in parallel which is nice. It also looks more simple than the previous version which is good for initial debugging purposes. Hopefully those parallelism issues I encountered at the beginning of the summer don't rear their head, or I may have to do some debugging :) .

@mankleh and @ryancoe , I was thinking it might be a good idea to put in some form of checkpointing. That is, we could write to a file with a randomized name every 100 or so iterations to ensure that if something goes wrong, we can start partway through. It could be as simple as opening a opening a file to write to, writing the iteration number, the current guess, and the best solution found. I could probably find a good place to throw this in and do it myself if you are already out of the country. This just seems like a reasonable way to make sure that we don't waste a few days of run time if the Matlab licences act up again. As long as the paralellization works, this hopefully won't be an issue, but it would still be rather easy and could save some time.

Edit: At the very least, I will run this tonight without making any modifications so that we can know if it works by Wednesday.

mankleh commented 5 years ago

@ryancoe I had just copied over the few lines for the bretschneider from the RM3_debug.m and RM3_optimDebug.m and just didn't update the sea states after that so it can be changed to represent a full range of sea states.

ryancoe commented 5 years ago

@zmorrell-sand - Some responses below:

Checkpointing: yes, I was thinking the same thing; you can check, but I would guess the time to append a .mat file may be negligible and thus we do this every iteration (not sure how that would work within a parfor loop)
Issues with parallelism: if I recall correctly, you were having issues with running on the CEE-COMPUTE machines? if so, you may need to use nohup or screen.

@mankleh - Ok, have you and/or @Amerlon finished incorporating the k-means subset of sea states?

ryancoe commented 5 years ago

Also, I finished running the CC case w/ 1e3 iterations: output_CC_190826_1820.mat.zip

fn = 'output_CC_190826_1820.mat';
load(fn)

%%

nn = [-1, 1, 1, -1, -1];

fs = fields(output);
for ii = 1:length(fs)-1
    figure
    x = output.dim(:,1); % Float radius
    y = output.dim(:,2); % Spar length
    z = nn(ii) * output.(fs{ii+1});
    scatter3(x,y,z,[],z)
    hold on
    grid on
    view(gca,[99.3 22.8]);
    xlabel('Float radius [m]')
    ylabel('Spar length [m]')
    zlabel(fs{ii+1},'interpreter','none')
    title(fs{ii+1},'interpreter','none')
    export_fig([fn,'_',fs{ii+1},'.png'])
end

output_CC_190826_1820 mat_pow_v output_CC_190826_1820 mat_v output_CC_190826_1820 mat_SA

zmorrell-sand commented 5 years ago

@ryancoe and @mankleh Good news, the program ran without any issues with parallelism. It took about 12.5 hours to run 10,000 iterations, which seems like a reasonable runtime to me for almost any optimization algorithm you would want to throw at it.

My main concern with appending the .mat file was the parallelism. I will do some looking into this to see if it is parallel safe, but I would expect it to introduce some issues. I will follow up this comment with the results of my googling.

I have also run the plotting script that you provided, Ryan, but it is so similar to your results that I think it would be redundant to post the figures here. I will hold onto the .mat file and remove all of the run files, since they seem to severely slow down Matlab (I think it has to recursively check all of the directories to make sure that you aren't calling something in one of them). As a substitute, I will include the console output of the best performances with regard to each objective function (10,000 iterations, CC).

I was wondering if it would make sense to generate graphs similar to the parabolic graph in SNL-WaterPower/WecOptTool#37 to check the apparent smoothness of the function now that those fixes have been added? If the functions appear largely smooth, we would be able to use a gradient based algorithm with a decent degree of confidence, which would make writing the optimization algorithm easier. I would note that the program seemed to have some difficulties with the built-in Matlab optimization when I last tried it, so I think it still makes sense for Hannah to write the optimization algorithms rather than the built in ones, regardless of the smoothness.

While I await a response, I will run 10,000 iterations of PS control so that we can get some representative data on all three control types.

Solution for power as obj function
ans =

  1x6 table

                  dim                       pow          SA        v       pow_SA      pow_v 
    ________________________________    ___________    ______    ______    _______    _______

    13.99    21.29     2.48    62.14    -5.2679e+06    2390.6    2948.9    -2203.6    -1786.4

Solution for pow/SA as obj function
ans =

  1x6 table

                  dim                      pow          SA        v       pow_SA     pow_v
    _______________________________    ___________    ______    ______    _______    _____

    5.08     8.62     2.56    59.03    -4.0403e+06    450.38    440.98    -8970.8    -9162

Solution for pow/vol as obj function
ans =

  1x6 table

                  dim                      pow          SA        v       pow_SA     pow_v 
    _______________________________    ___________    ______    ______    _______    ______

    5.36     7.88     1.01    61.48    -3.1923e+06    368.86    286.23    -8654.6    -11153

Elapsed time is 44801.596880 seconds.

zmorrell-sand commented 5 years ago

@ryancoe almost everything I found about trying to work on the same .mat file in a parfor said that it would likely cause issues and was bad practice. What I could do though, would be to add to the file in increments of 100, starting 100 behind the current iteration. Since the most parallel threads that will be running at once is 12 with the parpool I am running, doing 100 prior seems like a safe range to make sure that there isn't a slower process that would interfere with it. That being said, this could still have some issues, but it seems like a good middle ground that would be very convenient if it worked.

for example, something similar to this (it may require some debugging before I'd trust it for sure, so I would implement it on Friday so I could have more consecutive time to work on it):

parfor i = 1:1:iter
    ...
    if i >= 200 && mod(i,100) == 0
        save(fname1, output.dim(i - 199: i-100), ... , '-append');
    end
    ...
end
save(fname1, output.dim(i - 199: ), ... , '-append');

zmorrell-sand commented 5 years ago

@ryancoe It seems that the parallelism issues are exclusively for 'PS' control. I was able to run 'CC' just fine, tried 'PS', and it crashed. I ran a good amount of 'CC' again to make sure that it would still work. I am now running 'P' control type instead to see if the issue is contained to 'PS'.

When you have a chance, can you try running 'PS' in parallel to verify that it is not just an issue on my end? It throws a strange Fortran error for me.

ryancoe commented 5 years ago

@zmorrell-sand - Responding to a couple of your points:

Incremental .mat file saving: Thinking a bit harder, I have actually done this in the past. I used the solution recommended here (i.e., write a separate function to handle the saving): https://www.mathworks.com/matlabcentral/answers/135285-how-do-i-use-save-with-a-parfor-loop-using-parallel-computing-toolbox
parabolic graph: Yes, let's give this a shot
Write a custom optimization function: I'm not yet convinced this is necessary
Long MATLAB search path: Interesting... can you do some Googling and see if this makes sense? If so, we may want to add something to exclude the rundirs from the search path.
parallelism issues are exclusively for 'PS' control: This is surprising, especially when you say the error appears to be Fortran (which is Nemoh and the same for all the control strategies). I'll see if I can replicate your problem.

zmorrell-sand commented 5 years ago

@ryancoe

Incremental.mat file saving: I'm surprised I didn't find that psave function when I first searched. That seems much easier than my idea, so lets go ahead with that. I'll get it implemented before trying to run 'PS' again.
parabolic graph: I'll get a start on that and try to get it posted by the end of the day.
parallelism issues exclusive for 'PS' control: It ran just fine for 'P' and 'CC' and only crashed for me with 'PS'. I will will try re-running it with 'PS' to see if it was a fluke.
results of 'P' control type: The 'P' control type finished running, with a runtime of ~26.4 hours. I will include the console output in a subsequent post for ease of reading
Long MATLAB search path: It looks like the issue arises from the genpath. Genpath generates the path for all subdirectories so that you can add a whole project more easily. It is what we use in our startup.m, which is why matlab seems to stall on startup. I was able to find a workaround suggestion here, but it isn't built in.

zmorrell-sand commented 5 years ago

'P' control type console output:

Solution for power as obj function
ans =

  1x6 table

                  dim                      pow          SA        v       pow_SA      pow_v 
    ________________________________    __________    ______    ______    _______    _______

    14.82    17.92     1.13    60.53    -3.095e+06    1916.7    1788.5    -1614.8    -1730.5

Solution for pow/SA as obj function
ans =

  1x6 table

                  dim                      pow          SA        v       pow_SA      pow_v 
    _______________________________    ___________    ______    ______    _______    _______

    5.36     7.88     1.01    61.48    -1.3591e+06    368.86    286.23    -3684.5    -4748.1

Solution for pow/vol as obj function
ans =

  1x6 table

                  dim                      pow          SA        v       pow_SA      pow_v 
    _______________________________    ___________    ______    ______    _______    _______

    5.36     7.88     1.01    61.48    -1.3591e+06    368.86    286.23    -3684.5    -4748.1

Elapsed time is 95109.588719 seconds.

'P' control type .mat file: output_P_190829_2007.zip

'P' control type graphs: output_P_190829_2007 mat_pow_v output_P_190829_2007 mat_pow_SA output_P_190829_2007 mat_v output_P_190829_2007 mat_SA

zmorrell-sand commented 5 years ago

@ryancoe Parabolic graphs These look a lot smoother than the previous ones. To generate the file, I made a copy of the RM3_MCopt.m and replaced the variable generation with the following:

% Populating design variables
output.dim(:,1) = linspace(5,12.5,200);
output.dim = output.dim + [0, 8.62, 2.56, 59.03];

I plotted with this: output_CC_190830_1113.zip

fn = 'output_CC_190830_1113.mat';
load(fn)

%%

nn = [-1, 1, 1, -1, -1];

fs = fields(output);
for ii = 1:length(fs)-1
    figure
    x = output.dim(:,1); % Float radius
    %y = output.dim(:,2); % Spar length
    z = nn(ii) * output.(fs{ii+1});
    scatter(x,z)
    hold on
    grid on
    %view(gca,[99.3 22.8]);
    xlabel('Float radius [m]')
    %ylabel('Spar length [m]')
    ylabel(fs{ii+1},'interpreter','none')
    title(fs{ii+1},'interpreter','none')
    export_fig([fn,'_',fs{ii+1},'.png'])
end

output_CC_190830_1113 mat_pow_SA output_CC_190830_1113 mat_SA output_CC_190830_1113 mat_v

zmorrell-sand commented 5 years ago

@ryancoe, I plotted the results of the 'PS' run, and there were a few very large outliers that threw off the whole plot. The outliers were on the order of 10^9, while an arbitrary point chosen that did not appear to be an outlier was on the scale of 10^6. I will refrain from uploading the other plots since they are meaningless until the outliers are dealt with.

ryancoe commented 5 years ago

@zmorrell-sand - These cases likely did not converge. Can we filter these out? Go ahead an post a .zip of the .mat if you can.

zmorrell-sand commented 5 years ago

@ryancoe Here is the zipped file for the 'PS' output_PS_190901_1635.zip

Filtering them out should be doable. I can have matlab take a standard deviation of the data set and exclude anything more than 4 standard deviations out.

zmorrell-sand commented 5 years ago

I just tested out the rmoutliers(A) command, and it seems to do the trick quite nicely. I tested it on both the output.pow from both the 'PS' case with obvious outliers and the 'P' case with no obvious outliers. It removed 25 outliers from the 'PS' case using 'mean' and 166 outliers with the default; and 0 outliers from the 'P' in both cases. I would suggest using default, as the mean didn't seem to get everything when plotted.

My new plotting script is as follows:

fn = 'output_PS_190901_1635.mat';
load(fn)

%finding outliers
[~, TF] = rmoutliers(output.pow); %gets indices of outliers
outInd = find(TF == 1); %array with indices of outliers
output.dim(outInd,:) = [];%removing outlier rows from output.dim
%%

nn = [-1, 1, 1, -1, -1];

fs = fields(output);
for ii = 1:length(fs)-1
    figure

    x = output.dim(:,1); % Float radius
    y = output.dim(:,2); % Spar length

    output.(fs{ii+1})(outInd) = []; %removing outlier rows.

    z = nn(ii) * output.(fs{ii+1});
    scatter3(x,y,z,[],z)
    hold on
    grid on
    view(gca,[99.3 22.8]);
    xlabel('Float radius [m]')
    ylabel('Spar length [m]')
    zlabel(fs{ii+1},'interpreter','none')
    title(fs{ii+1},'interpreter','none')
    export_fig([fn,'_',fs{ii+1},'.png'])
end

The following are the plots with outliers removed, using the above script: output_PS_190901_1635 mat_pow_v output_PS_190901_1635 mat_v output_PS_190901_1635 mat_SA

BryonyDuPont commented 4 years ago

@mankleh @ryancoe @zmorrell-sand @Amerlon Can we have a meeting about these results? I'm free Friday (20 SEP) from 10am-11am Pacific, if that happens to work for everyone. Ryan and I think it's critical to discuss/interpret these results to better understand if we have research contributions to report soon.

ryancoe commented 4 years ago

I’d prefer the afternoon, but if necessary I can make this work.

zmorrell-sand commented 4 years ago

I will be out of town on Friday, so I will be unable to make the call. If there are any questions that I can answer directly, I can make a response on GitHub hopefully on Fridoay, or by email when I get back on Monday.

SNL-WaterPower / WecOptTool-MATLAB

Running Monte Carlo 4-variable code #38