Sea state clustering - GitHub Issues

Amerlon commented 4 years ago

I believe both Sorting_and_Clustering_v2.m and Bretschneider_wrapper.m are up and fully working, and they need to be integrated into the rest of the optimization code. The code outputs 8 different sea states, and I am not sure whether the rest of the code can take more than one.
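If the downstream optimization can take several sea states, one common way to use them (an assumption here; the thread does not say how WecOptTool consumes them) is to evaluate the objective in each sea state and combine the results weighted by each sea state's fraction of occurrence. A minimal Python sketch; `weighted_over_sea_states` is a hypothetical helper, not part of WecOptTool:

```python
def weighted_over_sea_states(values, weights):
    """Combine one result per sea state into a single occurrence-weighted value.

    values  -- hypothetical per-sea-state results (e.g. an objective value)
    weights -- fraction of occurrence of each sea state (e.g. the cluster density)
    """
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per sea state")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Divide by the total so slightly-off weights (e.g. rounded CSV values)
    # still act as fractions that sum to 1.
    return sum(v * w for v, w in zip(values, weights)) / total
```

For example, `weighted_over_sea_states([10.0, 20.0], [0.25, 0.75])` gives `17.5`.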

mankleh commented 4 years ago

@Amerlon - Did you write the Bretschneider_wrapper.m? I don't see it anywhere in the repository, you may need to push it through.

BryonyDuPont commented 4 years ago

Just to clarify, there is another thread about this that Ryan started (Issue SNL-WaterPower/WecOptTool#2).

We're not going to integrate the cluster code, really. We need the resulting cluster averages and fraction of occurrence for each cluster. If you can get those to @mankleh ASAP, that would be great.
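Both requested quantities can be computed directly from the cluster labels. A minimal Python sketch, assuming each observation is a (WVHT, APD) pair in physical units and `labels` holds its cluster index (like `idx` from MATLAB's `kmeans`); `cluster_summary` is a hypothetical helper name:

```python
from collections import defaultdict


def cluster_summary(points, labels):
    """Return {label: (mean point, fraction of occurrence)} for each cluster.

    points -- list of (WVHT, APD)-style tuples in physical units
    labels -- cluster index for each point
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    sums = defaultdict(lambda: [0.0] * len(points[0]))
    counts = defaultdict(int)
    for p, lab in zip(points, labels):
        counts[lab] += 1
        acc = sums[lab]
        for j, value in enumerate(p):
            acc[j] += value  # running per-column sum for this cluster
    n = len(points)
    return {
        lab: (tuple(s / counts[lab] for s in sums[lab]), counts[lab] / n)
        for lab in sorted(counts)
    }
```

With k-means on min-max normalized data, the per-cluster mean in physical units should coincide with the un-normalized centroid, since the normalization is an affine map and each converged centroid is the mean of its members.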

Amerlon commented 4 years ago

@mankleh - Sorry, I thought I had pushed it through. I just did it again and double-checked online, and it looks like it is there in the resource clustering folder.

ryancoe commented 4 years ago

We should pick this back up and attempt to implement a function that allows a user to:

1. Point to an NDBC buoy online.
2. Pull the data from online (can leverage code from WDRT: http://wec-sim.github.io/WDRT/modules.html#WDRT.ESSC.Buoy.fetchFromWeb).
3. Use clustering to distill the local wave climate into n representative sea states with weightings for their occurrence (this step was done by an OSU student in the code below).
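Once a file is downloaded, the parsing part of step 2 mostly amounts to reading the NDBC standard meteorological text format and dropping missing values, as the MATLAB script below does with `textscan`. A minimal Python sketch, assuming the 18-column format with two `#`-prefixed header rows that the script reads (older NDBC files with fewer columns are simply skipped); `read_wvht_apd` is a hypothetical name, and the download itself is left to the WDRT code linked above:

```python
# 0-based column indices in the 18-column NDBC standard meteorological format
# read by the MATLAB script:
# YY MM DD hh mm WDIR WSPD GST WVHT DPD APD MWD PRES ATMP WTMP DEWP VIS TIDE
WVHT_COL, APD_COL, MWD_COL = 8, 10, 11
N_COLS = 18


def read_wvht_apd(lines):
    """Return [(WVHT, APD), ...] from NDBC stdmet text lines, skipping missing data."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and the '#'-prefixed header rows
        fields = line.split()
        if len(fields) != N_COLS:
            continue  # assumption: only the 18-column format is handled
        wvht = float(fields[WVHT_COL])
        apd = float(fields[APD_COL])
        mwd = float(fields[MWD_COL])
        # NDBC marks missing WVHT/APD with 99.00 and missing MWD with 999
        if wvht == 99.0 or apd == 99.0 or mwd == 999.0:
            continue
        pairs.append((wvht, apd))
    return pairs
```

Usage: `with open(path) as f: pairs = read_wvht_apd(f)`, where `path` points to one of the downloaded .txt files.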

% Copy of the code that works with the unmodified file names of the data
% files downloaded from https://www.ndbc.noaa.gov. The only difference from
% sorting_wave_data.m is the file-naming convention of the imported files.

% Created by Ian Williams

clear
clc
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% To use: download the desired years from https://www.ndbc.noaa.gov for the
% location you'd like (e.g. Eureka, California) and place the .txt files
% into the DATA folder in the Resource Clustering folder. Then run the code
% (the number of clusters is set by k below). It outputs the cluster
% density, average spread, and centroids, plus the normalized data, as
% four separate .csv files.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

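txtpattern = fullfile('DATA','*.txt');
dinfo = dir(txtpattern);
data = [];
for ii = 1 : length(dinfo)
    % Open each file by its full path instead of relying on the MATLAB path
    fid = fopen(fullfile(dinfo(ii).folder, dinfo(ii).name));
    % 18 numeric columns: YY MM DD hh mm WDIR WSPD GST WVHT DPD APD MWD
    % PRES ATMP WTMP DEWP VIS TIDE, after 2 header lines
    data1 = textscan(fid, repmat('%f',1,18), 'HeaderLines',2, 'CollectOutput',1);
    data1 = data1{:};
    fclose(fid);
    data = [data; data1];
end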

%%%%%%%%%%%%%%%%%% Removing Missing Data %%%%%%%%%%%%%
% NDBC flags missing values with 99 (WVHT, APD) or 999 (MWD)

data = data(data(:,9)~=99,:);   % WVHT
data = data(data(:,11)~=99,:);  % APD
data = data(data(:,12)<=361,:); % MWD

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%% Removing extra columns %%%%%%%%%%%
% Keeps YY MM DD hh mm WVHT APD MWD

data(:,[6,7,8,10,13,14,15,16,17,18])=[];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Clustering Section %%%

%%% Remove time stamps and wave direction %%%
data(:,[1;2;3;4;5;8])=[];

%%%%%% Normalize data %%%%%%
max_min = [min(data,[],1);max(data,[],1)];
normalizeddata = bsxfun(@minus,data,max_min(1,:));
normalizeddata = bsxfun(@rdivide,normalizeddata,diff(max_min,1,1));

%%% remove outliers %%%
%isoutlier(x,'mean') returns true for any values outside of 3 standard
%deviations from the mean.
TF = isoutlier(normalizeddata,'mean');
normalizeddata = [normalizeddata, TF];
normalizeddata = normalizeddata(normalizeddata(:,3)~=1,:);
normalizeddata = normalizeddata(normalizeddata(:,4)~=1,:);
normalizeddata(:,[3,4])=[];
%%%%%

data_normalized = normalizeddata;
x=normalizeddata(:,1);
y=normalizeddata(:,2);

%%% K-means Clustering %%%
% Reference: https://www.mathworks.com/help/stats/kmeans.html#buefthh-3
% By default, MATLAB's kmeans uses the squared Euclidean distance metric
% and the k-means++ algorithm to initialize the cluster centers. Because the
% initialization is random, the clusters can change from run to run.

% The third output (sumd, stored here as "spread") is the within-cluster
% sum of point-to-centroid distances (squared Euclidean by default), in
% normalized units.
k = 8;
%k = input('How many clusters? ');
[idx,centroid,spread] = kmeans([x,y],k);

%%% Un-normalize the centroids back to physical units: x*(max-min) + min %%%
wvht_factor = max_min(2,1) - max_min(1,1);
apd_factor = max_min(2,2) - max_min(1,2);
unnormalized_centroid_wvht = centroid(:,1).*wvht_factor + max_min(1,1);
unnormalized_centroid_apd = centroid(:,2).*apd_factor + max_min(1,2);
header = ["WVHT", "APD"];
unnormalized_centroids = [unnormalized_centroid_wvht,unnormalized_centroid_apd];

% Plot the data with each cluster in a different colour, centroids marked
data = cell(k,1);
figure
hold on
for i = 1:k
    data{i} = [x(idx==i),y(idx==i)];
    plot(x(idx==i),y(idx==i),'.')
end
plot(centroid(:,1),centroid(:,2),'kx','MarkerSize',10,'LineWidth',3)
title('Clusters and centroids of data');
xlabel('WVHT normalized'), ylabel('APD normalized');
hold off
%%%%%

%%% Density (fraction of occurrence) of each cluster %%%
cluster_values = accumarray(idx, 1, [k 1]); % number of points in each cluster
density = cluster_values / numel(idx);
%%% Average spread: mean squared distance to the centroid (normalized units) %%%
average_spread = spread./cluster_values;

%%% Add headers %%%
average_spread_and_header=["Average Spread";average_spread];
density_with_header = ["Density";density];
unnormalized_centroids_and_header = [header;unnormalized_centroids];

%%% Print relevant data %%%
disp(density_with_header)
disp('Cluster centers')
disp(unnormalized_centroids_and_header)
disp(average_spread_and_header)

%%% Export relevant data %%%
if ~exist('OUTPUT','dir'), mkdir('OUTPUT'); end
csvwrite(fullfile('OUTPUT','density.csv'),density);
csvwrite(fullfile('OUTPUT','Centroids.csv'),unnormalized_centroids); % WVHT, APD in physical units
csvwrite(fullfile('OUTPUT','Spread.csv'),average_spread);
csvwrite(fullfile('OUTPUT','Data.csv'),data_normalized);

% add to code
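The clustering steps in the script above (min-max normalization, k-means, un-normalization, density, and average spread) can be condensed into the plain-Python sketch below. It is not the MATLAB implementation: it uses Lloyd's algorithm with a seeded random initialization instead of k-means++, omits the outlier-removal step, and `kmeans_sea_states` is a hypothetical name. Note that un-normalizing a centroid requires adding the column minimum back after multiplying by the range.

```python
import random


def kmeans_sea_states(points, k, iters=100, seed=0):
    """Cluster (WVHT, APD)-style points with k-means on min-max normalized data.

    Returns (results, labels): results[c] is (centroid in physical units,
    fraction of occurrence, mean squared distance to the centroid in
    normalized units) and labels[i] is the cluster index of points[i].
    """
    dims = len(points[0])
    lo = [min(p[j] for p in points) for j in range(dims)]
    hi = [max(p[j] for p in points) for j in range(dims)]
    # Column ranges; a constant column gets range 1 to avoid dividing by zero
    span = [(b - a) or 1.0 for a, b in zip(lo, hi)]
    # Min-max normalization to [0, 1], as in the MATLAB script
    x = [tuple((p[j] - lo[j]) / span[j] for j in range(dims)) for p in points]

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def assign(cents):
        # Index of the nearest centroid (squared Euclidean) for every point
        return [min(range(k), key=lambda c: sqdist(p, cents[c])) for p in x]

    rng = random.Random(seed)
    cents = rng.sample(x, k)  # seeded random init, not k-means++
    for _ in range(iters):
        labels = assign(cents)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(x, labels) if lab == c]
            # Move the centroid to the mean of its members; keep it if empty
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else cents[c])
        if new == cents:
            break  # centroids stopped moving: converged
        cents = new

    labels = assign(cents)
    results = []
    for c in range(k):
        members = [p for p, lab in zip(x, labels) if lab == c]
        n_c = len(members)
        spread = sum(sqdist(p, cents[c]) for p in members) / n_c if n_c else 0.0
        # Undo the normalization: value * (max - min) + min
        phys = tuple(cents[c][j] * span[j] + lo[j] for j in range(dims))
        results.append((phys, n_c / len(x), spread))
    return results, labels
```

Because the initialization is random, different seeds can give different clusterings; running several seeds and keeping the result with the smallest total spread is a common remedy (MATLAB's `kmeans` offers this through its `'Replicates'` option).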

ryancoe commented 3 years ago

@ssolson - Please move this to MHKit and tackle it there.

ssolson commented 3 years ago

Ryan, I tried this last week, but the repositories must be in the same organization to do this. From the GitHub documentation:

You can only transfer issues between repositories owned by the same user or organization account.

In this case, MHKiT is one organization and SNL-WaterPower is another. I just got off the phone with Aubrey discussing using the modified IFORM method to calculate probabilities at points. I have created Gaussian Mixture Model and k-means methods for doing this. I plan to submit a draft PR to MHKiT this week and reference this issue in it, and I will close this issue at that time.

ryancoe commented 3 years ago

Sounds good, perhaps post a link to your PR here and then close this issue.

ssolson commented 3 years ago

Added as example in https://github.com/MHKiT-Software/MHKiT-Python/pull/91

SNL-WaterPower / WecOptTool-MATLAB

Sea state clustering #40