euroargodev / BlueCloud

Working space for BlueCloud demonstrator 3 - Lops Task

BIC plot #31

Closed AndreaGarciaJuan closed 3 years ago

AndreaGarciaJuan commented 4 years ago

Plot BIC in Develop_PCM_model notebook

AndreaGarciaJuan commented 4 years ago

BIC plot using:

[Image: Screenshot from 2020-10-06 15-37-53]

I do not understand why the curve does not go up for a large number of classes. Do you have any idea?

gmaze commented 4 years ago

Is the number of independent samples too large?
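
For context on why the sample count matters: the BIC is -2 ln(L) + p ln(N), with p the number of free parameters and N the number of samples. The fit term grows roughly linearly with N while the penalty only grows with ln(N), so if N overstates the number of truly independent samples, the penalty may be too weak to turn the curve upward. The sketch below is illustrative only: the parameter count assumes a full-covariance Gaussian mixture, and the function names are hypothetical, not taken from the notebook.

```python
import math

def gmm_n_params(n_classes, n_features):
    """Free parameters of a full-covariance Gaussian mixture (assumed model)."""
    weights = n_classes - 1
    means = n_classes * n_features
    covariances = n_classes * n_features * (n_features + 1) // 2
    return weights + means + covariances

def bic(log_likelihood, n_params, n_samples):
    """BIC = -2 ln(L) + p ln(N); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def penalty_per_sample(n_classes, n_features, n_samples):
    """Complexity penalty divided by N, to compare with the per-sample fit gain."""
    return gmm_n_params(n_classes, n_features) * math.log(n_samples) / n_samples
```

Comparing penalty_per_sample for the same number of classes at a small and a large N shows how quickly the relative weight of the penalty shrinks as samples are added.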

gmaze commented 4 years ago

@AndreaGarciaJuan also, here is a code snippet to run a parallel computation:

import concurrent.futures

import numpy as np
from tqdm import tqdm

LIST_OF_ARGS = np.arange(0, 12)  # arguments to iterate over, e.g. the number of classes for a PCM

def do_this(this_arg):
    """Function to run on a single argument."""
    # This is where you would run a BIC computation, given a number of classes
    return this_arg**2  # dummy computation for the example

results = []
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    # Map each future back to the argument it was submitted with
    future_to_arg = {executor.submit(do_this, arg): arg for arg in LIST_OF_ARGS}
    futures = concurrent.futures.as_completed(future_to_arg)  # yields in completion order
    futures = tqdm(futures, total=len(LIST_OF_ARGS))  # progress bar
    for future in futures:
        traj = None
        try:
            traj = future.result()
        except Exception:
            pass  # a failed task leaves traj as None and is dropped below
        finally:
            results.append(traj)
results = [r for r in results if r is not None]  # Only keep non-empty results

gmaze commented 4 years ago

If you want to pass a dataset (without modifying it) to the function, you can do it this way:

import concurrent.futures

import numpy as np
from tqdm import tqdm

SHARED_DATA = 12  # stands in for the dataset, which is read but never modified

LIST_OF_ARGS = np.arange(0, 12)  # arguments to iterate over, e.g. the number of classes for a PCM

def do_this(data, this_arg):
    """Function to run on the shared data and a single argument."""
    # This is where you would run a BIC computation, given a number of classes
    return data + this_arg**2  # dummy computation for the example

results = []
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    # Map each future back to the argument it was submitted with
    future_to_arg = {executor.submit(do_this, SHARED_DATA, arg): arg for arg in LIST_OF_ARGS}
    futures = concurrent.futures.as_completed(future_to_arg)  # yields in completion order
    futures = tqdm(futures, total=len(LIST_OF_ARGS))  # progress bar
    for future in futures:
        traj = None
        try:
            traj = future.result()
        except Exception:
            pass  # a failed task leaves traj as None and is dropped below
        finally:
            results.append(traj)
results = [r for r in results if r is not None]  # Only keep non-empty results

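
One thing to keep in mind with both snippets: as_completed yields futures in the order they finish, so results is not guaranteed to line up with LIST_OF_ARGS. For a BIC curve, each value needs to stay attached to its number of classes. Here is a minimal sketch of one way to do that, reusing the future-to-argument dictionary (do_this is still the dummy function, and run_all is a hypothetical helper name):

```python
import concurrent.futures

def do_this(data, this_arg):
    """Dummy computation standing in for a BIC run."""
    return data + this_arg**2

def run_all(data, list_of_args, max_workers=4):
    """Return {arg: result}, skipping tasks that raised an exception."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_arg = {executor.submit(do_this, data, arg): arg for arg in list_of_args}
        for future in concurrent.futures.as_completed(future_to_arg):
            arg = future_to_arg[future]  # recover the argument for this future
            try:
                results[arg] = future.result()
            except Exception as exc:
                print(f"arg={arg} failed: {exc}")
    return results

by_arg = run_all(12, range(12))
ordered = [by_arg[arg] for arg in sorted(by_arg)]  # results sorted by argument
```

Keeping a dictionary also makes failed arguments easy to spot, since they are simply missing from by_arg.
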
AndreaGarciaJuan commented 4 years ago

ok! thank you very much, I will try it tomorrow

AndreaGarciaJuan commented 3 years ago

Parallelisation works well using multi-threading, also in the VRE. I am now working on including time correlation as an input when selecting the sub-dataset, and I will create a plot_BIC function in the Plotter class to keep the development notebook clean.

AndreaGarciaJuan commented 3 years ago

An example of the plot: [Image: Screenshot from 2020-10-28 10-52-31]

AndreaGarciaJuan commented 3 years ago

I am trying to include time correlation in the BIC calculation. For my dataset in the Mediterranean, when I use the month of December, I have to use a spatial correlation of 40 km to get a minimum (K=10). If I add another month (June, to pick a summer month), the curve does not show a minimum and I have to increase the spatial correlation to 60 km to get the first minimum (K=10). Here is the figure: [Image: Screenshot from 2020-11-05 10-19-57]

I have tried using random months spaced a given number of months apart, but I have never found a minimum in the curve. I feel that adding another time step to the BIC calculation greatly increases the correlation between samples. I think the best idea (at least for the beta version) is to ask the user to choose 2 time steps in the dataset, and to tell them that if there is no clear minimum they should choose other time steps or increase the spatial correlation. What do you think?

AndreaGarciaJuan commented 3 years ago

This problem was solved by using a different grid selection for each dataset. The function now takes time_steps as input, so the user can choose the time steps to use for the BIC calculation. If the time steps are too close in time, a warning is shown.

BIC, BIC_min = BIC_calculation(ds=ds, coords_dict=P.coords_dict, 
                               corr_dist=corr_dist, time_steps=time_steps, 
                               pcm_features=pcm_features, features_in_ds=features_in_ds, z_dim=z_dim, 
                               Nrun=Nrun, NK=NK)
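
BIC_calculation itself is not shown in this thread. As a rough illustration of the "too close in time" warning only, here is a sketch that assumes the time steps are 'YYYY-MM' strings. The helper name check_time_steps and the 3-month threshold are assumptions, not the values used in the repository.

```python
import warnings
from itertools import combinations

def months_between(step_a, step_b):
    """Absolute number of months between two 'YYYY-MM' strings."""
    year_a, month_a = (int(part) for part in step_a.split("-"))
    year_b, month_b = (int(part) for part in step_b.split("-"))
    return abs((year_a - year_b) * 12 + (month_a - month_b))

def check_time_steps(time_steps, min_months=3):
    """Warn and return False if any pair of time steps is closer than min_months."""
    ok = True
    for step_a, step_b in combinations(time_steps, 2):
        gap = months_between(step_a, step_b)
        if gap < min_months:  # hypothetical threshold
            warnings.warn(
                f"{step_a} and {step_b} are only {gap} month(s) apart; "
                "samples may be correlated"
            )
            ok = False
    return ok
```

With time_steps = ['2018-01', '2018-07'] the gap is 6 months, so this check passes without a warning.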

Here is an example of the BIC using time_steps = ['2018-01', '2018-07']: [Image: BIC_EX]

The minimum is at K=12.
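
How BIC_min is derived inside BIC_calculation is not shown here. One plausible reading, given the Nrun and NK arguments, is that the BIC is averaged over Nrun runs for each K from 1 to NK and the K with the lowest mean is reported. A sketch under that assumption (the function name and toy values are hypothetical):

```python
from statistics import mean

def best_k(bic_runs):
    """bic_runs[r][k-1] is the BIC of run r with k classes; return (k_min, mean_curve)."""
    n_k = len(bic_runs[0])
    # Average the BIC over runs for each number of classes
    mean_curve = [mean(run[i] for run in bic_runs) for i in range(n_k)]
    k_min = min(range(n_k), key=mean_curve.__getitem__) + 1  # K starts at 1
    return k_min, mean_curve

# Toy values only, two runs and K = 1..4
toy_runs = [
    [50.0, 42.0, 40.0, 43.0],
    [52.0, 41.0, 39.0, 44.0],
]
k_min, mean_curve = best_k(toy_runs)
```

Averaging over runs before taking the minimum makes the selected K less sensitive to a single noisy fit.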