CALIPSO-project / SPINacc

A spinup acceleration procedure for land surface models (LSM)

How to run the code in README.md #2

Open · dsgoll123 opened this issue 3 years ago

dsgoll123 commented 3 years ago

Current: The description of the tasks in MLacc is incomplete or unclear. Target: A description which allows users to run and adapt the tool to their intended usage.

YilongWang commented 3 years ago

Task 1: run with different K values and generate the curve of distance as a function of K.
Task 2: run with the chosen K to do the clustering and machine learning.
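Schematically, Task 1 does something like the following (a minimal sketch on dummy data with made-up names, not the actual SPINacc code):

```python
# Sketch of Task 1: sweep K and record the clustering "distance"
# (within-cluster sum of squared distances) as a function of K.
# X stands in for the predictor matrix the tool builds; all names
# here are made up.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.rand(500, 6)  # dummy data: 500 grid cells, 6 predictors

k_values = range(2, 15)
distances = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    distances.append(km.inertia_)

# Plot distance vs. K and pick the K at the "elbow" for Task 2.
plt.plot(list(k_values), distances, marker="o")
plt.xlabel("K")
plt.ylabel("distance")
plt.savefig("distance_vs_K.png")
```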

Now, the code can be run with qsub for all the tasks with just one submission.

obfiscator commented 3 years ago

I'm attempting to run the tool with TRUNK 4.0. Not sure if the answers will be good, but I would like to see how it works technically, and if modifications need to be made. I am treating it like a completely new user, and working off the README.

As a new user, I understand that I need simulations to train on. NOTE: I don't see the required simulations specified in the README (would be good to add). I believe that a global simulation of 200 years, at 2-degree resolution, without an analytical spinup, should be sufficient. I have run that on Irene and have it ready to copy to obelix (history files and restart files, whatever is needed).

Step 1 worked fine, I have the latest version of the code now on obelix.

Step 2, easy to change the dirpython. On the other hand, I have no idea which dirdef I should be using. Perhaps a short list of the possibilities, along with an explanation, would be useful? I will attempt to create my own, "DEF_Trunk4.0/", copying from DEF_Trunk. On to Step 3.

Step 3. I open up the MLacc.def file. I propose that users change the logfile (if a new DEF directory was created, as I did) and the execution directory, at the least (I also changed task to just 1 for testing). Perhaps some clarification on what, exactly, the execution directory is? My first thought was that it should be the same as the dirpython, but I don't think this is true. It seems to be the directory where the results will be placed?

For the varlist.json file: I am missing an explanation of what these files are. I believe we can add a field for each variable, right? Perhaps a "description" field, with one or two sentences about what the variable is and what kind of file it is typically taken from, e.g., "This value is normally found in the ORCHIDEE restart file."?
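For instance, a hypothetical entry with such a field might look like this (I am inventing the structure here; I don't know the actual schema of varlist.json, only the proposed "description" field is the point):

```json
{
  "var1": {
    "name": "LAI_MEAN_GS",
    "sourcefile": "MyRun_stomate_history.nc",
    "description": "Mean growing-season leaf area index. This value is normally found in the ORCHIDEE stomate history file."
  }
}
```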

I did something naive and just replaced the existing files for the PFTmask and the pred with the first 10 years of the simulation that I had. I also replaced the variable name "var1" LAI with LAI_MEAN_GS. I replaced VEGET_COV_MAX by VEGET_MAX. I replaced the two restart files with the sechiba and stomate restart files from the last decade of my simulation (i.e., year 191-200). I had no problem with the climate variables, since I had run my simulation with CRUJRA at 2.0 degree resolution, which was already in the varlist file.

Step 4: It runs, but I cannot find the results. Must the execution directory in MLacc.def already exist? I believe so. Perhaps a line of code that checks for this and outputs an error message, as sketched below? Redoing it, I still cannot find results. I do find a file "o", though, which seems to have an error message in it. Perhaps "o" could be renamed to something like "output.err" so that it's clearer?
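Something along these lines, early in main.py, would do it (a sketch; how the path is actually read from MLacc.def is an assumption on my part):

```python
# Sketch: fail fast with a clear message if the execution directory
# configured in MLacc.def does not exist. How the path is parsed from
# MLacc.def is assumed here; 'exec_dir' is a placeholder.
import os
import sys

exec_dir = "/home/users/me/MLacc_results"  # value read from MLacc.def

if not os.path.isdir(exec_dir):
    sys.exit("ERROR: execution directory '%s' does not exist; "
             "create it or fix MLacc.def" % exec_dir)
```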


Traceback (most recent call last):
  File "main.py", line 83, in <module>
    dis_all=Cluster.Cluster_test(packdata,auxil,varlist,logfile)
  File "/home/orchidee03/mmcgrath/TEST_SPINacc/SPINacc/Tools/Cluster.py", line 96, in Cluster_test
    ClusD,disx,traID=Cluster_Ana(packdata,auxil,PFT_mask,veg,var_pred,var_pred_name,kkk,10)
  File "/home/orchidee03/mmcgrath/TEST_SPINacc/SPINacc/Tools/Cluster.py", line 36, in Cluster_Ana
    pp[laix<0.01]=np.nan
IndexError: too many indices for array
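For what it's worth, here is a minimal way to trigger this class of error (just a guess at the mechanism, not a diagnosis of the actual bug): a boolean mask with more dimensions than the array being indexed.

```python
# Minimal reproduction of the same error class, unrelated to the real
# SPINacc data: masking a 1-D array with a 2-D boolean index fails
# exactly like this if a variable was read with an unexpected shape.
import numpy as np

pp = np.zeros(5)          # 1-D, e.g. a field read without a time axis
laix = np.zeros((5, 5))   # 2-D source for the mask
pp[laix < 0.01] = np.nan  # IndexError: too many indices for array
```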


One thing that's not clear to me: it seems that, because I took a file covering 10 years for the var1 source file, the tool is looking for trends based on those first 10 years. If I use the whole 200-year simulation that I have (combining the 20 files into a single file), the results will clearly be better, correct? But, in that case, what happens with the meteo forcing? Does it automatically loop every 10 years, as is done in ORCHIDEE? Or do I need to input 200 years of meteo forcing data?

obfiscator commented 3 years ago

I see that, while I was working this morning, Yan made changes to the README. My comments were from the old file.

dsgoll123 commented 3 years ago

Hi Matt

Thanks for being so proactive! I updated the README just now and hope things are a bit clearer. As I haven't yet started adapting the tool to another model version, the info might still be incomplete. More below.

On Thu, 15 Apr 2021 at 11:00, obfiscator @.***> wrote:

> I'm attempting to run the tool with TRUNK 4.0. Not sure if the answers will be good, but I would like to see how it works technically, and if modifications need to be made. I am treating it like a completely new user, and working off the README.

> As a new user, I understand that I need simulations to train on. NOTE: I don't see the required simulations specified in the README (would be good to add). I believe that a global simulation of 200 years, at 2-degree resolution, without an analytical spinup, should be sufficient. I have run that on Irene and have it ready to copy to obelix (history files and restart files, whatever is needed).

Added to the README.

> Step 1 worked fine, I have the latest version of the code now on obelix.

> Step 2, easy to change the dirpython. On the other hand, I have no idea which dirdef I should be using. Perhaps a short list of the possibilities, along with an explanation, would be useful? I will attempt to create my own, "DEF_Trunk4.0/", copying from DEF_Trunk. On to Step 3.

Added to the README; you need to make a new one for trunk4.

> Step 3. I open up the MLacc.def file. I propose that users change the logfile (if a new DEF directory was created, as I did) and the execution directory, at the least (I also changed task to just 1 for testing). Perhaps some clarification on what, exactly, the execution directory is? My first thought was that it should be the same as the dirpython, but I don't think this is true. It seems to be the directory where the results will be placed?

I chose a different exec dir to have the results somewhere else; it's up to you.

> For the varlist.json file: I am missing an explanation of what these files are. I believe we can add a field for each variable, right? Perhaps a "description" field, with one or two sentences about what the variable is and what kind of file it is typically taken from, e.g., "This value is normally found in the ORCHIDEE restart file."?

The README was extended by Yan; I will extend/modify it if I see things are missing.

> I did something naive and just replaced the existing files for the PFTmask and the pred with the first 10 years of the simulation that I had. I also replaced the variable name "var1" LAI with LAI_MEAN_GS. I replaced VEGET_COV_MAX by VEGET_MAX. I replaced the two restart files with the sechiba and stomate restart files from the last decade of my simulation (i.e., year 191-200). I had no problem with the climate variables, since I had run my simulation with CRUJRA at 2.0 degree resolution, which was already in the varlist file.

> Step 4: It runs, but I cannot find the results. Must the execution directory in MLacc.def already exist? I believe so. Perhaps a line of code that checks for this and outputs an error message? Redoing it, I still cannot find results. I do find a file "o", though, which seems to have an error message in it. Perhaps "o" could be renamed to something like "output.err" so that it's clearer?

The error checking is still a stub; I don't know what the problem is. I started adapting the tool for CNP-MIMICS today, so maybe I will know more soon...


> Traceback (most recent call last):
>   File "main.py", line 83, in <module>
>     dis_all=Cluster.Cluster_test(packdata,auxil,varlist,logfile)
>   File "/home/orchidee03/mmcgrath/TEST_SPINacc/SPINacc/Tools/Cluster.py", line 96, in Cluster_test
>     ClusD,disx,traID=Cluster_Ana(packdata,auxil,PFT_mask,veg,var_pred,var_pred_name,kkk,10)
>   File "/home/orchidee03/mmcgrath/TEST_SPINacc/SPINacc/Tools/Cluster.py", line 36, in Cluster_Ana
>     pp[laix<0.01]=np.nan
> IndexError: too many indices for array

> One thing that's not clear to me: it seems that, because I took a file covering 10 years for the var1 source file, the tool is looking for trends based on those first 10 years. If I use the whole 200-year simulation that I have (combining the 20 files into a single file), the results will clearly be better, correct?

It should not take the whole 200 years, but compute the trend over a period corresponding to the climate forcing period (e.g. 10 years for CNP trunk2 runs).
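Schematically, the idea is along these lines (a sketch with dummy data and made-up names, not the tool's actual code):

```python
# Sketch: compute the per-pixel linear trend over the last forcing
# cycle only (e.g. 10 years), not over the full 200-year run.
import numpy as np

lai = np.random.rand(200, 96, 144)  # dummy (year, lat, lon) series
forcing_period = 10                 # length of the climate forcing cycle

recent = lai[-forcing_period:]      # last forcing cycle only
years = np.arange(forcing_period)
# linear fit along the time axis gives one slope per pixel
slope = np.polyfit(years, recent.reshape(forcing_period, -1), 1)[0]
trend = slope.reshape(lai.shape[1:])  # (lat, lon) map of trends
```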

> But, in that case, what happens with the meteo forcing? Does it automatically loop every 10 years, as is done in ORCHIDEE? Or do I need to input 200 years of meteo forcing data?



dsgoll123 commented 7 months ago

The README is being revised from scratch following the code quality review. The current documentation will be moved to another file.