FREVA-CLINT / freva

The Free Evaluation System Framework (FreVa)

Jupyter improvements #149

Closed antarcticrainforest closed 11 months ago

antarcticrainforest commented 1 year ago

This is a big one!

It adds a couple of things that potentially make it easier to interact with freva in Python environments.

@Karinon, if you are interested, you can see the JSON field in Django at work here.

eelucio commented 1 year ago

@antarcticrainforest, do you recommend that I pull this branch and test it locally so I can give an insightful review?

antarcticrainforest commented 1 year ago

I am going to install it in the dev environment, I'll tell you when you can try it.

eelucio commented 11 months ago

I am checking the prettified --doc output in the CLI and found that it is unable to correctly parse HTML href:

$ freva plugin animator --doc
Animator (v2022.7.15): Create animations (in gif or mp4 format) This tool creates plots of solr facets and an animation.    

...
│ projection              │ Set the global map projection. Note: this should the name of the cartopy projection method (e.g PlatteCarree for Cylindrical Projection). Pleas │
│                         │ refer to <a href="https://scitools.org.uk/cartopy/docs/latest/crs/projections.html"target=_blank>cartopy website</a> for details. (default:     │
│                         │ PlateCarree)  
...

In Jupyter, however, it renders without any problem.

Overall, I found the error logs much more helpful for catching errors.

I did not have the time to review the job handling in the Python module.

eelucio commented 11 months ago

I tried to override my configuration in a Jupyter notebook that is running the freva-dev kernel:

import freva

hist = freva.history(plugin="psi",limit=1) # this corresponds to the last run of psi with freva-dev
config = hist[-1]['configuration']
with freva.config("/work/bm1159/XCES/freva/evaluation_system.conf"):
    res = freva.run_plugin("psi",**config, batchmode=True)
---
[16:03:15] ERROR    freva - ERROR - (1146, "Table 'frevadb.history_batch_settings' doesn't exist") -    utils.py:88
                    decrease log level via `freva.logger.setLevel` for more information
---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
Cell In[14], line 2
      1 with freva.config("/work/bm1159/XCES/freva/evaluation_system.conf"):
----> 2     res = freva.run_plugin("psi",**config, batchmode=True)

File /home/b/b380001/freva-dev/lib/python3.11/site-packages/freva/utils.py:57, in handled_exception.<locals>.wrapper(*args, **kwargs)
     55     return func(*args, **kwargs)
     56 except BaseException as error:
---> 57     exception_handler(error)

File /home/b/b380001/freva-dev/lib/python3.11/site-packages/freva/utils.py:90, in exception_handler(exception, cli)
     88     logger.error(msg)
     89 if logger.level > logging.DEBUG:
---> 90     raise exception from None
     91 raise exception

File /home/b/b380001/freva-dev/lib/python3.11/site-packages/MySQLdb/connections.py:255, in Connection.query(self, query)
    253 if isinstance(query, bytearray):
    254     query = bytes(query)
--> 255 _mysql.connection.query(self, query)

ProgrammingError: (1146, "Table 'frevadb.history_batch_settings' doesn't exist")

---
print(rest.status)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[15], line 1
----> 1 print(rest.status)

NameError: name 'rest' is not defined

eelucio commented 11 months ago

I tried again to run a plugin from an exported configuration (i.e., a Jupyter notebook with the freva-dev kernel, but running a plugin at XCES):

with freva.config("/work/bm1159/XCES/freva/evaluation_system.conf"):
    res = freva.run_plugin("psi",**config, batchmode=True)

Scheduled job with history id: 3073
You can view the job's status with the command squeue
Your job's progress will be shown with the command
tail -f /work/bm1159/XCES/xces-work/share/slurm/psi/PSI-6745650.out

print(res.status)
scheduled
$ cat /work/bm1159/XCES/xces-work/share/slurm/psi/PSI-6745650.out
INFO:numexpr.utils:Note: detected 256 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO:numexpr.utils:Note: NumExpr detected 256 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:freva:Running psi as scheduled in history with ID 3073
Traceback (most recent call last):
  File "/home/b/b380001/freva/bin/freva-plugin", line 8, in <module>
    sys.exit(main())
  File "/home/b/b380001/freva/lib/python3.10/site-packages/freva/cli/plugin.py", line 177, in main
    cli.run_cmd(args, **cli.kwargs)
  File "/home/b/b380001/freva/lib/python3.10/site-packages/freva/cli/plugin.py", line 150, in run_cmd
    value, out = freva.run_plugin(tool_name or "", **tool_args)
  File "/home/b/b380001/freva/lib/python3.10/site-packages/freva/_plugin.py", line 293, in run_plugin
    out = pm.run_tool(
  File "/home/b/b380001/freva/lib/python3.10/site-packages/evaluation_system/api/plugin_manager.py", line 738, in run_tool
    load_scheduled_conf(plugin_name, scheduled_id, user),
  File "/home/b/b380001/freva/lib/python3.10/site-packages/evaluation_system/api/plugin_manager.py", line 1124, in load_scheduled_conf
    row = h[0]
  File "/home/b/b380001/freva/lib/python3.10/site-packages/django/db/models/query.py", line 318, in __getitem__
    return qs._result_cache[0]
IndexError: list index out of range

********************************************************************************
*                                                                              *
*  This is the automated job summary provided by DKRZ.                         *
*  If you encounter problems, need assistance or have any suggestion, please   *
*  write an email to                                                           *
*                                                                              *
*  --  support@dkrz.de --                                                      *
*                                                                              *
*                       We hope you enjoyed the DKRZ supercomputer LEVANTE ... *
*
* JobID            : 6745650
* JobName          : PSI                                               
* Account          : bm1159
* User             : k204229 (200279), bm0146 (1076)                   
* Partition        : compute
* QOS              : normal
* Nodelist         : l40037 (1)                                                
* Submit date      : 2023-09-01T18:24:12
* Start time       : 2023-09-01T18:25:48
* End time         : 2023-09-01T18:25:59
* Elapsed time     : 00:00:11 (Timelimit=08:00:00)                     
* Command          : /tmp/tmpqco3la86.sh
* WorkDir          : /home/k/k204229
*
* StepID | JobName      NodeHours    MaxRSS [Byte] (@task)
* ------------------------------------------------------------------------------
* batch  | batch           0.0031
* extern | extern          0.0031                2552K (0)
* ------------------------------------------------------------------------------

and there is no trace of the job in the XCES db.

I ran the same configuration at freva-dev in interactive mode and it works.
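
The IndexError in the traceback above comes from indexing an empty query result (`row = h[0]`). A minimal sketch of a guarded lookup that fails with a clearer message could look like this. It uses a plain list as a stand-in for the query result, and the names `ScheduledJobNotFound` and `first_scheduled_row` are hypothetical, not part of freva or evaluation_system:

```python
class ScheduledJobNotFound(LookupError):
    """Raised when no scheduled history entry matches the requested id."""


def first_scheduled_row(rows, history_id):
    """Return the first matching row or raise a descriptive error.

    `rows` stands in for the query result (`h` in the traceback);
    here it is assumed to be any sequence that is falsy when empty.
    """
    if not rows:
        raise ScheduledJobNotFound(
            f"No scheduled history entry with id {history_id}; "
            "was the job registered in a different database?"
        )
    return rows[0]


# Usage with plain lists standing in for database rows.
print(first_scheduled_row([{"id": 3073}], 3073))
```

With such a guard, the batch log would point at the missing history entry instead of a bare "list index out of range".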

antarcticrainforest commented 11 months ago

That is indeed a very interesting use case. Something is not working. I blame Django. If we use an ORM at all, it should be a different one.

So what happens is that you submit the job, but I guess Django doesn't reload the db connection, so it stays with the freva-dev db. When you submit a batch job, the job is set as scheduled in the freva-dev history, but the actual batch job that runs registers that it has started in the xces db; hence you now have two entries in the two dbs. Or maybe I am mistaken.

On the other hand, I don't know what we should expect when you submit the plugin from within the xces context and want to access the result outside the xces context. That should not work, so the status "should" be unknown.

I will also try to explicitly reload Django's db settings when you enter and leave the context.

EDIT: I was trying to see how we can reload the database, and apparently this is not foreseen in Django. So either we drop the whole config overriding, or we leave it as it is with the caveat that all (existing) db connections will stay, which may lead to confusion.
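
To illustrate the suspected behaviour, here is a toy sketch of a connection cache that is filled once from the settings and is not refreshed when the settings are overridden in a context manager. Everything in it is a hypothetical stand-in (the `settings` dict, `get_connection`, `override_config`, the database names); it is not Django's or freva's actual code, and the `reset_connections` switch only shows what a reload would have to do, not that Django offers one:

```python
from contextlib import contextmanager

# Hypothetical stand-ins for the global configuration and connection cache.
settings = {"db_name": "dev_db"}
_connection_cache = {}


def get_connection():
    """Return the cached connection, creating it from the settings on first use."""
    if "default" not in _connection_cache:
        _connection_cache["default"] = {"db_name": settings["db_name"]}
    return _connection_cache["default"]


@contextmanager
def override_config(db_name, reset_connections=False):
    """Temporarily point the settings at another database."""
    old = settings["db_name"]
    settings["db_name"] = db_name
    if reset_connections:
        _connection_cache.clear()  # force a fresh connection inside the context
    try:
        yield
    finally:
        settings["db_name"] = old
        if reset_connections:
            _connection_cache.clear()  # drop the temporary connection again


get_connection()  # a connection to the dev database is opened and cached

with override_config("xces_db"):
    stale = get_connection()["db_name"]  # cached connection is reused

with override_config("xces_db", reset_connections=True):
    fresh = get_connection()["db_name"]  # cache was cleared, new connection

print(stale, fresh)
```

In the first context the settings say `xces_db` but the cached connection still talks to `dev_db`, which matches the mismatch described above: the job is scheduled in one database while the batch job looks it up in the other.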

eelucio commented 11 months ago

Thanks for the extensive explanation.

I saw that at least access to an external (e.g. xces) solr is possible if we temporarily load the configuration. Would that be enough to keep this feature?

We could also release it as a beta: the solr connection is working (i.e. we can search all the data across different instances), but we are having trouble running plugins of other instances, and we will try to resolve that.

I will now check whether just importing the freva library and then running the load_config command (launched from a "neutral" kernel) can do the job of running plugins. In this case we are not reloading anything, right?

antarcticrainforest commented 11 months ago

I am checking the prettified --doc output in the CLI and found that it is unable to correctly parse HTML href [...]

That won't be possible. The CLI displays things as text, and text cannot be interpreted as HTML, only the other way round.

antarcticrainforest commented 11 months ago

The new db table structure is not available in xces yet; it needs redeployment.

eelucio commented 11 months ago

I am checking the prettified --doc output in the CLI and found that it is unable to correctly parse HTML href [...]

That won't be possible. The CLI displays things as text, and text cannot be interpreted as HTML, only the other way round.

I realised way too late today that this comment was bollocks, sorry. I removed a similar one but forgot this one...
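
For reference, one way anchor tags in a plugin's help text could be flattened before printing to a terminal is sketched below with the standard library's `html.parser`. This is only an illustration of the idea; the class `LinkFlattener` and function `flatten_links` are hypothetical names, the URL is a placeholder, and nothing here is taken from freva's CLI code:

```python
from html.parser import HTMLParser


class LinkFlattener(HTMLParser):
    """Collect text, rewriting <a href="URL">label</a> as 'label (URL)'."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Remember the link target until the closing tag is seen.
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            if self._href:
                self.parts.append(f" ({self._href})")
            self._href = None

    def handle_data(self, data):
        self.parts.append(data)


def flatten_links(text):
    """Return `text` with HTML tags removed and link targets kept inline."""
    parser = LinkFlattener()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


doc = 'refer to <a href="https://example.org/projections" target="_blank">cartopy website</a> for details.'
print(flatten_links(doc))
```

The same help string could then stay as HTML for Jupyter while the CLI prints the flattened version.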