BSC-ES / autosubmit-gui

The Autosubmit Graphical User Interface (GUI) is the web-based Autosubmit frontend, allowing users to discover, monitor, and analyze experiments. It is based on ReactJS and relies on the Autosubmit API as the middleware to get experiment information.
MIT License

GUI for AutoSubmit v1 #30

Closed kinow closed 1 month ago

kinow commented 5 years ago

In GitLab by @dbeltran on Jan 3, 2019, 16:46

@mcastril

Hi there,

I am developing a GUI for autosubmit as you can see in autosubmit@3d4163eea9dc3aff5208b32ab541986d34ca9401.

It will be a separate executable called autosubmit-GUI.

It will be a slow process because this is the first time I have developed any GUI-related application. For now, I have just committed a version (in an isolated branch) that looks awful, but at least I made it work (except grouping/expand) with autosubmit create.

As related to the development strategy:

As for the libraries, for now I am using PySide2 (free for commercial use) <= 5.11, because I have trouble making it work with 5.12 due to having an old version of GCC (4.8).

Of course, all of this is open to discussion.

Edit 1: The code will be a bit messy until I learn more PySide (Qt), since I'm testing different approaches to achieve what I want.

Off-topic: the Git buttons for list, bold, etc. do not work with Chrome.

To do:

To do (Beta Priority):

Priority:

Architecture discussion on autosubmit#522 for future reference.

kinow commented 4 years ago

In GitLab by @wuruchi on Jan 15, 2020, 12:54

Hello team,

The tables "experiment_times" and "job_times", both of them located in "ecearth.db", have been successfully filled with the information from all autosubmit experiments in esarchive. The structure of "job_times" has changed a little from what had been described in https://earth.bsc.es/gitlab/es/autosubmit/issues/363#note_65670. The new structure is:

  1. detail_id: Unique Id of the register, autoincrement.
  2. exp_id: Id of exp, from the experiment table.
  3. exp_name: expid.
  4. job_name: The unique name of a job.
  5. created: Timestamp of the date this register was created.
  6. modified: Timestamp of the date this register was last modified.
  7. submit_time: Timestamp of the date when the job was submitted.
  8. start_time: Timestamp of the date when the job started running.
  9. finish_time: Timestamp of the date when the job was finished.
  10. status: Status of the experiment.

Thanks to proper data type management, the current size of ecearth.db is 113MB. It should not surpass 150MB during this year.

In "experiment_times" there will be only one register per experiment. In "job_times" there will be only one register per job.

As an example of the information that these tables bring, it is now possible to know the total number of jobs that Autosubmit has completed, and how much time they took, with a single SQL query. However, it is not totally reliable right now; consider it an experiment.
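As a rough illustration of the single-query idea, the sketch below builds an in-memory SQLite table with the columns listed for "job_times" and counts the completed jobs together with their total running time. The column types, the integer timestamps, the 'COMPLETED' status string, the helper names, and the sample rows are all assumptions made for this example; the real schema in ecearth.db may differ.

```python
import sqlite3

# Columns follow the list in this comment; the types and the UNIQUE
# constraint on job_name are assumptions, not the real schema.
SCHEMA = """
CREATE TABLE job_times (
    detail_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    exp_id      INTEGER NOT NULL,
    exp_name    TEXT NOT NULL,
    job_name    TEXT NOT NULL UNIQUE,
    created     INTEGER NOT NULL,
    modified    INTEGER NOT NULL,
    submit_time INTEGER,
    start_time  INTEGER,
    finish_time INTEGER,
    status      TEXT NOT NULL
)
"""


def completed_job_summary(conn):
    """Return (number of completed jobs, total running seconds) in one query."""
    row = conn.execute(
        """
        SELECT COUNT(*), COALESCE(SUM(finish_time - start_time), 0)
        FROM job_times
        WHERE status = 'COMPLETED'
        """
    ).fetchone()
    return row[0], row[1]


def build_demo_db():
    """Create an in-memory database holding three made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        (1, "a000", "a000_SIM_1", 100, 100, 10, 20, 80, "COMPLETED"),
        (1, "a000", "a000_SIM_2", 100, 100, 15, 30, 50, "COMPLETED"),
        (1, "a000", "a000_SIM_3", 100, 100, 15, 30, None, "RUNNING"),
    ]
    conn.executemany(
        "INSERT INTO job_times (exp_id, exp_name, job_name, created, modified,"
        " submit_time, start_time, finish_time, status)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn
```

Calling `completed_job_summary(build_demo_db())` counts only the two hypothetical completed rows; the running job is ignored because it has no finish time yet.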

The next step is to develop and put into operation the task that will update these tables with the data from new experiments and the changes from those already registered.

kinow commented 4 years ago

In GitLab by @mcastril on Jan 15, 2020, 17:05

Thanks Wilmer,

Taking into account the enlargement of the row size in columns, I am thinking about some redundancies that we can maybe avoid.

Maybe it is not so dramatic now, but when Autosubmit starts populating a similar table by itself we will need a row for each run of each job to keep a history, and the row size will matter a lot.

kinow commented 4 years ago

In GitLab by @wuruchi on Jan 15, 2020, 17:46

Hello @mcastril

As you mention, field (3) is redundant. I will not include it in the next iteration.

Field (5) stores the date on which the register was first created, while field (6) saves the date when the last modification was performed. In this way, we sort of keep track of when the changes were made, since there is only one register per job.
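The one-register-per-job behaviour with "created" and "modified" fields can be sketched as an update-or-insert. This is only an illustration under assumptions: the reduced four-column table, the function names, and the integer timestamps are hypothetical and not taken from the actual implementation.

```python
import sqlite3


def make_table():
    """Create a reduced, hypothetical job_times table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_times ("
        " job_name TEXT PRIMARY KEY, status TEXT,"
        " created INTEGER, modified INTEGER)"
    )
    return conn


def save_job_status(conn, job_name, status, now):
    """Update the job's single row in place, or insert it if it is new."""
    cur = conn.execute(
        "UPDATE job_times SET status = ?, modified = ? WHERE job_name = ?",
        (status, now, job_name),
    )
    if cur.rowcount == 0:
        # First time this job is seen: created and modified start out equal.
        conn.execute(
            "INSERT INTO job_times (job_name, status, created, modified)"
            " VALUES (?, ?, ?, ?)",
            (job_name, status, now, now),
        )
```

With this approach "created" never changes after the first insert, while "modified" moves forward on every update, so only the time of the latest change is kept, not a full history.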

In other news, I have updated the App: http://bscesweb04.bsc.es/autosubmitapp/experiment/a2c8. The job information now shows the path to the out and err files, which can be easily copied to the clipboard by clicking the button next to the corresponding input box. In the experiments that I have tested, the path seems to be working. Anyway, please keep track of which jobs do not show an existing path if you happen to come across any such cases.

kinow commented 4 years ago

In GitLab by @pechevar on Jan 23, 2020, 15:23

@gmontane and I would like to suggest a change to the main page filter: http://bscesweb04.bsc.es/autosubmitapp/

The running button should take into account whether the search filter was used. This would be useful, e.g., to find which of my experiments are running.

kinow commented 4 years ago

In GitLab by @yruprich on Jan 29, 2020, 17:36

Hi @wuruchi,

I noticed that the information about the path of the job logs (that appears on the right side once you click on a job) is wrong. At least for my experiment a1tu it does not provide the correct name. See for example:

/esarchive/autosubmit/a1tu/tmp/LOG_a1tu/a1tu_19500101_fc5_1_CLEAN.20200129103353.err

It should be:

/esarchive/autosubmit/a1tu/tmp/LOG_a1tu/a1tu_19500101_fc5_1_CLEAN.20200126134321.err

Thanks,

Yohan

kinow commented 4 years ago

In GitLab by @mcastril on Jan 30, 2020, 15:04

Hi @yruprich, I saw this kind of error with @wuruchi just when the path feature was implemented. We thought that when a given job is QUEUING or RUNNING again, Autosubmit was generating a path that still existed only in MN4 and not locally.

But it seems you didn't re-run that job, right?

kinow commented 4 years ago

In GitLab by @yruprich on Jan 30, 2020, 15:09

Hi @mcastril,

In the present case I only ran this job once and it went through without problems.

Well, to be completely honest, I ran this job twice, but the first time was 6 months ago...

kinow commented 4 years ago

In GitLab by @wuruchi on Feb 10, 2020, 10:38

Hi @yruprich

Thank you for your feedback.

This is an issue that seems to happen to some experiments that have had reruns. We will look into it and implement a fix as soon as possible.

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item Show the start date and the date (start date + crunch chunk size) as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item Move the name of the job to the center of the circle in the Graph View. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item Put the description of the experiment next to the name. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item The navigation per status buttons should point to the first job with that status, and not to the last one as it is happening now. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item In the description panel (right box): Show the list of currently waiting jobs. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item In the description panel (right box): Show the list of currently waiting jobs. as incomplete

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item In the description panel (right box): Show chunk number. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item In the description panel (right box): Show member name. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item Remove model and pkl from the top description. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item The name of the Branch and HPC should be bigger. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item The log should be scrollable. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:55

marked the checklist item When switching to the Tree View node, the Tree should be rendered automatically. If the Tree is already rendered and the experiment is running, the Tree View data should be refreshed automatically. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 9, 2020, 11:56

marked the checklist item For big experiments, the Graph visualization should start zooming into a relevant section of the graph instead of showing the whole render. as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 15, 2020, 23:02

Hello @pechevar @mcastril

Just letting you know that something "strange" is happening with the /esarchive/autosubmit/as_times.db. I will continue looking for a fix tomorrow.

kinow commented 4 years ago

In GitLab by @mcastril on Apr 16, 2020, 14:59

Hi @wuruchi. Was it solved?

kinow commented 4 years ago

In GitLab by @wuruchi on Apr 16, 2020, 15:58

Not quite, but it is working right now.

I am looking for more optimization opportunities. The current implementation seems too susceptible to failure due to any disruption or increase of traffic in the network.

kinow commented 4 years ago

In GitLab by @mcastril on Apr 28, 2020, 13:05

marked the checklist item Show average running time for jobs (types) in the experiment. as completed

kinow commented 4 years ago

In GitLab by @mcastril on Apr 28, 2020, 13:05

marked the checklist item Show average queuing time for failed jobs in the experiment. as completed

kinow commented 4 years ago

In GitLab by @dbeltran on Apr 30, 2020, 16:01

mentioned in issue autosubmit#526

kinow commented 4 years ago

In GitLab by @wuruchi on Jun 3, 2020, 12:04

I have added some priority tasks.

kinow commented 4 years ago

In GitLab by @wuruchi on Jun 3, 2020, 20:04

marked the checklist item Show performance metrics as completed

kinow commented 4 years ago

In GitLab by @wuruchi on Jun 3, 2020, 20:05

Hello @mcastril

The first version of the visualization of metrics in GUI has been implemented. You can see it in a new tab called Performance.

Example:

https://earth.bsc.es/autosubmitapp/experiment/a2s5

kinow commented 3 years ago

In GitLab by @mcastril on Nov 3, 2020, 15:20

marked the checklist item Provide set status commands through the GUI as completed

kinow commented 3 years ago

In GitLab by @mcastril on Nov 3, 2020, 15:21

marked the checklist item Add specific URL for GUI functionality as completed

kinow commented 3 years ago

In GitLab by @wuruchi on Nov 18, 2020, 11:16

Hello team.

As part of the API, there are 3 workers that run on the crontab of bscesweb04. These workers are in charge of updating the databases that provide important information that is shown on the GUI. Recent problems with esarchive have caused serious performance problems for these workers and subsequent overloading of bscesweb04. To solve this problem, a timeout mechanism has been added to the functions that these workers run. Basically:

These changes have been deployed and we hope we can overcome bscesweb04 overload problems.
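The thread does not show how the timeout was implemented, so the following is only a minimal sketch of one common way to put a time budget on a worker function: run it in a daemon thread and stop waiting after a deadline. The names `run_with_timeout` and `WorkerTimeout` are hypothetical.

```python
import threading


class WorkerTimeout(Exception):
    """Raised when a worker function does not finish within its time budget."""


def run_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func in a daemon thread and stop waiting after timeout_seconds."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand worker errors back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    thread.join(timeout_seconds)
    if thread.is_alive():
        # The thread is abandoned, not killed. Being a daemon, it will not
        # keep the process alive once the main thread exits.
        raise WorkerTimeout(f"{func.__name__} exceeded {timeout_seconds}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

A cron-launched worker could wrap its main function in `run_with_timeout`, log the `WorkerTimeout`, and exit, so that a stalled filesystem does not leave runs piling up on the server. Note that this approach only gives up waiting; a process-based variant would be needed to forcibly stop the work itself.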

FYI @kserrade @mcastril @dbeltran

kinow commented 3 years ago

In GitLab by @kserrade on Nov 18, 2020, 11:19

Great @wuruchi. I will be monitoring the server from time to time.

kinow commented 3 years ago

In GitLab by @mcastril on Nov 18, 2020, 22:23

Thanks Wilmer, looks robust enough. Let's see now how it performs as Kim says.

kinow commented 3 years ago

In GitLab by @mcastril on Jan 20, 2021, 09:50

Add contextual actions in the nodes that allow the user to see the log of the node, change status, re-run job or change platform.

Before going to Phase 2 I think we could give this one a try. It would be a very powerful feature, as it would save the time of opening a terminal, logging in and looking for a log file whenever someone reports an issue in an experiment.

To start, it could be just a button in the right panel, pointing to the log of the currently selected job, whose path is already shown in that panel. So it could be a button at the right of "Copy err" or "Copy out".

The Log could be opened in an overlay or in the already available Log tab, having a button to return to the Autosubmit's log.

kinow commented 3 years ago

In GitLab by @wuruchi on Jan 20, 2021, 12:30

Hello @mcastril

To start, it could be just a button in the right panel, pointing to the log of the currently selected job, whose path is already shown in that panel. So it could be a button at the right of "Copy err" or "Copy out".

Yes, this is doable. Working on that.

kinow commented 3 years ago

In GitLab by @wuruchi on Jan 27, 2021, 13:34

Hello @mcastril

To start, it could be just a button in the right panel, pointing to the log of the currently selected job, whose path is already shown in that panel. So it could be a button at the right of "Copy err" or "Copy out".

This feature has been implemented.

image

image

kinow commented 3 years ago

In GitLab by @etourign on Jan 27, 2021, 13:45

AWESOME! thanks a bunch

However, it seems the GUI does not look for the proper log files in some cases.

For example, for job a3e8_18500101_fc0_1_SIM for expid a3e8 (with vertical wrappers enabled for SIM)

the logs are in /esarchive/autosubmit/a3e8/tmp/LOG_a3e8/a3e8_18500101_fc0_1_SIM.20210112122236.err

But the AS GUI thinks they are in /esarchive/autosubmit/a3e8/tmp/LOG_a3e8/a3e8_18500101_fc0_1_SIM.20210127131949.err

Same problem with job a3e8_18500101_fc0_1_CMOROCE which was run without wrappers...

Same for expid a2dy and many others which I ran recently.

I cannot find a pattern... Except I ran these with AS0.12-fix not AS 0.13 ...

Should I open a separate ticket?

kinow commented 3 years ago

In GitLab by @wuruchi on Jan 27, 2021, 13:48

Hello @etourign

As far as I remember, there was a problem in .12 related to remote logs. Perhaps @mcastril remembers more about it.

kinow commented 3 years ago

In GitLab by @mcastril on Jan 27, 2021, 14:02

Wrapper logs are generated with another name (because the job log is actually the wrapper log, not the inner job log) and then renamed by the wrapper, if I am not wrong. Is it ok @dbeltran ?

So in case of a wrapper failure, the job can be left in the cluster without renaming. Are you handling this @dbeltran ? If not, it could be nice to identify those logs and make Autosubmit rename them.

kinow commented 3 years ago

In GitLab by @mcastril on Jan 27, 2021, 14:05

Sorry, when looking at the logs I swapped both log files. It's just the other way around.

So according to AS logs the experiment was not run before day 25, but jobstep logs start on day 11. Was there any migration affecting this experiment or something?

kinow commented 3 years ago

In GitLab by @mcastril on Jan 27, 2021, 14:11

Nailing down the problem... The .out record is set to today at 13:19 for all jobs in the pkl. So I am afraid that something has happened with this experiment.

[bsc32252@dtransfer1 pkl]$ grep out *pkl
NNNS'a3e8_LOCAL_SETUP.20210127131949.out'
NNNS'a3e8_SYNCHRONIZE.20210127131949.out'
NNNS'a3e8_REMOTE_SETUP.20210127131949.out'
NS'a3e8_18500101_fc0_INI.20210127131949.out'
S'a3e8_18500101_fc0_1_SIM.20210127131949.out'
S'a3e8_18500101_fc0_2_SIM.20210127131949.out'
S'a3e8_18500101_fc0_3_SIM.20210127131949.out'
S'a3e8_18500101_fc0_4_SIM.20210127131949.out'
S'a3e8_18500101_fc0_5_SIM.20210127131949.out'
S'a3e8_18500101_fc0_6_SIM.20210127131950.out'
S'a3e8_18500101_fc0_7_SIM.20210127131950.out'
S'a3e8_18500101_fc0_8_SIM.20210127131950.out'
S'a3e8_18500101_fc0_9_SIM.20210127131950.out'
S'a3e8_18500101_fc0_10_SIM.20210127131950.out'
S'a3e8_18500101_fc0_11_SIM.20210127131950.out'
S'a3e8_18500101_fc0_12_SIM.20210127131950.out'
S'a3e8_18500101_fc0_13_SIM.20210127131950.out'
S'a3e8_18500101_fc0_14_SIM.20210127131950.out'
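
The names in this listing follow a `<job name>.<timestamp>.<out|err>` shape, with a 14-digit timestamp. As a small sketch for checking such names, the function below splits a log file name into its parts and parses the timestamp. The pattern is inferred only from the names shown in this thread, and `parse_log_name` is a hypothetical helper, not part of Autosubmit.

```python
import re
from datetime import datetime

# <job name>.<YYYYMMDDhhmmss>.<out|err>, as inferred from the names above.
LOG_NAME = re.compile(r"^(?P<job>.+)\.(?P<stamp>\d{14})\.(?P<kind>out|err)$")


def parse_log_name(file_name):
    """Split a log file name into (job name, timestamp, kind)."""
    match = LOG_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a timestamped log name: {file_name!r}")
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    return match.group("job"), stamp, match.group("kind")
```

For `a3e8_18500101_fc0_1_SIM.20210127131949.out` the parsed timestamp is 2021-01-27 13:19:49, which matches the "today at 13:19" observation.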

kinow commented 3 years ago

In GitLab by @dbeltran on Jan 27, 2021, 14:14

Wrapper logs are generated with another name (because the job log is actually the wrapper log, not the inner job log) and then renamed by the wrapper, if I am not wrong. Is it ok @dbeltran ?

Each inner job log is generated by the wrapper under the name

        out = str(self.template) + ".out"
        err = str(self.template) + ".err"

This was changed about three months ago due to repeated issues with recovering the log. Before that, the name was

    out = str(self.template) + "." + str(id) + ".out"
    err = str(self.template) + "." + str(id) + ".err"

However, Autosubmit always renames these files in both cases (wrappers and simple jobs) during the recovery procedure, adding the actual timestamp.

So in case of a wrapper failure, the job can be left in the cluster without renaming. Are you handling this @dbeltran ? If not, it could be nice to identify those logs and make Autosubmit rename them.

They should always be renamed if the inner_job was active

kinow commented 3 years ago

In GitLab by @wuruchi on Jan 27, 2021, 14:19

Hello @mcastril @dbeltran

Autosubmit GUI gets the job err and out log names from the pkl file directly. Is it Ok to suppose the pkl file stores the right out and err file names for any Autosubmit version?

kinow commented 3 years ago

In GitLab by @dbeltran on Jan 27, 2021, 14:26

Hello @wuruchi ,

I did not look at it recently, but I guess that is right since it should store the latest info.

In addition,

They should always be renamed if the inner_job was active

This is only true when the job has finished (failed or completed) and the Autosubmit instance is active; otherwise it won't be renamed (neither in the cluster nor locally).

kinow commented 3 years ago

In GitLab by @wuruchi on Jan 27, 2021, 14:30

Ok, thank you @dbeltran.

This is only true when the job has finished (failed or completed) and the Autosubmit instance is active; otherwise it won't be renamed (neither in the cluster nor locally).

I will consider this for the FAQ that will be included in the GUI.

kinow commented 3 years ago

In GitLab by @etourign on Jan 27, 2021, 14:32

probably some gpfs issue, I had to kill AS.

But I have the same issue (with wrapped as well as non-wrapped jobs) with several other recent experiments: a3e3, a3dz, a3ff (ongoing), and an experiment which was run a long time ago: a21o

And I have the same issue with a369_18500101_fc0_35_SIM of a369 (which is your example).

kinow commented 3 years ago

In GitLab by @wuruchi on Jan 27, 2021, 14:49

Hello @etourign @mcastril

I will be working on a workaround extension for this feature that will look for the latest existing job log (.out or .err depending on the case) if the log listed in the pkl does not exist and the job has finished.

I will post here when it is done.
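The workaround described in this comment can be sketched roughly as follows: prefer the name recorded in the pkl, and fall back to the newest existing log for that job when the recorded file is missing. This is an illustration only, not the GUI's actual code; `resolve_log_path` and its parameters are hypothetical, and the file name pattern is inferred from the log names quoted in this thread.

```python
import os
import re


def resolve_log_path(log_dir, job_name, pkl_log_name, kind="err"):
    """Return the log path to show for a finished job, or None.

    Prefer the name recorded in the pkl; if that file is missing, fall back
    to the newest existing <job_name>.<timestamp>.<kind> file in log_dir.
    The caller is expected to check that the job has finished.
    """
    recorded = os.path.join(log_dir, pkl_log_name)
    if os.path.exists(recorded):
        return recorded
    # Escape the job name so that "..._1_SIM" does not match "..._10_SIM".
    pattern = re.compile(
        r"^" + re.escape(job_name) + r"\.(\d{14})\." + re.escape(kind) + r"$"
    )
    candidates = []
    for name in os.listdir(log_dir):
        match = pattern.match(name)
        if match:
            candidates.append((match.group(1), name))
    if not candidates:
        return None
    # Fixed-width YYYYMMDDhhmmss stamps sort chronologically as strings.
    _, newest = max(candidates)
    return os.path.join(log_dir, newest)
```

Choosing the newest file by the timestamp in its name, rather than by file modification time, is a design choice made for this sketch; it avoids depending on metadata that copying between the cluster and the local filesystem could alter.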

kinow commented 3 years ago

In GitLab by @mcastril on Jan 27, 2021, 15:15

Is it Ok to suppose the pkl file stores the right out and err file names for any Autosubmit version?

Not in the case reported by @etourign , in my opinion:

https://earth.bsc.es/gitlab/es/autosubmit/-/issues/363#note_111857

Why would all .out and .err logs be wrong and have an unrealistic date?