devo-ps / pipelines

Build pipelines for automation, deployment, testing...
MIT License

Proposal: add a global flag to pipelines to limit history log retention #133

Closed parkzhou0527 closed 1 year ago

parkzhou0527 commented 2 years ago

Background

We have had a pipelines installation running for almost 3 years (at one of our customers), so the build history has accumulated to quite a large number of runs (1000+).

Our developers find that the pipelines homepage is very slow to open, and that Chrome consumes too much memory while loading it.

After a quick debug, we found it was caused by the pipeline history: the homepage tries to load all of the history at once.

So we cleaned up the history to fewer than 200 runs, and everything runs quickly again.

Proposal

Pipelines should support a global flag --log-limits (which might have its own default value, say 200), and pipelines should respect this flag when deciding how much history to keep.

We do not want an overly complicated policy to control history log retention. For our daily development cycle, the latest 5 build logs are enough (most of the time) to let us debug errors.

parkzhou0527 commented 2 years ago

@kaleocheng

kaleocheng commented 2 years ago

Actually there are two issues:

  1. the backend loads all history on every call to the get pipelines API (https://github.com/Wiredcraft/pipelines/blob/aeb8b65d47bd77ef7b5e9e4f21b8b1933fc39952/pipelines/api/utils.py#L82), which can be a memory killer and slows down the API response
  2. there is no pagination support on the get pipelines API, which means the frontend also renders all history metadata, and that can slow down web UI loading.

For the first one, in the long term we'd better introduce DB storage for pipelines (e.g. redis/postgres), but that's another topic. Before that, we can apply a quick fix to the current file-based storage:

  1. update https://github.com/Wiredcraft/pipelines/blob/aeb8b65d47bd77ef7b5e9e4f21b8b1933fc39952/pipelines/api/utils.py#L81 to only return the latest X items, sorted by created or modified time
  2. add a new flag for X to the pipelines CLI. Let's take Park's suggested name, --log-limits
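A rough sketch of what step 1 could look like, assuming each run lives in its own folder under the pipeline's directory and that folder mtime is an acceptable stand-in for creation time. The function name `latest_run_dirs` and the `limit` parameter are hypothetical and not taken from the current `utils.py`; `limit` would be fed from the proposed --log-limits flag.

```python
import heapq
import os


def latest_run_dirs(pipeline_dir, limit=200):
    """Return up to `limit` run folders under `pipeline_dir`, newest first.

    Hypothetical helper: assumes one sub-folder per run and uses the
    folder's modification time as a proxy for when the run was created.
    """
    with os.scandir(pipeline_dir) as entries:
        # Only directories count as runs; stray files are ignored.
        runs = [(e.stat().st_mtime, e.path) for e in entries if e.is_dir()]
    # nlargest avoids fully sorting thousands of runs when limit is small.
    newest = heapq.nlargest(limit, runs)
    return [path for _mtime, path in newest]
```

This still stats every run folder, but it no longer has to open and parse the status JSON of runs that fall outside the limit, which is presumably where most of the time and memory goes.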

For the second one, I will create another ticket to decide how to handle it.

sp3c73r2038 commented 2 years ago

For a quick mitigation, see #134.

Current run folder names are just UUIDs and carry no metadata, which makes them hard to sort without looking into the status JSON files.

A further mitigation could be to write an additional metadata status file for each pipeline, listing its historical runs with their creation dates etc. This file would only need to be updated when a new run is created, and that may still be easier than switching entirely to a DBMS.
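To make the metadata-file idea concrete, here is a minimal sketch, assuming an append-only JSON-lines index stored next to the run folders. The file name `runs_index.jsonl`, the field names, and both function names are made up for illustration; nothing like this exists in pipelines today.

```python
import json
import time
from pathlib import Path

INDEX_NAME = "runs_index.jsonl"  # hypothetical file name, one JSON object per line


def record_run(pipeline_dir, run_id, created=None):
    """Append an entry for a newly created run to the pipeline's index file."""
    entry = {
        "run_id": run_id,
        "created": time.time() if created is None else created,
    }
    with open(Path(pipeline_dir) / INDEX_NAME, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def latest_runs(pipeline_dir, limit=200):
    """Return up to `limit` index entries, newest first, without opening run folders."""
    index = Path(pipeline_dir) / INDEX_NAME
    if not index.exists():
        return []
    with open(index, encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    entries.sort(key=lambda e: e["created"], reverse=True)
    return entries[:limit]
```

With something like this, `record_run` would be called once when a run is created, and the API could call `latest_runs` to pick which UUID folders to load. Runs created before the index existed would need a one-off backfill, which this sketch does not cover.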