fatty- / daisy-pipeline

Automatically exported from code.google.com/p/daisy-pipeline

Web UI running DAISY 202 to EPUB 3: temp files persist #303

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
After running one job on a 500 MB book, my daisy-pipeline/webui directory was 
1.6 GB. I found these three folders were quite large:

uploads/ contained inputBook.zip
local.temp/ contained a directory with the EPUB files in it
local.data/ contained an .epub file.

Is there any way that the Web UI can delete at least the uploads?

Who is in charge of local.temp/? Presumably the Web UI creates it but it gets 
used by the script. Maybe the script can delete its temp files after it 
finishes.

Having the job result persist in local.data is necessary but after the user 
deletes the job, I would expect the result to go away. Does the framework write 
directly to this directory or is the result copied by the Web UI? Incidentally, 
I noticed if I deleted the job via the CLI, nothing changed.

Original issue reported on code.google.com by marisa.d...@gmail.com on 25 Apr 2013 at 6:01

GoogleCodeExporter commented 9 years ago

Original comment by marisa.d...@gmail.com on 25 Apr 2013 at 6:04

GoogleCodeExporter commented 9 years ago
Uploads are deleted automatically if all of these are true [1]:
 * 1 hour has passed since the upload
 * A job has not been created for the upload
 * The upload is not being "used" by a browser (i.e. on a job creation page)
(I think I added the one-hour timeout for cases where it was difficult to 
determine whether a browser was using the upload.)
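
As a rough illustration of the rule above (not the actual code in Global.java), the three conditions could be combined like this. The class and method names, and the idea of passing the current time in as a parameter, are assumptions made for the sketch:

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the upload clean-up rule; not taken from Global.java. */
final class UploadCleanupRule {
    // The one-hour timeout mentioned above.
    static final Duration TIMEOUT = Duration.ofHours(1);

    /** Returns true only when all three conditions from the list hold. */
    static boolean shouldDelete(Instant uploadedAt, boolean hasJob,
                                boolean inUseByBrowser, Instant now) {
        // Expired once at least TIMEOUT has elapsed since the upload.
        boolean expired = Duration.between(uploadedAt, now).compareTo(TIMEOUT) >= 0;
        return expired && !hasJob && !inUseByBrowser;
    }
}
```

Taking the current time as an argument keeps the rule easy to check without waiting an hour.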

Uploads are also deleted if a job is deleted. A job is deleted automatically if 
any of these is true:
 * The owner of the job is deleted
 * The job is deleted from the Pipeline 2 Engine (it checks every 5 minutes[2])
 * A job is older than the duration set under Job Maintenance in the settings, which I realize now is only available when running in server mode

Since the engine is running in local mode, the result and temp directories must 
be provided in the jobRequest. The local.results and local.temp directories are 
passed as URIs in the jobRequest, and it's the engine's and the script's 
responsibility to populate them. A DELETE request is sent to the Web API when a 
job is deleted from the Web UI.
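
For illustration only, a DELETE request like the one mentioned above could be built as follows with the JDK's HTTP client (Java 11+). The base URL, the `/jobs/{id}` path layout and the job ID are assumptions for the sketch, and any authentication the Web API may require is left out:

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper; the endpoint layout is an assumption, not confirmed here. */
final class JobDeleteRequest {
    /** Builds (but does not send) a DELETE request for the given job. */
    static HttpRequest build(String baseUrl, String jobId) {
        // Drop a trailing slash so the path is not doubled up.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/jobs/" + jobId))
                .DELETE()
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())`.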

I guess I've been thinking that it's the engine's responsibility to delete the 
temporary files after a job has completed and to delete the result files after 
the job is deleted. But that's not the case in local mode, so I suppose it's up 
to the Web UI to clean up after a job has completed and after a job is deleted.

An engine running in local mode will be the most common configuration for most 
of the server installs as well as all the desktop installs. I think it would be 
pretty handy if it could handle temp-files and result-files in local mode as 
well. Maybe it could be enabled/disabled through a config option?

Anyway, to sum up for now:
 - the Web UI should delete temp files when a job is done
 - the Web UI should delete result files when a job is deleted
 - I guess the 1 hour timeout for uploads should be lowered. Maybe 10 minutes is enough.
 - a delete-button for jobs should be a priority for the next release
 - if a delete-button is implemented I think there's no need for "Job Maintenance" under settings in desktop mode so I'll keep that as is
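
The first two points could be sketched roughly like this. It is only an illustration under assumed names (`JobFileCleanup`, `onJobDone`, `onJobDeleted`), and it assumes each job has its own temp and result directory; the real Web UI code may be organised differently:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical clean-up helper; names are illustrative, not from the Web UI. */
final class JobFileCleanup {
    /** Recursively deletes dir; does nothing if it does not exist. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) return;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** When a job finishes, only its temp files go away. */
    static void onJobDone(Path jobTempDir) {
        deleteRecursively(jobTempDir);
    }

    /** When a job is deleted, its result files (and any leftover temp files) go away. */
    static void onJobDeleted(Path jobTempDir, Path jobResultDir) {
        deleteRecursively(jobTempDir);
        deleteRecursively(jobResultDir);
    }
}
```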

What do you think?

[1] https://github.com/daisy-consortium/pipeline-webui/blob/master/app/Global.java#L191
[2] https://github.com/daisy-consortium/pipeline-webui/blob/master/app/Global.java#L252

Original comment by josteinaj@gmail.com on 25 Apr 2013 at 6:43

GoogleCodeExporter commented 9 years ago
Btw. the reason why uploads are not deleted immediately after submitting the 
job is because we may want to implement a "restart"-button at some point in 
which case we need the uploaded files. They are currently persisted for as long 
as the job exists.

Original comment by josteinaj@gmail.com on 25 Apr 2013 at 6:45

GoogleCodeExporter commented 9 years ago

Original comment by josteinaj@gmail.com on 25 Apr 2013 at 6:46

GoogleCodeExporter commented 9 years ago
Ok, thanks for the details! I obviously didn't wait long enough for files to 
get deleted. 

I agree with all the points you summarized. Also, is it possible to check more 
frequently than 5 minutes for deleted jobs, or does it compromise performance?

I suppose if a user deletes a job through the Web UI, then the temp and result 
files can be deleted immediately, without having to wait for the deleted-job 
check to run.

Original comment by marisa.d...@gmail.com on 25 Apr 2013 at 7:08

GoogleCodeExporter commented 9 years ago
Sure, we could check once every minute?
Additionally, maybe the check could be done every time the job list is 
retrieved (i.e. when you click the job listing). I'll have to see if it would 
affect performance.
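
A minimal sketch of such a periodic check, using the JDK's `ScheduledExecutorService`. The class name and the `check` callback are placeholders for whatever the Web UI actually does to compare its job list with the engine's:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical poller; the check itself is supplied by the caller. */
final class DeletedJobPoller {
    /** Runs check immediately, then again after each period; returns the scheduler so it can be shut down. */
    static ScheduledExecutorService start(Runnable check, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                check.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so swallow it here.
                System.err.println("Deleted-job check failed: " + e);
            }
        }, 0, period, unit);
        return scheduler;
    }
}
```

Checking once a minute would then be `DeletedJobPoller.start(check, 1, TimeUnit.MINUTES)`. A fixed delay, rather than a fixed rate, means a slow check never overlaps with or queues up behind the next one.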

Yes, if the Web UI deletes it, then the files should be deleted immediately.

What do you think about adding support in the engine for handling the result 
and temp folders in local mode in the future? It would work just like it's 
currently done in remote mode.

Original comment by josteinaj@gmail.com on 25 Apr 2013 at 8:12

GoogleCodeExporter commented 9 years ago
I think it's pretty reasonable to expect the framework to clean up after a job; 
i.e., delete the temp files and result in local mode as well as in remote mode.

However, does the framework have enough information to know to clean up the 
temp folder? Or should that responsibility fall on the script?

Original comment by marisa.d...@gmail.com on 25 Apr 2013 at 8:21

GoogleCodeExporter commented 9 years ago
Yes, we created the px:output="temp" option so that temporary files could be 
separated from the result files. So now all the temporary files can be deleted 
safely by the framework right after the job has finished running.

Original comment by josteinaj@gmail.com on 25 Apr 2013 at 8:25

GoogleCodeExporter commented 9 years ago
The latest commit to the Web UI should fix this issue:
https://github.com/daisy-consortium/pipeline-webui/commit/0ce4256018f3cb8ec9f53a99b5f6d16d40eb1a1c

Note that if you want file deletion to work you will need to update to the 
newest version of pipeline-clientlib-java as well:
https://github.com/daisy-consortium/pipeline-clientlib-java/commit/b6e0120d876aa5a7ba8c306370bbb4779fd32439

Original comment by josteinaj@gmail.com on 26 Apr 2013 at 3:03

GoogleCodeExporter commented 9 years ago
The issue seems to have been fixed. I'm closing it. Feel free to open it again 
if it is not yet fixed.

Original comment by josteinaj@gmail.com on 20 Jun 2013 at 2:37