davidfrantz / force

Framework for Operational Radiometric Correction for Environmental monitoring
GNU General Public License v3.0

Level2 - Couldn't open the directory: Input/output error #270

Closed by kesslerf 1 year ago

kesslerf commented 1 year ago

Describe the bug I am running the Docker image (davidfrantz/force:latest, pulled 4 weeks ago, so it should be 3.7.10(?)) via Python. I am running time-series analysis and Level 2 processing. Sometimes, at some point while processing the queue file, an Input/output error occurs. After that, images are processed more slowly, or processing stops altogether. I checked that the .SAFE files are on the system at the paths given in the queue file, and since the first images are processed without problems, there shouldn't be any permission-related errors.

62 images enqueued. Start processing with 4 CPUs
Computers / CPU cores / Max jobs to run
1:local / 8 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 4980s Left: 16 AVG: 311.26s  local:4/46/100%/338.2s Couldn't open the directory: Input/output error
ETA: 7554s Left: 9 AVG: 839.34s 

This is the Python call; params is the path to the attached file and id is a random ID.

client.containers.run(
    "davidfrantz/force:latest",
    f"force-level2 {params}",
    volumes={
        "/codede/": {"bind": "/codede", "mode": "ro"},
        "/force/": {"bind": "/force", "mode": "ro"},
        "/data/": {"bind": "/data", "mode": "rw"},
    },
    name=f"level2_{id}",
)
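The manual check mentioned above (that the .SAFE inputs exist at the paths given in the queue file) could be automated before the container is started. The following is only a minimal sketch: the function name `unreadable_queue_entries` is hypothetical, and it assumes each queue line holds a path without spaces followed by a status word such as QUEUED or DONE.

```python
import os


def unreadable_queue_entries(queue_path):
    """Return (path, reason) pairs for queued inputs that cannot be read.

    Assumption: each non-empty line is "<path> <STATUS>", and only lines
    whose status is QUEUED still need to be processed.
    """
    problems = []
    with open(queue_path) as fh:
        for line in fh:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            path = parts[0]
            status = parts[1] if len(parts) > 1 else "QUEUED"
            if status != "QUEUED":
                continue  # already processed, nothing to check
            try:
                if os.path.isdir(path):
                    os.listdir(path)  # .SAFE inputs are directories
                else:
                    with open(path, "rb"):
                        pass  # archives or single files
            except OSError as err:
                problems.append((path, err.strerror))
    return problems
```

An empty result only shows that the paths were readable at the moment of the check; an intermittent storage problem can still appear later during the run.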

Expected behavior Process all images without Input/output error

Parameterization Param-file is attached, i had to save it as txt in order to upload it here.

Setup Always add information on your system, e.g.

davidfrantz commented 1 year ago

Hi @kesslerf,

I have no idea, to be honest. This is very unusual, and such things are typically related to the system or hardware. It might be that the disks are too slow, that the network was briefly interrupted, or that a slow file system is in use. Really hard to say.
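One way to narrow down whether the storage itself is flaky is to list the input directory repeatedly outside of FORCE and record any operating-system errors. This is a rough diagnostic sketch, not part of FORCE; the function name `probe_directory` and its defaults are made up for illustration.

```python
import errno
import os
import time


def probe_directory(path, attempts=5, delay=1.0):
    """List `path` several times and collect every OSError that occurs.

    Returns a list of (attempt_index, errno_name, message) tuples.
    An EIO entry would correspond to "Input/output error".
    """
    failures = []
    for attempt in range(attempts):
        try:
            os.listdir(path)
        except OSError as err:
            name = errno.errorcode.get(err.errno, "UNKNOWN")
            failures.append((attempt, name, err.strerror))
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next probe
    return failures
```

Running this against the mounted input directory (for example the `/codede` mount from the report) while a Level 2 job is active might show whether errors come and go independently of FORCE.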

I also have no experience with running FORCE through Docker from Python, but I don't think that this is causing these issues.

The only thing that struck me in the parameter file is that you oversubscribe the VM: 4 processes with 4 threads each equals 16 threads, but only 8 CPUs are available. However, I don't believe that this is causing the read/write errors.
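The arithmetic behind this remark can be written down as a small check. This is just an illustrative sketch; the helper name `is_oversubscribed` is hypothetical, and the numbers are the ones quoted in this thread.

```python
def is_oversubscribed(nproc, nthread, cpus):
    """True if NPROC * NTHREAD asks for more threads than there are CPUs."""
    return nproc * nthread > cpus


# Values from the thread: 4 processes x 4 threads on an 8-CPU machine.
requested = 4 * 4
print(f"requested threads: {requested}, oversubscribed: {is_oversubscribed(4, 4, 8)}")
```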

Sorry for not giving you a more specific answer. David

kesslerf commented 1 year ago

Thanks for the reply. I actually expected this to be somewhat unfamiliar. As it is not my own file system, it might be due to errors on the servers.

Thanks for the hint about the parameters; I will change the parameter files to address this. Which of the two variables should I set to 2 instead of 4?

Thanks, Felix

davidfrantz commented 1 year ago

Using more processes than threads will be more efficient:

NPROC = 4
NTHREAD = 2

You could even try

NPROC = 8
NTHREAD = 1

With 64GB RAM, this might work.
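If the parameter files are generated or adjusted from Python, as in the original report, the two values could be rewritten programmatically. The sketch below assumes the file uses one `KEY = VALUE` entry per line, as in the snippets above; the helper `set_params` is hypothetical and not part of FORCE.

```python
import re


def set_params(text, **values):
    """Replace the value of each `KEY = value` line in parameter-file text.

    Raises KeyError if a requested key is not present, so typos in key
    names do not pass silently.
    """
    for key, value in values.items():
        pattern = re.compile(
            rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
        )
        # Keep the "KEY = " prefix exactly as written, swap only the value.
        text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", text)
        if count == 0:
            raise KeyError(key)
    return text
```

For example, `set_params(text, NPROC=8, NTHREAD=1)` would apply the second suggestion, and machines with less RAM in a mixed setup could be given the more conservative `NPROC=4, NTHREAD=2`.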

Is this a cloud? Do you use block or object storage? Object storage might not work very well with the writing patterns of FORCE.

Cheers, David

kesslerf commented 1 year ago

Thank you. I am using FORCE in a Docker swarm, and some machines have a different setup due to limitations such as the total amount of RAM.

I will probably go for 8 and 1 and see if there is a huge difference in processing time.

We are using a cloud, yes, but I am in no way an expert in any of this, so I cannot answer the question, sorry.