GeoNode / geonode

GeoNode is an open source platform that facilitates the creation, sharing, and collaborative use of geospatial data.
https://geonode.org/

GeoNode Docker/GeoServer Cache is jamming up the disk space #10197

Open Holz1GST opened 1 year ago

Holz1GST commented 1 year ago

Dear GeoNode community/developers,

We recently set up a GeoNode Docker instance on our Ubuntu 20.04.5 LTS server. After we started uploading data to it, we noticed that our 1.7 TB disk quickly ran out of space.

The Problem

While we were uploading data to our GeoNode Docker instance, the 1.7 TB disk quickly ran out of space. A first look at the GeoServer backend showed that something seemed to be wrong with the GeoServer cache, as some uploaded layers had a cache of several TB. Clearing the cache, changing the disk quota settings, and restarting the Docker containers seems to have fixed this issue, as the cached layers are now cleared correctly.

Nevertheless, the 1.7 TB of disk space is still used up. A closer look at disk usage showed that three major culprits among the Docker volumes are taking up the space. Culprit 1: [image] Culprit 2: [image] Culprit 3: [image]

As we have uploaded only about 600 GB of data, GeoNode shouldn't use up 1.6 TB of space. Apparently, Culprits 1 and 2 contain "tmp" folders for each uploaded layer. Should these folders be cleared automatically by GeoNode once the upload has finished? Culprit 1: [image] Culprit 2: [image]
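To get an overview of those per-layer "tmp" folders before deciding anything, a small read-only script can list them with their size and age. This is only a sketch: the `tmp` name prefix and the helper name `find_tmp_dirs` are assumptions made for illustration, not something GeoNode defines, and the script reports folders without deleting anything.

```python
import os
import sys
import time


def find_tmp_dirs(root, prefix="tmp"):
    """Yield (path, size_bytes, age_days) for each directory under root
    whose name starts with prefix (assumed naming, adjust as needed)."""
    now = time.time()
    for dirpath, dirnames, _ in os.walk(root):
        matches = [d for d in dirnames if d.startswith(prefix)]
        for d in matches:
            path = os.path.join(dirpath, d)
            # Apparent size: sum of regular file sizes, symlinks skipped.
            size = sum(
                os.path.getsize(os.path.join(dp, f))
                for dp, _, files in os.walk(path)
                for f in files
                if not os.path.islink(os.path.join(dp, f))
            )
            age_days = (now - os.path.getmtime(path)) / 86400
            yield path, size, age_days
        # Do not descend into directories that were already reported.
        dirnames[:] = [d for d in dirnames if d not in matches]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the directory to scan, e.g. the mount point of one volume.
    for path, size, age in find_tmp_dirs(sys.argv[1]):
        print(f"{size / 1e9:10.2f} GB  {age:7.1f} days  {path}")
```

Folders that are large and much older than the last upload would be the first candidates to ask about, but whether they are safe to remove is exactly the open question here.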

As far as I understand, GeoServer stores its data in the geonode-gsdatadir. Thus I suppose that the tmp folders there actually contain the uploaded data and shouldn't be removed. What is being stored in the geonode-statics volume, then? Should this data be removed by GeoNode automatically?

Steps to Reproduce the Problem

  1. Install and set up GeoNode using the Docker installation
  2. Upload data to GeoNode
  3. Check the disk space consumed by the Docker volumes
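For step 3, one way to compare the volumes is to sum the file sizes below each volume directory. The sketch below assumes the default Docker volume location on Linux (`/var/lib/docker/volumes`, usually readable only as root); the function names are made up for this example. It reports apparent file sizes, so the numbers can differ somewhat from what `du` shows for sparse files or different block sizes.

```python
import os
import sys
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root, skipping symlinks."""
    total = 0
    for dirpath, _, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def volume_usage(volumes_root):
    """Return (name, bytes) for each directory in volumes_root, largest first."""
    root = Path(volumes_root)
    sizes = [(p.name, dir_size_bytes(p)) for p in root.iterdir() if p.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example argument (assumed default path): /var/lib/docker/volumes
    for name, size in volume_usage(sys.argv[1]):
        print(f"{size / 1e9:10.2f} GB  {name}")
```

Run as root with the volumes directory as the only argument; the largest volumes are listed first.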

Specifications

afabiani commented 1 year ago

@Holz1GST please make sure the GeoServer DiskQuotas have been correctly enabled and configured

https://docs.geoserver.org/stable/en/user/geowebcache/webadmin/diskquotas.html

Holz1GST commented 1 year ago

Thank you for your answer!

As I mentioned above, we already fixed the caching problem by changing the default disk quota settings from [image] to [image]. Now the layer cache is deleted correctly.

Nonetheless, the disk space is still consumed by Culprits 1 and 2. As I understand it, Culprit 1 is the default data folder of GeoServer, so we should not touch it. Can we safely remove the data from Culprit 2, then? Why does GeoNode store data in two different directories?

afabiani commented 1 year ago

Hi @Holz1GST in theory the temporary files should be removed by GeoNode automatically or, at least, there shouldn't be duplicates. Let me double check the issue.

Can you confirm that you are using an up-to-date version of GeoNode 4.0.x running with Docker? Any additional or custom setup?

Holz1GST commented 1 year ago

Yes, we are using version 4.0.x of GeoNode with Docker. Aside from changing the default passwords, we made no changes to the default settings.