Erudika / para

Multitenant backend server for building web and mobile apps rapidly. The backend for busy developers. (self-hosted or hosted)
https://paraio.org
Apache License 2.0

Containerized Para deployment loses content upon Pod re-deployment #76

Closed cjchand closed 3 years ago

cjchand commented 3 years ago

Hi! I have what I suspect is a continuation of #47

Para version: 1.38.5 (though seen on 1.38.4, as well)

Issue: If the Para pod is re-deployed (e.g., a Helm update or equivalent is run), Para loses knowledge of any content created prior to that point. The data is still retained in the DB, but it is shown in neither the Para console nor Scoold.

Steps taken: After such a scenario, if I look at the various object types (e.g.: users) in the Para console, there are none listed (again, despite them being present in the DB).

I can, however, head to /{object_type}/edit/{id} and see the object. If I save it, the object is then listed in the Para console.

I tried running para-cli rebuild-index, but that did not address the issue.

As mentioned in #47, it seems that there's likely some assumption in the code that what was indexed and/or cached prior will be there upon startup.

Logs, if they're helpful (you can see the index rebuild there, as well):

2021-04-19 22:33:37 [INFO ] --- Para.initialize() [embedded] ---
2021-04-19 22:33:37 [INFO ] Loaded new DAO, Search and Cache implementations - SqlDAO, LuceneSearch and CaffeineCache.
2021-04-19 22:33:39 [INFO ] HikariPool-1 - Starting...
2021-04-19 22:33:40 [INFO ] HikariPool-1 - Start completed.
2021-04-19 22:33:42 [WARN ] Server is unhealthy - the search index may be corrupted and may have to be rebuilt.
2021-04-19 22:33:44 [INFO ] Starting ParaServer using Java 11.0.10 on para-b57c4965d-crbz6 with PID 7 (/para/para.jar started by root in /para)
2021-04-19 22:33:44 [INFO ] The following profiles are active: embedded
2021-04-19 22:33:46 [INFO ] Instance #1 initalized and listening on http://localhost:8080
2021-04-19 22:33:51 [INFO ] Started ParaServer in 8.413 seconds (JVM running for 20.735)
2021-04-19 22:37:15 [INFO ] Deleting 'para' index before rebuilding it...
2021-04-19 22:37:17 [INFO ] rebuildIndex(): Done. 2 objects reindexed.
2021-04-19 22:38:12 [INFO ] Server is healthy.

Request:
I see one of these two possible paths:

  1. Provide a means to truly regenerate the entire search index from the ground up
  2. Provide details about where the on-disk content lives so I can create a persistent volume claim for it

For a K8s world, the first one - while suffering a performance hit at startup - is likely the most robust. It would also address the issues called out in the latter part of #47.

Thanks in advance!

cjchand commented 3 years ago

Taking a look at the container as it's running, I see Lucene indices in /para/data. I presume that keeping those persistent will address the issue?

cjchand commented 3 years ago

I'll complete the trifecta and answer my own question :D ... though I still think there's room for improvement on the index rebuilding. Seems there should be an option to repopulate the entire index. I'm thinking it might just be repopulating things that either were recently added or haven't been marked as indexed.

Anyway, I was able to address it by adding a Persistent Volume Claim and mounting that as a volume. Here are the relevant bits (with the disclaimer that this is barebones, might not work for everyone, etc.). For what it's worth, I borrowed most of the Scoold Helm chart to deploy Para.

pvc.yaml (net new file)

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ include "para.name" . }}-pv-claim
  labels:
    app: {{ include "para.name" . }}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Snippets of changes from deployment.yaml:

spec:
  template:
    spec:
      volumes:
        - name: lucene-data
          persistentVolumeClaim:
            claimName: {{ include "para.name" . }}-pv-claim
      containers:
        - # existing container entry (name, image, etc. unchanged)
          volumeMounts:
            - name: lucene-data
              mountPath: /para/data
Let me know if I'm missing anything here and if there's any further clarification needed on the original reindexing issue.

albogdano commented 3 years ago

I really can't help you much with Kubernetes. You definitely need to mount the /para/data directory where Para stores the search indices and the database files (in case you use the default H2 db). If you load any other Para plugins you'd have to mount /para/lib as well.

The rebuild index functionality works on the entire table for a given app - it will read every single row in that table and will then index the whole row of data. Thus, at the end of the operation, the whole table will be reindexed. It does exactly what you describe - the old index is deleted, and a new index is repopulated with all the data from db.

I don't see any issues here.
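
For anyone who also loads Para plugins, the /para/lib advice above could be folded into the earlier deployment.yaml snippet roughly as follows. This is only a sketch under assumptions: the volume name `para-lib` and the `-lib-pv-claim` claim suffix are hypothetical, and a second PersistentVolumeClaim with that name would have to be created alongside pvc.yaml.

```yaml
# Sketch only: adds a second volume for plugins to the deployment.yaml snippet.
# "para-lib" and the "-lib-pv-claim" suffix are hypothetical names, not part of any chart.
spec:
  template:
    spec:
      volumes:
        - name: lucene-data
          persistentVolumeClaim:
            claimName: {{ include "para.name" . }}-pv-claim
        - name: para-lib
          persistentVolumeClaim:
            claimName: {{ include "para.name" . }}-lib-pv-claim
      containers:
        - # existing container entry (name, image, etc. unchanged)
          volumeMounts:
            - name: lucene-data
              mountPath: /para/data   # search indices (and H2 files, if the default DB is used)
            - name: para-lib
              mountPath: /para/lib    # plugin jars
```

One thing to watch for: a freshly provisioned, empty volume mounted at /para/lib will hide whatever files the image already ships in that directory, so the plugin jars would need to be copied onto the volume before Para can load them.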