OpenGrok / docker

WARNING: this repository is archived!

Opengrok docker image repository synchronization #27

Closed jetm closed 5 years ago

jetm commented 5 years ago

Hi,

After reading the Repository synchronization page on the OpenGrok wiki, I wonder whether the OpenGrok Docker image supports using opengrok-sync to sync and index a Git repository, or running opengrok-reindex-project after a Git pull.

I ran some tests: after I updated a Git repository and ran the indexer script (index.sh), the newly added files could not be found by searching the path field. I had to remove all the indexed data and run the indexer again, which is very slow because the codebase is 10 GB and the indexer takes 3 hours to finish.

Should I run opengrok-reindex-project instead of index.sh?

vladak commented 5 years ago

Firstly, it's weird that the indexer did not detect the newly added files. Without more detail it is hard to tell what happened.

As for the repository synchronization, it depends on the diversity of the repositories. For the basic use case it is sufficient to run opengrok-mirror and plain reindex.

The opengrok-reindex-project script is meant to be run from opengrok-sync.

jetm commented 5 years ago

Sorry for the lack of detail.

In the setup I am testing, I have mounted the indexed data as a volume (--volume /opengrok-data:/data) because I don't want to lose that data if the PC is rebooted, and it is difficult to maintain a container holding GBs of indexed data. I have disabled the internal indexer cycle (REINDEX=0); indexing is triggered from outside by a cron job that runs docker exec <CONTAINER_ID> /scripts/index.sh, as the documentation describes. I don't see any errors in the generated logs and the indexer always finishes successfully. I can see the newly added files in the xref, but I am unable to find them by searching the path field.
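A minimal sketch of the externally triggered update described here, assuming the source tree mounted at /src is a single Git checkout on the host. The function name sync_and_reindex and its two arguments are hypothetical; only /scripts/index.sh and the docker exec call come from the setup above.

```shell
#!/bin/sh
# Hypothetical helper: update the host checkout, then run the indexer
# script inside the already running container.
sync_and_reindex() {
    src_dir=$1      # host directory that is mounted at /src (assumption)
    container=$2    # container name or ID

    # Fast-forward only; stop here if the pull fails so that a
    # half-updated tree is not indexed.
    git -C "$src_dir" pull --ff-only || return 1

    # Same command the cron job runs, with REINDEX=0 set on the container.
    docker exec "$container" /scripts/index.sh
}
```

A cron-invoked script could source this and call sync_and_reindex /project <CONTAINER_ID>. For trees that hold several repositories, opengrok-mirror would be the more appropriate tool than a plain git pull.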

Let me know if I can provide more information.

jetm commented 5 years ago

After some testing, I found that removing the symlinks inside the container fixed this issue. The change looks like this:

diff --git a/Dockerfile b/Dockerfile
index 903da1c..1ff37a7 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -22,12 +22,9 @@ MAINTAINER OpenGrok developers "opengrok-dev@yahoogroups.com"

 #PREPARING OPENGROK BINARIES AND FOLDERS
 COPY --from=fetcher opengrok.tar.gz /opengrok.tar.gz
-RUN mkdir /opengrok && tar -zxvf /opengrok.tar.gz -C /opengrok --strip-components 1 && rm -f /opengrok.tar.gz && \
-    mkdir /src && \
-    mkdir /data && \
-    mkdir -p /var/opengrok/etc/ && \
-    ln -s /data /var/opengrok && \
-    ln -s /src /var/opengrok/src
+RUN mkdir -p /opengrok /var/opengrok/etc /data /src && \
+    tar -zxvf /opengrok.tar.gz -C /opengrok --strip-components 1 && \
+    rm -f /opengrok.tar.gz

This makes data and src directories instead of symlinks. Are you interested in this change? I can make a PR.

Essentially, this is how I run the container for the code:
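To see where the two ln -s commands from the removed lines place their links, the layout can be reproduced under a scratch directory. This is only a sketch: the temporary root stands in for the container's / and is an assumption, and it shows the resulting layout, not why the path search failed.

```shell
#!/bin/sh
set -eu

# Scratch root standing in for "/" inside the image (assumption).
root=$(mktemp -d)

# Same order as the original RUN line: the directories first ...
mkdir -p "$root/src" "$root/data" "$root/var/opengrok/etc"

# ... then the links. var/opengrok already exists as a directory, so the
# first ln -s creates var/opengrok/data inside it rather than replacing it.
ln -s "$root/data" "$root/var/opengrok"
ln -s "$root/src" "$root/var/opengrok/src"

# Lists etc (a directory) plus data and src (both symlinks).
ls -l "$root/var/opengrok"
```

With the patched Dockerfile, neither link is created and /data and /src exist only as plain directories used as volume mount points.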

docker run \
    --detach \
    --env REINDEX=0 \
    --volume /project:/src \
    --volume /indexed_opengrok_data:/data \
    --publish 8080:8080

vladak commented 5 years ago

It is surprising that the symlinks were the cause.

Personally, I'd like everything under /opengrok in the container, including the configuration. Feel free to go ahead with the PR; it will be a step in the right direction.

vladak commented 5 years ago

Given that the repository synchronization scripts are part of the OpenGrok distribution, it should not be hard to run opengrok-mirror as part of the process (or to convert the indexer() shell function to use opengrok-sync, which in turn runs opengrok-mirror).

vladak commented 5 years ago

fixed in d9af82a77dc44ea446223f99947d154941a5c0f3