deegree / deegree3

Official deegree repository providing geospatial core libraries, data access and advanced OGC web service implementations
https://www.deegree.org
GNU Lesser General Public License v2.1

Document/clarify mechanism for embedded workspace selection #508

Open MrSnyder opened 9 years ago

MrSnyder commented 9 years ago

The handbook thoroughly describes how workspace selection works [1]. However, in custom projects, it is sometimes more convenient to bundle the workspace inside the WAR file.

There is another, as yet undocumented, option: if the WAR file/webapp directory contains a folder named "WEB-INF/workspace", this deegree instance will pick this directory as its workspace folder. At the moment, this requires the DEEGREE_WORKSPACE_ROOT directory to exist and to be writable.

I believe it may be more convenient to use the WEB-INF directory as DEEGREE_WORKSPACE_ROOT in such cases, but I am unsure whether writing to this directory is actually allowed according to the servlet specification.

To summarize: This feature should be documented and we may need to discuss the use of the DEEGREE_WORKSPACE_ROOT folder.

[1] http://download.deegree.org/documentation/3.3.13/html/basics.html#location-of-the-deegree-workspace-directory

vog commented 9 years ago

If the WAR file/webapp directory contains a folder named "WEB-INF/workspace" [...]. At the moment, this requires the DEEGREE_WORKSPACE_ROOT directory to exist and to be writable.

I may be confused here, but I believe that at the moment, DEEGREE_WORKSPACE_ROOT must not exist to make "WEB-INF/workspace" work.

If either DEEGREE_WORKSPACE_ROOT is set to an existing directory, or DEEGREE_WORKSPACE_ROOT is unset and ~/.deegree exists, that directory is used instead of "WEB-INF/workspace".
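
For readers following along, the lookup order described in this comment can be sketched roughly as below. This is only an illustration of the behaviour as stated in the thread, not deegree's actual code: the class `WorkspaceRootSketch`, the method `resolve` and its parameters are hypothetical names, and the empty result for the "nothing exists" case is an assumption, since the thread does not say what happens then.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the deegree code base. */
final class WorkspaceRootSketch {

    /**
     * @param envRoot         value of DEEGREE_WORKSPACE_ROOT, or null if unset
     * @param userHome        the user's home directory
     * @param webInfWorkspace location of WEB-INF/workspace in the deployed webapp
     */
    static Optional<Path> resolve(String envRoot, Path userHome, Path webInfWorkspace) {
        // An unset variable implicitly means ~/.deegree.
        Path root = (envRoot == null || envRoot.isEmpty())
                ? userHome.resolve(".deegree")
                : Paths.get(envRoot);
        // An existing root directory always wins over the embedded workspace.
        if (Files.isDirectory(root)) {
            return Optional.of(root);
        }
        // Only if the root does not exist is WEB-INF/workspace considered.
        if (Files.isDirectory(webInfWorkspace)) {
            return Optional.of(webInfWorkspace);
        }
        // Assumption: the thread does not describe this case.
        return Optional.empty();
    }
}
```

With this order, merely creating ~/.deegree on the server is enough to hide a workspace bundled in the WAR file, which is the surprising part discussed here.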

vog commented 9 years ago

This is my proposal for a more consistent behaviour:

That way, we'd have a clear, logical order of precedence that is IMHO what one intuitively expects:

If we can't agree on "no fallback for DEEGREE_WORKSPACE_ROOT", my alternative proposal is to make it fall back first to WEB-INF/workspace, then to ~/.deegree. This is not what I'd prefer, but still a great improvement over the current situation.

MrSnyder commented 9 years ago

With regard to the first comment: Yes, you're right: DEEGREE_WORKSPACE_ROOT must not exist to make "WEB-INF/workspace" work.

Still, I would consider this consistent: If DEEGREE_WORKSPACE_ROOT is not set explicitly, it is set automatically (~/.deegree). In other words: DEEGREE_WORKSPACE_ROOT is always set, even if this is not done explicitly. There is no way not to set DEEGREE_WORKSPACE_ROOT.

Does this get my point across? Or am I missing something?

MrSnyder commented 9 years ago

I like your proposal, however for the case:

internally, we would still need to use DEEGREE_WORKSPACE_ROOT. As stated in the documentation, it is currently needed to store:

Maybe we can use the WEB-INF folder here. For this, we need write access to the directory. I am still not sure that writing files here is supported by the servlet specification and will work on all servlet containers.

@tfr42: Can you comment on this?

tfr42 commented 9 years ago

As far as I know, the DEEGREE_WORKSPACE_ROOT directory can contain files which can be edited via the deegree administration console. Thus, these resources need to be writable. Reading from the WEB-INF directory is covered by the Servlet spec, which says about WEB-INF:

A special directory exists within the application hierarchy named WEB-INF. This directory contains all things related to the application that aren’t in the document root of the application. The WEB-INF node is not part of the public document tree of the application. No file contained in the WEB-INF directory may be served directly to a client by the container. However, the contents of the WEB-INF directory are visible to servlet code using the getResource and getResourceAsStream method calls on the ServletContext, and may be exposed using the RequestDispatcher calls.

(see http://download.oracle.com/otn-pub/jcp/servlet-2.4-fr-spec-oth-JSpec/servlet-2_4-fr-spec.pdf on page 70)

But I would not recommend writing to that directory, since this works only for exploded WAR files. If the servlet container keeps the WAR file, you will lose any change to the config files when you re-deploy the web application. To avoid losing changes after re-deployment, the directory should be outside of the web application WAR file. The temporary working directory is no alternative, since it does not ensure that changes made to files in it are maintained when the container or web app is restarted. The spec says in SRV.3.7.1 on Temporary Working Directories:

A temporary storage directory is required for each servlet context. Servlet containers must provide a private temporary directory for each servlet context, and make it available via the javax.servlet.context.tempdir context attribute. The objects associated with the attribute must be of type java.io.File. The requirement recognizes a common convenience provided in many servlet engine implementations. The container is not required to maintain the contents of the temporary directory when the servlet container restarts, but is required to ensure that the contents of the temporary directory of one servlet context is not visible to the servlet contexts of other Web applications running on the servlet container.

(see page 36)
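
As an illustration of the quoted section, a minimal sketch of looking up that temporary directory follows. The attribute name `javax.servlet.context.tempdir` comes from the spec text above; the class `TempDirSketch` and the `Function`-based lookup are hypothetical and only there to keep the example free of a servlet API dependency.

```java
import java.io.File;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of the deegree code base. */
final class TempDirSketch {

    /** Context attribute name as given in the servlet specification. */
    static final String TEMPDIR_ATTRIBUTE = "javax.servlet.context.tempdir";

    /**
     * @param contextAttributes lookup of servlet context attributes by name;
     *                          stands in for ServletContext.getAttribute(String)
     */
    static Optional<File> tempDir(Function<String, Object> contextAttributes) {
        Object value = contextAttributes.apply(TEMPDIR_ATTRIBUTE);
        // The spec requires the attribute to be a java.io.File.
        return (value instanceof File) ? Optional.of((File) value) : Optional.empty();
    }
}
```

Inside a servlet, this could presumably be called as `TempDirSketch.tempDir(getServletContext()::getAttribute)`. As the spec text says, the container need not keep the contents across restarts, so this directory suits caches rather than configuration.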

So, when config files are placed inside the WEB-INF directory, I would assume that all config files are then in read-only mode and the user cannot use the deegree administration console to change those files. If the user wants to change the config files, DEEGREE_WORKSPACE_ROOT has to point to a directory outside of the web container.

vog commented 9 years ago

So, when config files are placed inside the WEB-INF directory, I would assume that all config files are then in read-only mode and the user cannot use the deegree administration console to change those files. If the user wants to change the config files, DEEGREE_WORKSPACE_ROOT has to point to a directory outside of the web container.

I fully agree with this, and I see no problem here. If you put stuff into WEB-INF, you do this to deploy an already configured deegree instance. If you want a writable workspace, use a plain directory instead.

tfr42 commented 9 years ago

What I forgot to say is that I agree with both proposed strategies for resolving the deegree workspace, as described in comment 3 and comment 5.

Furthermore the things said about the WEB-INF read-only mode do apply to Servlet Spec 3.1 (JSR-340) as well - of course.

MrSnyder commented 9 years ago

Thanks for the clarification. One remaining problem is that some files are actually written to the workspace. AFAIK, these are:

If writing to the WEB-INF folder is not allowed, then we are facing another problem here.

tfr42 commented 9 years ago

Well, it is not forbidden by the spec. The spec says that any change to the web application (in WEB-INF or the temporary folder) might get lost after a web context reload or a restart of the servlet container. So I would consider it a bad approach to persist configuration state within the web application. What about setting the deegree console to read-only mode for co-located deployments (deegree workspace within the web app)?

vog commented 9 years ago

In our case, these custom deegree builds don't contain the console at all, so this problem doesn't exist.

Apart from that, making the console read-only is a good idea, just to be sure. A possibly simpler alternative would be to disable the console altogether when using a WEB-INF workspace.
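
Either variant needs a way to tell whether the workspace is co-located with the web application. A minimal sketch of such a check, assuming the webapp is deployed exploded so that both locations are real file system paths; `ConsoleModeSketch` and `isCoLocated` are hypothetical names, not existing deegree API:

```java
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of the deegree code base. */
final class ConsoleModeSketch {

    /**
     * Returns true if the workspace lies inside the webapp directory, in which
     * case the console would be switched to read-only mode or disabled.
     */
    static boolean isCoLocated(Path workspace, Path webappRoot) {
        // Normalize both paths so that "." and ".." segments do not defeat the prefix test.
        Path ws = workspace.toAbsolutePath().normalize();
        Path root = webappRoot.toAbsolutePath().normalize();
        return ws.startsWith(root);
    }
}
```

Note that `Path.startsWith` compares whole path elements, so a sibling directory such as `deegree-data` is not mistaken for a child of `deegree`.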

stephanr commented 9 years ago

We also use an external folder with a declared DEEGREE_WORKSPACE_ROOT in most cases. But I would also prefer to use WEB-INF only for static/read-only files (or maybe temp files).

MrSnyder commented 9 years ago

I support the option of having a (read-only) workspace in WEB-INF. Still (as mentioned), there are remaining problems we need to take care of, before we can actually consider this to be fully supported. At the moment, some deegree modules (not only the console!) rely on being able to actively write out files into the workspace:

I am not yet sure if this is feasible in every case, but I would like to avoid this. I consider cluttering the workspace with temporary files to have a bad smell.

In order to cope with this, every deegree instance could use some directory below /var/tmp/ (in the sense of the UNIX Filesystem Hierarchy [1]). Content in this folder is temporary, but preserved between reboots.

@tfr42: Is there some similar mechanism (a standard temporary folder that survives reboots/redeployments) for Java webapps?

[1] http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

vog commented 9 years ago

I second the idea of having a separate data directory. We should get rid of code that writes into configuration directories (except for the console, of course).

However, that data directory should not be hard-coded to "/var/tmp" or similar, as such path assumptions may come back to bite us. Rather, I'd force the administrator to set that path explicitly when needed.

More generally, I propose to distinguish two cases:

  1. Writable workspace root (i.e. either ~/.deegree/, or explicitly set via DEEGREE_WORKSPACE_ROOT)
  2. Read-only workspace root (i.e. WEB-INF)

In both cases, there should be a separate directory into which data is written by deegree.

  1. Writable workspace root
    • The default data directory is WORKSPACE/var/
    • It may be overridden by a configuration entry in the workspace
  2. Read-only workspace root
    • The data directory is defined by a configuration entry in the workspace
    • deegree refuses to start if that configuration entry is missing

I'd propose to name that configuration entry varDir and to put it into main.xml, but maybe there is a better name and place for it.
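
To make the two cases concrete, here is a rough sketch of how the data directory could be resolved under this proposal. It is only an illustration: `DataDirSketch` and `resolveDataDir` are hypothetical names, the `varDir` value is assumed to have been read from the workspace configuration already, and how deegree would detect a read-only workspace is left open.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of the deegree code base. */
final class DataDirSketch {

    /**
     * @param workspace         the resolved workspace directory
     * @param workspaceWritable false for a read-only workspace such as WEB-INF
     * @param varDir            value of the proposed varDir entry, or null if absent
     */
    static Path resolveDataDir(Path workspace, boolean workspaceWritable, String varDir) {
        // An explicit configuration entry wins in both cases.
        if (varDir != null && !varDir.isEmpty()) {
            return Paths.get(varDir);
        }
        // Writable workspace: default to WORKSPACE/var/.
        if (workspaceWritable) {
            return workspace.resolve("var");
        }
        // Read-only workspace without the entry: refuse to start.
        throw new IllegalStateException(
                "Read-only workspace " + workspace + " requires an explicit varDir entry");
    }
}
```

Failing fast in the read-only case avoids silently writing data to a location that may disappear on redeployment.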

This proposal is based on the following assumption:

No two running deegree instances shall ever use the same workspace.

Is this something we can agree on?