GoogleCloudPlatform / app-gradle-plugin

The library has moved to https://github.com/GoogleCloudPlatform/appengine-plugins/tree/main/app-gradle-plugin
Apache License 2.0

Test server started with appengineRun by default interacts with live Cloud Datastore instead of a local one #445

Closed. jdkoren closed this issue 1 year ago

jdkoren commented 1 year ago

My team has a Java web app using Appengine and Cloud Endpoints, and it interacts with Cloud Datastore via the Objectify library. We recently migrated our build system from Maven to Gradle.

Problem

We discovered that by default a local test server started with the appengineRun task interacts with live production data in Cloud Datastore. This is markedly different from the appengine:run Maven goal, which always used a local datastore, and that was the behavior we were expecting. (Apparently we need to use a datastore emulator to get that behavior.)

As a result, my local test server with unfinished and experimental code modified our live production data and caused considerable disruption. This is startling and undesirable for a default behavior, and makes it easy for developers to shoot themselves in the foot.

Incidentally, the following message is printed in the local server log at startup and would mislead someone to believe that the test server uses a local datastore:

INFO: Local Datastore initialized:
  Type: High Replication
  Storage: {project path}/build/exploded-samplesindex/WEB-INF/appengine-generated/local_db.bin

I spent an inordinate amount of time trying to find that file or any file like it on my filesystem, but none existed, which further corroborated that the test server was communicating with our project's live Cloud Datastore.

Proposed solutions

(These are not mutually exclusive and may not be completely formed ideas.)

I. Remove misleading log message

Don't print a message saying that a local datastore was initialized if it wasn't.

II. Don't allow writes without using emulation

Allow the current behavior of communicating with live Cloud Datastore data, but only allow reads.

III. Require explicit allowance when not using emulation

Make the developer explicitly allow the test server to interact with live Cloud Datastore. Imagine a configurable property like this:

appengine {
  run {
    useDatastoreWithoutEmulation = true
  }
}

When appengineRun is invoked, check if there's a Datastore emulator. If there isn't, and the above property is not set or is set to false, exit with an error and print a message explaining how to resolve it.
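The check described above could be sketched roughly as follows. This is only an illustration of the proposed logic, not plugin code: the class and method names are hypothetical, useDatastoreWithoutEmulation is the imagined property from this proposal, and treating a non-empty DATASTORE_EMULATOR_HOST environment variable as "an emulator is configured" is an assumption.

```java
import java.util.Map;

// Hypothetical sketch of proposal III; nothing here exists in the plugin.
public class DatastoreGuard {

    /**
     * Returns true when the run should be aborted: no emulator appears to be
     * configured and the developer has not explicitly opted in to live data.
     */
    public static boolean shouldAbort(Map<String, String> env,
                                      boolean useDatastoreWithoutEmulation) {
        // Assumption: the emulator is considered configured when this
        // environment variable is present and non-empty.
        String host = env.get("DATASTORE_EMULATOR_HOST");
        boolean emulatorConfigured = host != null && !host.trim().isEmpty();
        return !emulatorConfigured && !useDatastoreWithoutEmulation;
    }

    public static void main(String[] args) {
        // Opt-in flag read from a system property for the purpose of this demo.
        boolean optIn = Boolean.getBoolean("useDatastoreWithoutEmulation");
        if (shouldAbort(System.getenv(), optIn)) {
            System.out.println("Refusing to start: no Datastore emulator configured. "
                + "Start an emulator or set useDatastoreWithoutEmulation = true.");
        } else {
            System.out.println("OK to start the test server.");
        }
    }
}
```

Keeping the decision in a pure function that takes the environment as a map makes it easy to test without touching the real process environment.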

IV. Use managed emulation

What if the Gradle plugin could start and stop datastore emulation automatically? This way the developer can just invoke appengineRun, and can avoid accidents that result from forgetting to start the emulator themselves. Imagine a configuration block like this, where all the properties could have reasonable defaults (and therefore be omitted):

appengine {
  run {
    datastoreEmulator {
      dataDir = '/path/to/data/dir'
      hostPort = 'localhost:8081'
      storeOnDisk = true
      consistency = 0.9
      useFirestoreInDatastoreMode = false
    }
  }
}

These would simply be passed as arguments to an invocation of gcloud beta emulators datastore start behind the scenes.
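As a rough sketch of how such a block might be translated into a command line: the class and method names below are hypothetical, and the flag spellings (--data-dir, --host-port, --store-on-disk / --no-store-on-disk, --consistency, --use-firestore-in-datastore-mode) are the ones gcloud beta emulators datastore start is understood to accept; they should be checked against the command's --help output before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of proposal IV; nothing here exists in the plugin.
public class EmulatorCommand {

    /** Builds the argument list for starting the Datastore emulator. */
    public static List<String> build(String dataDir,
                                     String hostPort,
                                     boolean storeOnDisk,
                                     double consistency,
                                     boolean useFirestoreInDatastoreMode) {
        List<String> cmd = new ArrayList<>(
            List.of("gcloud", "beta", "emulators", "datastore", "start"));
        // Optional values are skipped when null so gcloud's own defaults apply.
        if (dataDir != null) {
            cmd.add("--data-dir=" + dataDir);
        }
        if (hostPort != null) {
            cmd.add("--host-port=" + hostPort);
        }
        cmd.add(storeOnDisk ? "--store-on-disk" : "--no-store-on-disk");
        cmd.add("--consistency=" + consistency);
        if (useFirestoreInDatastoreMode) {
            cmd.add("--use-firestore-in-datastore-mode");
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Mirrors the example values from the configuration block above.
        System.out.println(String.join(" ",
            build("/path/to/data/dir", "localhost:8081", true, 0.9, false)));
    }
}
```

The resulting list could be handed to a ProcessBuilder, with the plugin stopping the process again when the test server shuts down.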

emmileaf commented 1 year ago

Thank you for reporting this - the local development server started by the appengineRun task should not be communicating with prod data, and it is odd that a local datastore file seems to be logged but not used or found. Sorry to hear the frustrations encountered there!

To help us troubleshoot and better understand the behavior discrepancy - could you help share more details of the setup under which this was encountered? In particular:

jdkoren commented 1 year ago

Are you using an app.yaml or appengine-web.xml based project, and what is the java runtime and app engine environment?

The app is using appengine-web.xml, java8 runtime, and (I think) the standard environment.

How is the run configuration set up for the project?

We currently do not have any run configuration; we only configure appengine.deploy.

Is the project doing any custom configuration of the local datastore location?

AFAIK we do not have any custom configuration of the local datastore location.

meltsufin commented 1 year ago

@jdkoren Have you tried following the directions here: https://cloud.google.com/datastore/docs/tools/datastore-emulator#automatically_setting_the_variables?

Can you clarify which version of the Maven plugin you were using, and which version of the Gradle one you're using? They should be equivalent in behavior.

cc/ @ludoch

ludoch commented 1 year ago

This is really weird... There is no need for an extra Cloud Datastore emulator (that one is used for non-GAE local apps). The local GAE Dev AppServer boots the local devappserver and all the GAE API emulators at the same time, in the same JVM. It seems you are also saying it works with Maven and does not work with Gradle? Can you share the App Engine plugin settings for both, from the pom and build files?

Again there is no need for a Java8 GAE app to ever use gcloud beta emulators datastore, unless you are not using the com.google.appengine.api.datastore API classes of course.

jdkoren commented 1 year ago

@meltsufin @ludoch

Can you clarify which version of the Maven plugin you were using, and which version of the Gradle one you're using?

Previously we were using appengine-maven-plugin v2.4.1

<plugin>
  <groupId>com.google.cloud.tools</groupId>
  <artifactId>appengine-maven-plugin</artifactId>
  <version>2.4.1</version>
</plugin>

We are now using appengine-gradle-plugin v2.4.3

buildscript {
  dependencies {
    classpath 'com.google.cloud.tools:appengine-gradle-plugin:2.4.3'
  }
}

apply plugin: 'com.google.cloud.tools.appengine'

Have you tried following the directions here: https://cloud.google.com/datastore/docs/tools/datastore-emulator#automatically_setting_the_variables

Yes, I did follow the directions to set the environment variables. It might also be noteworthy that I needed to set an additional environment variable, otherwise I got an error when trying to start the test server: export DATASTORE_USE_PROJECT_ID_AS_APP_ID=true

chanseokoh commented 1 year ago

According to this doc

Note: We are migrating the local development environment to use the Cloud Datastore Emulator. For more information about this change, see the migration guide.

and this doc linked there (I am not 100% sure whether it applies to Java as well),

Cloud Datastore Emulator is progressively being rolled out as the default Datastore implementation for dev_appserver.

The Cloud Datastore Emulator is the default emulator for a portion of dev_appserver users. If you are using the Cloud Datastore Emulator, dev_appserver will display:

... Using Cloud Datastore Emulator.

(emphasis added by me)

So I think the first thing to check is whether the dev appserver is launching its own legacy local emulator or the Cloud Datastore emulator (which should run in a separate process, it seems), as well as whether you are manually running your own emulator on top of it. It sounds like you are also launching your own emulator outside the dev appserver, so I guess the next step would be to figure out whether your app is connecting to the emulator run by the dev appserver or to the one run by you.

jdkoren commented 1 year ago

@chanseokoh My original comment was about using appengineRun alone; it was only after running into this issue that I found the page about the Datastore Emulator. Without starting an emulator (and without setting any of the associated environment variables), I see the following in the logs:

INFO: Local Datastore initialized:
  Type: High Replication
  Storage: {project path}/build/exploded-samplesindex/WEB-INF/appengine-generated/local_db.bin

That file does not exist, and one of the endpoint URLs immediately returns data that is in our live Cloud Datastore. I don't know how to check whether the dev appserver is launching an emulator or whether my app is connecting to it.
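One way to gather the two observations mentioned here (whether an emulator is configured in the environment, and whether the logged local datastore file actually exists) is a small diagnostic like the sketch below. The class and method names are hypothetical, and using the DATASTORE_EMULATOR_HOST environment variable as the indicator is an assumption; the check does not prove which datastore the app really talks to, it only reports the two signals.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper; not part of the plugin or the dev appserver.
public class DatastoreDiagnostics {

    /** Reports whether an emulator host is set and whether the local file exists. */
    public static List<String> report(Map<String, String> env, Path localDbFile) {
        List<String> lines = new ArrayList<>();
        String host = env.get("DATASTORE_EMULATOR_HOST");
        lines.add(host == null
            ? "DATASTORE_EMULATOR_HOST: not set"
            : "DATASTORE_EMULATOR_HOST: " + host);
        lines.add("local datastore file " + localDbFile
            + (Files.exists(localDbFile) ? " exists" : " is missing"));
        return lines;
    }

    public static void main(String[] args) {
        // Pass the path printed in the "Local Datastore initialized" log line.
        Path localDb = Paths.get(args.length > 0 ? args[0] : "local_db.bin");
        report(System.getenv(), localDb).forEach(System.out::println);
    }
}
```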

meltsufin commented 1 year ago

@jdkoren Is there any chance you can provide a minimal sample project that reproduces the issue, preferably with both a Maven and a Gradle build file? It's really surprising to see this kind of difference with the two plugins that were written to be equivalent.

emmileaf commented 1 year ago

@jdkoren Thanks for the pointers to the project code! Looking through it, I wonder if the use of Objectify and this Objectify issue are related here. The way the project initializes ObjectifyFactory() looks like what’s described in this comment, and perhaps this is bypassing what's being launched by the local dev app server.

That said, it’s still puzzling why this only surfaced after the move from maven to gradle - were there other significant changes (perhaps related to Objectify or environment variables) that were made along with this move?

emmileaf commented 1 year ago

One subtle difference I noticed in the gradle plugin (compared to maven) is that run.projectId defaults to deploy.projectId's value when not explicitly configured (source).

It doesn’t explain the main behavior of [wrong datastore being used], but may have played a part in allowing the communication - you can try explicitly setting this to something different in the run configuration. Changing this to align with maven could be a fix we want to make here, for safeguarding against scenarios like this.

jdkoren commented 1 year ago

@emmileaf Looking back through recent changes, we did update Objectify from version 5.0.3 to 6.0.9. I'm currently trying to follow the instructions here to see if I can make Objectify connect to a datastore emulator.

meltsufin commented 1 year ago

This looks promising. Thanks for the update!

meltsufin commented 1 year ago

From @jdkoren:

After some experimentation I can confirm that the ClassNotFoundException that I'm hitting is a problem that occurs when I connect Objectify to the local datastore emulator. It happens regardless of whether I use the maven plugin or the gradle plugin. If I don't run the datastore emulator, Objectify just connects to live Datastore (again regardless of which plugin).

Since the issue is not in the plugins, I'm closing the issue, but feel free to continue the discussion.

ludoch commented 1 year ago

According to this doc

This doc is for the Python dev_appserver (a Python program), not for Java, which does not use the external datastore emulator.

jdkoren commented 1 year ago

According to this doc

This doc is for the Python dev_appserver (a Python program), not for Java, which does not use the external datastore emulator.

@ludoch It seems you are right. I found the doc for Java, which states the following:

The development web server simulates Datastore using a local file-backed Datastore on your computer. The Datastore is named local_db.bin, and it is created in your application's WAR directory, in the WEB-INF/appengine-generated/ directory. It is not uploaded with your application.

I see no mention of running a datastore emulator separately at all on this page.