NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

Java Scripting Engine and Python 3 #211

Closed gothub closed 4 months ago

gothub commented 5 years ago

The quality engine uses the Java ScriptEngine to execute Python check scripts. The script engine instance used to run these scripts is based on a Jython interpreter.

This is working fine; however, Jython is perpetually stuck at supporting Python 2.7, and the Python Software Foundation will be discontinuing support for Python 2 on Jan 1, 2020.

Currently the metadig-engine Python checks only use packages that are implemented in Jython, so there is no dependency on external Python packages, with the exception of metadig-py, which is currently very basic.

Checks that are being implemented for FAIR (especially Accessible and Interoperable ones) will need to use the DataONE python library. This external dependency may cause complications as this library has many dependencies, and the highest version usable will be 2.6.2, as I believe later versions require python 3.

So the options are (not mutually exclusive):

amoeba commented 5 years ago

Tough problem! What would you think about refactoring the Python implementation to work more like the R implementation which just relies on calling a Python process directly? It's not a great solution but it's seemed to work well so far and doesn't require anything more from Java than being able to start a process on the host and that host having python3 and any packages.
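A minimal sketch of what "calling a Python process directly" could look like from Java, using only the standard library. The class name, the script path argument, and the assumption that `python3` is on the host's PATH are all illustrative and not taken from metadig-engine.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs one check script in a fresh python3 process. */
final class PythonProcessRunner {

    /** Builds the command line; assumes "python3" is available on the PATH. */
    static List<String> buildCommand(String scriptPath, List<String> args) {
        List<String> command = new ArrayList<>();
        command.add("python3");
        command.add(scriptPath);
        command.addAll(args);
        return command;
    }

    /** Runs the script, waits up to timeoutSeconds, and returns stdout and stderr combined. */
    static String run(String scriptPath, List<String> args, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Send output to a temp file so a chatty script cannot block on a full pipe.
        Path output = Files.createTempFile("check-", ".out");
        try {
            Process process = new ProcessBuilder(buildCommand(scriptPath, args))
                    .redirectErrorStream(true)
                    .redirectOutput(output.toFile())
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("check timed out after " + timeoutSeconds + "s");
            }
            if (process.exitValue() != 0) {
                throw new IOException("check exited with code " + process.exitValue());
            }
            return Files.readString(output);
        } finally {
            Files.deleteIfExists(output);
        }
    }
}
```

The trade-off is the one mentioned for R: every call pays the interpreter start-up cost, but Java needs nothing beyond the ability to start a process.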

R execution is currently slower

Is this because of the way the running of R checks was implemented? I guess I hadn't noticed this.

SciJava looks pretty good at first glance, though possibly not likely to be around for a "long time".

gothub commented 5 years ago

@amoeba refactoring the Python implementation is a good idea and might eventually have to be done. The slowness with the R implementation is because each check that uses R makes a call that re-executes Rscript. There is an issue describing this (I think, will check), so fixing it would involve running R in the background and sending multiple checks to it. Yes, I'm hesitant to add a dependency on SciJava as well, but I don't see other viable JSR 223 ScriptEngine implementations for Python/CPython/Jython.
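The "run it in the background and send multiple checks to it" idea can be sketched as a long-lived worker process that reads one request per line and answers with one line. The line-based protocol, the class name, and the worker command (for example a hypothetical `Rscript worker.R`) are assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical client for a long-lived worker that replies with one line per request line. */
final class PersistentWorkerClient {
    private final Writer toWorker;
    private final BufferedReader fromWorker;

    PersistentWorkerClient(Writer toWorker, BufferedReader fromWorker) {
        this.toWorker = toWorker;
        this.fromWorker = fromWorker;
    }

    /** Starts the worker once; the command itself is supplied by the caller. */
    static PersistentWorkerClient start(String... command) throws IOException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr out of the reply stream
                .start();
        return new PersistentWorkerClient(
                new BufferedWriter(new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8)),
                new BufferedReader(new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8)));
    }

    /** Sends one request line and blocks until the worker answers with one line. */
    synchronized String request(String line) throws IOException {
        toWorker.write(line);
        toWorker.write('\n');
        toWorker.flush();
        String reply = fromWorker.readLine();
        if (reply == null) {
            throw new IOException("worker closed its output stream");
        }
        return reply;
    }
}
```

With this shape the start-up cost is paid once per worker rather than once per check; `request` is synchronized so that concurrent callers cannot interleave their lines.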

jeanetteclark commented 1 year ago

The path I'm going to take is to replace Jython with Jep. This blog post is a little outdated, but I haven't found anything else that fits as nicely as Jep does, in that it has a CPython interpreter and runs within a standard Java install.

Here is an example I worked up a while back:

  import jep.Interpreter;
  import jep.JepException;
  import jep.MainInterpreter;
  import jep.SharedInterpreter;

  public static void jepMethod() {
      // help Java find the Jep native library (this path is specific to my machine)
      MainInterpreter.setJepLibraryPath("/usr/local/lib/python3.10/site-packages/jep/libjep.jnilib");
      // set up the test Python code: a function that reads the global x
      String code =
          "import pandas as pd\n"+
          "import numpy as np\n"+
          "def call():\n"+
          "    global x\n" +
          "    f2 = pd.DataFrame({'A': np.random.randint(x, size = (50))})\n"+
          "    result = np.mean(f2.A)\n"+
          "    return(result)\n";
      // the interpreter is created, used, and closed on this one thread
      try (Interpreter interp = new SharedInterpreter()) {
          // set the input, execute the function definition, run it, and extract the result
          interp.set("x", 10);
          interp.exec(code);
          Object output = interp.getValue("call()");
          System.out.println(output);
      } catch (JepException e) {
          System.out.println(e);
      }
  }

jeanetteclark commented 1 year ago

Okay, so Jep isn't quite as much of a drop-in replacement as I had hoped. It deprecated its JepScriptEngine and JepScriptEngineFactory 5 or 6 releases ago, which means that I can't really use it as a drop-in replacement for Jython unless I implement them myself, which... idk, maybe isn't the best idea? It is fairly complicated (there are like 15-20 methods to override in each class), so I can't decide what would be the best path: refactor the dispatch code to just work with the JepInterpreter class, write the classes described above as a drop-in replacement, or choose some other Java/Python bridge.
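For the "write the classes myself" option, one way to shrink the work is to extend `javax.script.AbstractScriptEngine`, which leaves only four abstract methods to implement. The sketch below is not the metadig-engine implementation: `PythonBackend` is a hypothetical interface standing in for the Jep calls used in the example above (`set` and `exec`), so that the adapter can be shown without the native library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.stream.Collectors;
import javax.script.AbstractScriptEngine;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptException;
import javax.script.SimpleBindings;

/** Hypothetical stand-in for the interpreter calls the adapter needs. */
interface PythonBackend {
    void set(String name, Object value) throws Exception;
    void exec(String code) throws Exception;
}

/** Minimal ScriptEngine adapter: copies engine-scope bindings into Python, then runs the script. */
final class DelegatingPythonEngine extends AbstractScriptEngine {
    private final PythonBackend backend;

    DelegatingPythonEngine(PythonBackend backend) {
        this.backend = backend;
    }

    @Override
    public Object eval(String script, ScriptContext context) throws ScriptException {
        try {
            Bindings engineScope = context.getBindings(ScriptContext.ENGINE_SCOPE);
            if (engineScope != null) {
                for (Map.Entry<String, Object> entry : engineScope.entrySet()) {
                    backend.set(entry.getKey(), entry.getValue());
                }
            }
            backend.exec(script);
            return null; // this sketch does not read a result back from Python
        } catch (Exception e) {
            throw new ScriptException(e);
        }
    }

    @Override
    public Object eval(Reader reader, ScriptContext context) throws ScriptException {
        try (BufferedReader buffered = new BufferedReader(reader)) {
            return eval(buffered.lines().collect(Collectors.joining("\n")), context);
        } catch (IOException e) {
            throw new ScriptException(e);
        }
    }

    @Override
    public Bindings createBindings() {
        return new SimpleBindings();
    }

    @Override
    public ScriptEngineFactory getFactory() {
        return null; // a real engine would return its factory; omitted in this sketch
    }
}
```

The inherited `put`, `get`, and single-argument `eval` methods then work unchanged, which is most of what dispatch code written against `ScriptEngine` tends to call.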

mbjones commented 1 year ago

One thought I had was to look at how Jupyter and Quarto handle the handoff to Python and running standalone notebooks. Because Jupyter supports multiple execution kernels, this may be a good way to handle R sessions as well. From what I can tell, you can run the Kernel Gateway (https://github.com/jupyter-server/kernel_gateway) to create an endpoint that can be used to start new kernels, and send it code to execute in a notebook cell. I haven't looked into this, but I expect they will have nicely-structured APIs to send data in and get results back.
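As a rough sketch of the first step only: in its default mode the Kernel Gateway exposes Jupyter's `/api/kernels` resource, so starting a kernel is an HTTP POST with the kernel spec name. The base URL, the `python3` kernel name, and the helper class are assumptions; sending code to the kernel afterwards goes over a websocket channel and is not shown here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds requests for a Jupyter Kernel Gateway. */
final class KernelGatewayRequests {

    /** Builds the POST that asks the gateway to start a kernel with the given spec name. */
    static HttpRequest startKernel(String baseUrl, String kernelName) {
        // Assumes a simple kernel name with no characters that need JSON escaping.
        String body = "{\"name\": \"" + kernelName + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/kernels"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request would be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the JSON reply should contain the new kernel's id.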

jeanetteclark commented 1 year ago

Hm interesting. I am banging my head against a wall trying to get Jep to work correctly - the Java VM can't find the library for some reason - so the kernel gateway is looking promising right now.

I found this demo for java: https://github.com/Hacky-DH/kernel_gateway_java_demos

jeanetteclark commented 11 months ago

Update: Jep seems now to be working reasonably well locally. Next up is working on the containers and then deploying to the dev cluster to see what happens

jusana commented 11 months ago

Hello, I played a little with the jep implementation engine through the API (metadig-webapp).

I built a local engine jar and a local metadig-py (py3) wheel, then used slightly customised Dockerfile and docker-compose.yml files (I'm not using k8s). I also had trouble getting the engine to find the Jep path, so I had to add the following to the compose.yml for the webapp service:

    environment:
      - JEP_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/jep
      - CATALINA_OPTS=-Djava.library.path=/usr/local/lib/python3.10/dist-packages/jep

And installed the python-is-python3 deb package

I also sometimes get "jep invalid thread access" for 2 exactly identical consecutive calls from the API ... but not always.

All in all, it seems a very good and promising new engine version ... can't wait for the stable version ...

Thanks.

ps: also found a lot of these errors "value": "ERROR: jep.JepException: <class 'NameError'>: name 'unicode' is not defined"

jusana commented 11 months ago

for my "ps" notice i found that the checks were already corrected on the develop branch of metadig-checks ....

but i have now: ERROR: jep.JepException: <class 'TypeError'>: decoding str is not supported

jeanetteclark commented 11 months ago

Thanks for checking it out! Are you also building the develop version of metadig-py? That should solve your most recent error. What branch of metadig-engine are you building? I think I got the invalid thread issue resolved on the feature-jep branch but would love to hear if you’re still hitting it.

jusana commented 11 months ago

Hello, yes, I used the develop branch of metadig-py (rebuilt from the poetry project) and the feature-jep branch of the engine. I noticed that I can launch multiple suite assessments without any errors, but when I try a single check and then a suite, I get the "invalid thread" error ... When I run single checks one after another I also get this error (but it's OK the first time).

i think the last errors are due to .decode('utf-8') that are still in the checks files python code.

jusana commented 11 months ago

that would be those ones

[
  { "Value": "Entity Attribute Names Differ from Definitions", "Location": "$['result'][28]['check']['name']" },
  { "Value": "Entity Attribute Names Are Unique", "Location": "$['result'][29]['check']['name']" },
  { "Value": "Entity Attribute Definition Present", "Location": "$['result'][30]['check']['name']" },
  { "Value": "Entity Attribute Definition Sufficient", "Location": "$['result'][31]['check']['name']" },
  { "Value": "Entity Attribute Storage Type Present", "Location": "$['result'][32]['check']['name']" },
  { "Value": "Entity Attribute Domain Present", "Location": "$['result'][41]['check']['name']" },
  { "Value": "Entity Attribute Measurement Scales Present", "Location": "$['result'][43]['check']['name']" }
]

jusana commented 11 months ago

[
  { "Value": "entity.attributeName.differs.1", "Location": "$['result'][28]['check']['id']" },
  { "Value": "entity.attributeNames.unique.1", "Location": "$['result'][29]['check']['id']" },
  { "Value": "entity.attributeDefinition.present.1", "Location": "$['result'][30]['check']['id']" },
  { "Value": "entity.attributeDefinition.sufficient.1", "Location": "$['result'][31]['check']['id']" },
  { "Value": "entity.attributeStorageType.present.1", "Location": "$['result'][32]['check']['id']" },
  { "Value": "entity.attributeDomain.present.1", "Location": "$['result'][41]['check']['id']" },
  { "Value": "entity.attributeMeasurementScale.present.1", "Location": "$['result'][43]['check']['id']" }
]

jeanetteclark commented 11 months ago

Hm I do not see that line of code, and all of those checks are working for me. Could you point out what line the .decode('utf-8') call appears in one of them?

jusana commented 11 months ago

hmm, indeed that's strange. Actually i just jsonpath parsed the output of the Fairsuite run : thunder-file_cb2a1035.json

But anyways, i found decode() in "entity.format.nonproprietary.1"

Maybe those other checks call this one, but I don't know why the engine gives me errors for the checks in the previous message. Or maybe the cause is totally different from .decode('utf-8').

jeanetteclark commented 11 months ago

okay I do see it there! and thanks for sending the full run - I'll have a look and report back.

jeanetteclark commented 11 months ago

Hi Julien - I think that I tracked down the issue with the invalid thread, and I pushed a minor change to the webapp if you'd like to try. I also took out that .decode bit of code from the python check, though I'm not sure that is actually causing your issue. If you'd like to give it another whirl from your setup, I'd be interested to see the results. Thanks again for doing your exploration!

jusana commented 11 months ago

Hello, I rebuilt the webapp war with your new ChecksResource.java and ... it seems to be indeed better. I can now play single checks multiple times without error. I did not try the new checks without "decode"

jusana commented 11 months ago

Hello @jeanetteclark, I recently used the engine more intensively (high request rate) and I still get the "invalid thread" error on many requests. I don't get timeouts.

    2023-11-02T20:23:49.368084413Z 20231102-20:23:49: [ERROR]: jep.JepException: Invalid thread access. [edu.ucsb.nceas.mdq.rest.SuitesResource]
    2023-11-02T20:23:49.368115503Z java.lang.RuntimeException: jep.JepException: Invalid thread access.
    2023-11-02T20:23:49.368118502Z at edu.ucsb.nceas.mdqengine.dispatch.JepScriptEngine.put(JepScriptEngine.java:119)
    2023-11-02T20:23:49.368121115Z at edu.ucsb.nceas.mdqengine.dispatch.Dispatcher.dispatch(Dispatcher.java:65)
    2023-11-02T20:23:49.368123139Z at edu.ucsb.nceas.mdqengine.processor.XMLDialect.runCheck(XMLDialect.java:254)
    2023-11-02T20:23:49.368125158Z at edu.ucsb.nceas.mdqengine.MDQEngine.runSuite(MDQEngine.java:127)
    2023-11-02T20:23:49.368142711Z at edu.ucsb.nceas.mdq.rest.SuitesResource.run(SuitesResource.java:230)
    2023-11-02T20:23:49.368149757Z at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    2023-11-02T20:23:49.368152185Z at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    2023-11-02T20:23:49.368154456Z at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    2023-11-02T20:23:49.368156509Z at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    2023-11-02T20:23:49.368158668Z at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    2023-11-02T20:23:49.368160781Z at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
    2023-11-02T20:23:49.368162847Z at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
    2023-11-02T20:23:49.368164859Z at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
    2023-11-02T20:23:49.368166895Z at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
    2023-11-02T20:23:49.368168963Z at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
    2023-11-02T20:23:49.368171174Z at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
    2023-11-02T20:23:49.368173211Z at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
    2023-11-02T20:23:49.368175340Z at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
    2023-11-02T20:23:49.368179260Z at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    2023-11-02T20:23:49.368181335Z at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    2023-11-02T20:23:49.368183225Z at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    2023-11-02T20:23:49.368185254Z at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    2023-11-02T20:23:49.368187190Z at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    2023-11-02T20:23:49.368189249Z at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    2023-11-02T20:23:49.368191333Z at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
    2023-11-02T20:23:49.368193344Z at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
    2023-11-02T20:23:49.368195398Z at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
    2023-11-02T20:23:49.368197300Z at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
    2023-11-02T20:23:49.368202909Z at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
    2023-11-02T20:23:49.368204876Z at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
    2023-11-02T20:23:49.368213380Z at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
    2023-11-02T20:23:49.368216451Z at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:212)
    2023-11-02T20:23:49.368219544Z at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156)
    2023-11-02T20:23:49.368223614Z at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
    2023-11-02T20:23:49.368226409Z at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:181)
    2023-11-02T20:23:49.368229241Z at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:156)
    2023-11-02T20:23:49.368232247Z at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:168)
    2023-11-02T20:23:49.368234914Z at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
    2023-11-02T20:23:49.368237793Z at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:483)
    2023-11-02T20:23:49.368240542Z at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:130)
    2023-11-02T20:23:49.368243273Z at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
    2023-11-02T20:23:49.368246308Z at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:679)
    2023-11-02T20:23:49.368249250Z at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
    2023-11-02T20:23:49.368252322Z at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
    2023-11-02T20:23:49.368256339Z at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:617)
    2023-11-02T20:23:49.368258875Z at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
    2023-11-02T20:23:49.368261604Z at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:934)
    2023-11-02T20:23:49.368264360Z at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1698)
    2023-11-02T20:23:49.368267440Z at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
    2023-11-02T20:23:49.368279335Z at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
    2023-11-02T20:23:49.368282608Z at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
    2023-11-02T20:23:49.368285746Z at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    2023-11-02T20:23:49.368288644Z at java.base/java.lang.Thread.run(Thread.java:833)
    2023-11-02T20:23:49.368291295Z Caused by: jep.JepException: Invalid thread access.
    2023-11-02T20:23:49.368298184Z at jep.Jep.isValidThread(Jep.java:230)
    2023-11-02T20:23:49.368301447Z at jep.Jep.set(Jep.java:367)
    2023-11-02T20:23:49.368304470Z at edu.ucsb.nceas.mdqengine.dispatch.JepScriptEngine.put(JepScriptEngine.java:117)
    2023-11-02T20:23:49.368307606Z ... 52 more

To be precise, I don't use the worker; I use the webapp (API) directly, in which I installed the engine with the latest fixes.

I make many tries for the same metadata to be assessed; sometimes it succeeds, and sometimes it still fails after all the retries.
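The trace ends in `jep.Jep.isValidThread`, and a Jep interpreter can only be used from the thread that created it. One possible explanation (not confirmed in this thread) is that Tomcat serves requests from a pool of threads while the interpreter was created on a different one. A common way around this is to confine the interpreter to a single dedicated thread. The sketch below uses a stand-in class instead of Jep so that it runs without the native library; every name in it is hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Stand-in for a thread-bound interpreter: rejects calls from any thread but its creator. */
final class ThreadBoundResource {
    private final Thread owner = Thread.currentThread();

    String use(String input) {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("Invalid thread access.");
        }
        return "ran:" + input;
    }
}

/** Runs every task on one dedicated thread, so the resource is created and used only there. */
final class ConfinedRunner implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private ThreadBoundResource resource; // read and written only on the worker thread

    /** Callable from any request thread; the actual work happens on the worker thread. */
    String run(String input) throws ExecutionException, InterruptedException {
        return worker.submit(() -> {
            if (resource == null) {
                resource = new ThreadBoundResource(); // created lazily on the worker thread
            }
            return resource.use(input);
        }).get();
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

This serializes all checks through one thread, which trades throughput for a guarantee that the creating thread and the calling thread are always the same.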

jeanetteclark commented 11 months ago

Hm interesting, thanks for letting me know. Are you submitting requests faster than the checks are completed/returning results? I have not worked much with the API side at all (I've only relatively recently taken over metadig development) but I know that parts of the API do not work correctly and need some attention.

jusana commented 11 months ago

Thanks for your reply. I don't think I am submitting faster than results are returned (but maybe), and in any case Tomcat should be able to multithread and keep the other requests on standby. It seems that the "invalid thread" error is coming from the SuitesResource module:

    2023-11-02T20:23:58.949231104Z 20231102-20:23:58: [ERROR]: jep.JepException: Invalid thread access. [edu.ucsb.nceas.mdq.rest.SuitesResource]
    2023-11-02T20:23:58.949274026Z java.lang.RuntimeException: jep.JepException: Invalid thread access.
    2023-11-02T20:23:58.949279575Z at edu.ucsb.nceas.mdqengine.dispatch.JepScriptEngine.put(JepScriptEngine.java:119)

Maybe you could try the same fix for checksResource you did last time but for SuitesResource ?

I am working with the engine and the webapp as of the last time we spoke.

Thanks again.

jeanetteclark commented 11 months ago

The thread should be closed from the runSuite method, so I'm a little surprised that the threading issue is still popping up, but I guess it doesn't hurt to close it from the run method in SuitesResource as well. I just pushed that change; let me know how it goes for you.
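A sketch of the "open and close it inside the request" pattern, again with a stand-in class rather than Jep and with hypothetical names throughout. The point it illustrates is that the interpreter is created, used, and closed on the same request thread, and that `try`-with-resources closes it even when a check throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for a thread-bound interpreter; counts how many instances are currently open. */
final class RequestScopedInterpreter implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private final Thread owner = Thread.currentThread();

    RequestScopedInterpreter() {
        OPEN.incrementAndGet();
    }

    String run(String check) {
        requireOwner();
        return "ran:" + check;
    }

    @Override
    public void close() {
        requireOwner();
        OPEN.decrementAndGet();
    }

    private void requireOwner() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("Invalid thread access.");
        }
    }
}

/** Hypothetical request handler: one interpreter per call, never shared between threads. */
final class SuiteHandler {
    static String handle(String check) {
        try (RequestScopedInterpreter interp = new RequestScopedInterpreter()) {
            return interp.run(check);
        }
    }
}
```

Unlike a cached engine instance, nothing here outlives the request, so a later request on a different Tomcat thread never sees an interpreter it did not create.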

jusana commented 11 months ago

I slowed my requests down to 12 per minute ... and had a much better success/tries rate ... but still those errors. I will try your new push and let you know.

Thank you !

jusana commented 11 months ago

Hello, actually, it didn't do much ... still "invalid thread" errors. And btw, I edited this line https://github.com/NCEAS/metadig-webapp/blob/8c9b38c08f05db9b602f0f9eda607ae7e29b0d66/src/main/java/edu/ucsb/nceas/mdq/rest/SuitesResource.java#L289 to return Response.ok(resultString).build(); so that I get the result directly, with high priority, in the webapp VM. Maybe this could be the cause? Thanks

jeanetteclark commented 8 months ago

Hi @jusana - I was recently working on a different aspect of the web API and tried to recreate your Jep thread errors, but was unable to. Do you have a reproducible setup that you can send? I also made some changes to the code so that you shouldn't have to use the env vars to set up the Jep interpreter, as long as you have a metadig.properties file with jep.path set appropriately.
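For illustration, a lookup along these lines could prefer `jep.path` from the properties file and fall back to the `JEP_LIBRARY_PATH` environment variable mentioned earlier in the thread. The precedence order and the class name are assumptions; this is not the code that was pushed to metadig-engine.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical resolver for the Jep library location. */
final class JepPathResolver {

    /** Returns jep.path from the properties if set, else JEP_LIBRARY_PATH from env, else empty. */
    static Optional<String> resolve(Reader propertiesSource, Map<String, String> env) throws IOException {
        Properties props = new Properties();
        props.load(propertiesSource);

        String fromProps = props.getProperty("jep.path");
        if (fromProps != null && !fromProps.isBlank()) {
            return Optional.of(fromProps.trim());
        }
        String fromEnv = env.get("JEP_LIBRARY_PATH");
        if (fromEnv != null && !fromEnv.isBlank()) {
            return Optional.of(fromEnv.trim());
        }
        return Optional.empty();
    }
}
```

A caller might pass `Files.newBufferedReader(Path.of("metadig.properties"))` and `System.getenv()`, then hand the result to `MainInterpreter.setJepLibraryPath(...)` as in the earlier example, which takes the path to the native library file.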

jusana commented 8 months ago

Hi, @jeanetteclark - Sorry I was totally on another project lately ... But I will dive in the metadig-engine again in a few days, so I will keep you informed with my new investigations. Thanks.

jeanetteclark commented 8 months ago

no worries at all! just wanted to make sure you knew I had not forgotten :)