ReactionMechanismGenerator / RMG-Java

The Java version of RMG: Reaction Mechanism Generator
http://rmg.sourceforge.net/
MIT License

avoidfork not working on Windows #274

Closed: jwallen closed this issue 11 years ago

jwallen commented 11 years ago

When I try to run the minimal example on Windows (XP, 7), I get the following error when it attempts to run GATPFit for the first species (C2H6):

ERROR: jing.chem.GATPFitException: Error writing input to GATPFit buffer
    at jing.chem.GATPFit.callGATPFit(GATPFit.java:194)
    at jing.chem.GATPFit.generateNASAThermoData(GATPFit.java:254)
    at jing.chem.Species.generateNASAThermoData(Species.java:362)
    at jing.chem.Species.<init>(Species.java:120)
    at jing.chem.Species.make(Species.java:874)
    at jing.rxnSys.ReactionModelGenerator.populateInitialStatusListWithReactiveSpecies(ReactionModelGenerator.java:4901)
    at jing.rxnSys.ReactionModelGenerator.initializeReactionSystems(ReactionModelGenerator.java:562)
    at jing.rxnSys.ReactionModelGenerator.modelGeneration(ReactionModelGenerator.java:1312)
    at RMG.main(RMG.java:96)

ERROR: jing.chem.NASAFittingException: Error in running GATPFit: jing.chem.GATPFitException: Error running GATPFit
jing.chem.GATPFitException: Error writing input to GATPFit buffer
To help diagnosis, writing GATPFit input to file GATPFit/INPUT.txt

    at jing.chem.GATPFit.generateNASAThermoData(GATPFit.java:257)
    at jing.chem.Species.generateNASAThermoData(Species.java:362)
    at jing.chem.Species.<init>(Species.java:120)
    at jing.chem.Species.make(Species.java:874)
    at jing.rxnSys.ReactionModelGenerator.populateInitialStatusListWithReactiveSpecies(ReactionModelGenerator.java:4901)
    at jing.rxnSys.ReactionModelGenerator.initializeReactionSystems(ReactionModelGenerator.java:562)
    at jing.rxnSys.ReactionModelGenerator.modelGeneration(ReactionModelGenerator.java:1312)
    at RMG.main(RMG.java:96)

CRITICAL: Error in running GATPFit: jing.chem.GATPFitException: Error running GATPFit
jing.chem.GATPFitException: Error writing input to GATPFit buffer
To help diagnosis, writing GATPFit input to file GATPFit/INPUT.txt

Running GATPFit directly from the command line and passing the saved GATPFit/INPUT.txt on stdin gives no errors, but no output either.

As far as I know, no one has tried to run RMG on Windows since the avoidfork branch was merged in https://github.com/GreenGroup/RMG-Java/pull/257. This will need to be fixed before RMG 4.0, since many, if not most, of our end users (and some of our developers!) are Windows users.

jwallen commented 11 years ago

It appears that fame and frankie are working, in that they can be run standalone from the command line and give the correct output. This means that it is just GATPFit that is at issue. Looking more closely at GATPFit, I find that it fails at the first read() statement with an end-of-file condition. It is not immediately apparent to me what is different about how GATPFit reads data from stdin, but I will keep looking.
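Because GATPFit reads its input from stdin, one general Java-side pitfall worth ruling out whenever a child process reports no input or an early end-of-file is output buffering. Text written through a `BufferedWriter` stays inside the Java process until `flush()` (or `close()`, which flushes first) is called, so the child has nothing to read until then. The minimal sketch below illustrates only that standard `java.io` behavior; it is not a diagnosis of this issue. It assumes an in-memory `ByteArrayOutputStream` as a stand-in for `Process.getOutputStream()`, and the `"C2H6\n"` payload is a placeholder, not real GATPFit input. The class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {
    /** Returns {bytes visible downstream before flush(), bytes visible after flush()}. */
    public static int[] bytesVisibleBeforeAndAfterFlush(String payload) {
        // Stand-in for Process.getOutputStream(), i.e. the pipe to the child's stdin.
        ByteArrayOutputStream childStdin = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(childStdin, StandardCharsets.US_ASCII))) {
            writer.write(payload);
            int before = childStdin.size(); // still held in BufferedWriter's char buffer
            writer.flush();
            int after = childStdin.size();  // now handed to the underlying stream
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder payload, not real GATPFit input.
        int[] sizes = bytesVisibleBeforeAndAfterFlush("C2H6\n");
        System.out.println("before flush: " + sizes[0] + " bytes, after flush: " + sizes[1] + " bytes");
        // prints "before flush: 0 bytes, after flush: 5 bytes"
    }
}
```

With a real child process the same rule applies to the pipe returned by `Process.getOutputStream()`: flush after each complete input, or the child blocks (or, if the stream is later torn down without data, sees end-of-file).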

Just to be sure, I tried with both Windows and Unix line endings, with no effect.

jwallen commented 11 years ago

Oddly enough, it seems like my antivirus/firewall was the problem all along. As soon as I turned it off, everything ran fine. However, things are still not working for Shamel. We're still looking into this.

I've written a minimal working example of a Java program that communicates with a persistent Fortran process, based on the GATPFit and frankie implementations. I think I have some grasp of how it works now, although I'm still not sure whether we're using the best set of Reader and Writer classes in the Java portion. My toy example also does not run for Shamel, so at least we seem to have isolated the problem. See https://gist.github.com/4035000 for the code.
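For readers without access to the gist, here is a minimal sketch (not the gist's code, and not RMG's GATPFit/frankie classes) of one common Reader/Writer arrangement for talking to a long-lived child: a `BufferedWriter` over the child's stdin and a `BufferedReader` over its stdout, flushing after each request but never closing between requests. It assumes a hypothetical line protocol in which the child prints a sentinel line (here `"END"`) after every reply; GATPFit's actual protocol may differ. `PersistentChildClient`, `fakeChild`, and `demoWithFakeChild` are hypothetical names, and the demo uses in-memory pipes with a Java thread standing in for the Fortran process so it runs anywhere.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical line-based client for a long-lived child process. */
public class PersistentChildClient implements Closeable {
    private final BufferedWriter toChild;   // wraps the child's stdin
    private final BufferedReader fromChild; // wraps the child's stdout
    private final String sentinel;          // ASSUMPTION: child prints this line after every reply

    public PersistentChildClient(OutputStream childStdin, InputStream childStdout, String sentinel) {
        this.toChild = new BufferedWriter(new OutputStreamWriter(childStdin, StandardCharsets.US_ASCII));
        this.fromChild = new BufferedReader(new InputStreamReader(childStdout, StandardCharsets.US_ASCII));
        this.sentinel = sentinel;
    }

    /** Sends one request and returns the reply lines that precede the sentinel. */
    public List<String> request(String input) throws IOException {
        toChild.write(input);
        toChild.write('\n'); // explicit LF; BufferedWriter.newLine() would send CRLF on Windows
        toChild.flush();     // flush but do NOT close: closing would send EOF and end the child
        List<String> reply = new ArrayList<>();
        String line;
        while ((line = fromChild.readLine()) != null) {
            if (line.equals(sentinel)) {
                return reply;
            }
            reply.add(line);
        }
        throw new EOFException("child closed its stdout before sending " + sentinel);
    }

    @Override
    public void close() throws IOException {
        toChild.close();   // EOF on the child's stdin tells it to exit
        fromChild.close();
    }

    /** Stand-in child: echoes each input line, then the sentinel, until stdin hits EOF. */
    static void fakeChild(InputStream stdin, OutputStream stdout, String sentinel) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stdin, StandardCharsets.US_ASCII));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(stdout, StandardCharsets.US_ASCII))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.print("echo:" + line + "\n" + sentinel + "\n");
                out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Wires the client to fakeChild over in-memory pipes instead of a real Process. */
    public static List<String> demoWithFakeChild(String... requests) {
        try {
            PipedOutputStream clientOut = new PipedOutputStream();
            PipedInputStream childIn = new PipedInputStream(clientOut);
            PipedOutputStream childOut = new PipedOutputStream();
            PipedInputStream clientIn = new PipedInputStream(childOut);
            Thread child = new Thread(() -> fakeChild(childIn, childOut, "END"));
            child.start();
            List<String> replies = new ArrayList<>();
            try (PersistentChildClient client = new PersistentChildClient(clientOut, clientIn, "END")) {
                for (String r : requests) {
                    replies.addAll(client.request(r));
                }
            }
            child.join();
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoWithFakeChild("C2H6", "CH4")); // prints [echo:C2H6, echo:CH4]
    }
}
```

With a real executable, the same client would be built from a started `Process`, e.g. `new PersistentChildClient(p.getOutputStream(), p.getInputStream(), "END")`, provided the child actually follows the assumed sentinel convention and its stderr is drained or redirected so it cannot fill up and block.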

Also, have we modified fame or DASSL to use avoidfork yet? It seems like we need to convert those in order to actually see the benefits of avoidfork from a memory doubling standpoint. However, that seems like a pretty significant change to make this close to a release. Was there a reason they weren't done before?

rwest commented 11 years ago

The original motivation (from @ramanan) was to avoid the CPU time overhead of forking, rather than to solve the memory doubling issue, so he profiled and did the slowest one first.

To avoid the memory doubling, yes, we need to avoid ALL forking. However, I set about implementing them one at a time and never got around to doing them all. I thought DASSL looked like a particularly tough nut to crack, so I moved on. (One way to do it would be to have a separate thread running a "dassl server" in Java that forks to spawn DASSL jobs; because the server has only a small memory footprint, forking it doesn't double much memory.) ...but certainly not this close to a release!
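To make the "dassl server" idea a little more concrete, the following is a minimal sketch of the request loop such a helper might run. It is not RMG code, and several details are assumptions: one job description per input line, exactly one result line per job, and shutdown when the parent closes the helper's input. `JobServerLoop`, `serve`, and the `runJob` handler are hypothetical; in a real server `runJob` would launch the DASSL executable (for example with `ProcessBuilder`), which is stubbed out here. The `main` method assumes the helper runs as its own small JVM talking to RMG over stdin/stdout; that is one reading of the idea, not something stated in the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/**
 * Hypothetical request loop for a small "dassl server" helper.
 * ASSUMPTION: one job per input line, one result line per job; the real
 * handler would spawn the solver, which is stubbed out here.
 */
public class JobServerLoop {
    /** Runs jobs until the parent closes our input; returns the number handled. */
    public static int serve(Reader requests, Writer results, Function<String, String> runJob)
            throws IOException {
        BufferedReader in = new BufferedReader(requests);
        int handled = 0;
        String job;
        while ((job = in.readLine()) != null) { // EOF from the parent means "shut down"
            if (job.isEmpty()) {
                continue;                         // ignore blank lines
            }
            results.write(runJob.apply(job));
            results.write('\n');
            results.flush();                      // the parent blocks until this line arrives
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) throws IOException {
        // ASSUMPTION: launched as its own small JVM, so any forking happens from a small process.
        Writer stdout = new OutputStreamWriter(System.out, StandardCharsets.US_ASCII);
        int n = serve(new InputStreamReader(System.in, StandardCharsets.US_ASCII), stdout,
                job -> "done " + job); // stub: a real server would spawn the solver here
        System.err.println("handled " + n + " jobs");
    }
}
```

Keeping the handler as a plain `Function` separates the line protocol from the process-spawning details, so the loop can be exercised with in-memory readers and writers before any solver is wired in.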

jwallen commented 11 years ago

We have now fixed things for Shamel, and it works for Connie on Windows as well. We're not entirely sure what fixed it for Shamel; today we installed Ant and the most recent JDK (these aren't needed if you are using Eclipse, which Shamel is). Now it works even in Eclipse.

rwest commented 11 years ago

Is there any advice to add to developer installation/compilation instructions? "Please try installing Ant and the most recent JDK"? And/or "Please turn off firewalls" for user instructions?

jwallen commented 11 years ago

It looks like the advice is to (1) use the official JDK (preferably the most recent version*) to build RMG, and not rely on whatever is bundled with Eclipse, and (2) make sure all of the executables are whitelisted by your firewall so that they run without interference. I will add these to the installation instructions.

*Due to significant vulnerabilities in Java itself, not necessarily due to any issues with RMG.