OSCAAR / OSCAAR

Open Source differential photometry Code for Amateur Astronomical Research
http://oscaar.github.io
MIT License

MCMC non-zero exit status #158

Open tamorris opened 11 years ago

tamorris commented 11 years ago

The first time I tried to run MCMC on one of our computers, it returned an error. I switched computers, and it worked fine then. But now that computer is giving me (I believe) the same error. It just doesn't want to analyze this file, and I'm not sure why it's suddenly giving me issues when it's been fine up until now (except one time when it seemingly didn't produce an output file, thus basically rendering a whole run useless, but I think that might be because I forgot to type ".txt" in the name of the output file I wanted it to make...). Have I entered something in the GUI incorrectly? Any help/suggestions are greatly appreciated.

[screenshot: mcmcerror2]

bmorris3 commented 11 years ago

Could you show the full error dialog there? It looks like you only got part of the bottom.

tamorris commented 11 years ago

Sure. I believe this is the full error; the output seems to start from when I accessed the exoplanet database.

[screenshot: mcmc2 2]

bmorris3 commented 11 years ago

Does the path to the .PKL that you're loading contain the characters \0 consecutively? It looks like it does from the first screenshot. If so, that's the Python string escape sequence for a NULL character, which breaks the load() command. I think I know the fix for this; it should be straightforward.

bmorris3 commented 11 years ago

I can reproduce the error with this little script on my Mac:

import numpy as np
import cPickle

data = np.random.random(10)

## Simulate behavior of oscaar.IO.save()
outputName = "null\\0test.pkl"
output = open(outputName, 'wb')
cPickle.dump(data, output)
output.close()

## Simulate behavior of oscaar.IO.load()
inputPath = "null\0test.pkl"
inputFile = open(inputPath, 'rb')
data_loaded = cPickle.load(inputFile)
inputFile.close()

print "Data loaded successfully\n",data_loaded

I needed to specify the name of the file with the double backslash ("null\\0test.pkl") to get Python to interpret the \0 as two literal characters and not a NULL. If you look at the name of the file that's created, it's "null\0test.pkl". Then I explicitly try to open the file by that name and get the same error as you:

TypeError: file() argument 1 must be encoded string without NULL bytes, not str

I'm writing in a fix to the master branch. The only change that you need to make in order for it to work is to put an r in front of the inputPath string, which turns it into a "raw string", i.e., the \0 doesn't get interpreted as a string NULL.
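As a small illustration of the raw-string point, here is a sketch in Python 3 syntax (the script above is Python 2). The file name is the same made-up one used in the reproduction script, and the variable names are only for illustration:

```python
# The same characters typed three ways.
plain = "null\0test.pkl"      # \0 is read as a single NULL character
raw = r"null\0test.pkl"       # raw string: backslash and zero stay separate
escaped = "null\\0test.pkl"   # doubled backslash gives the same text as raw

print(len(plain), len(raw))   # 13 14
print("\x00" in plain)        # True
print("\x00" in raw)          # False
print(raw == escaped)         # True
```

Note that the r prefix only affects string literals written in source code.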

tamorris commented 11 years ago

Okay. I just changed the name of the folder it was in to not have a \0 in it, and now we're MCMC-ing fine. But the fixed code will be good for future runs. Thanks!

bmorris3 commented 11 years ago

Dozens of Google searches haven't yielded a foolproof solution to this bug yet. I'm calling on outside help!

bmorris3 commented 11 years ago

Outside help's verdict is that this is a difficult challenge to overcome. He suggests that the solution should be at the GUI layer, but I can't think of a more foolproof way to do that, so I'm going to leave this as an issue for now and potentially consider a fix where we catch the error and tell the user what went wrong. We can tell users to rename any folder or file whose name starts with a zero if they enter one, but that seems annoying.
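For reference, checking the path up front and telling the user what went wrong could look roughly like the sketch below. It is written in Python 3 syntax, and load_pickle_checked is a hypothetical helper name, not a function that exists in OSCAAR:

```python
import pickle


def load_pickle_checked(path):
    """Load a pickle, with a clearer message if the path holds a NULL."""
    if "\x00" in path:
        # Show the NULL as a visible \0 so the user can spot it in the path.
        shown = path.replace("\x00", "\\0")
        raise ValueError(
            "The path '%s' contains a NULL character, most likely from a "
            "backslash followed by a zero. Try renaming the folder or file "
            "so that its name does not start with 0." % shown
        )
    with open(path, "rb") as handle:
        return pickle.load(handle)


try:
    load_pickle_checked("null\0test.pkl")
except ValueError as err:
    print(err)
```

This only reports the problem; it does not recover the path the user meant to enter.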

tamorris commented 11 years ago

The other run finished, yet I again don't see the .txt file in the place it was supposed to be saved to. As this (as I understand) largely makes the whole run fruitless, this is obviously a big problem. What's odd is that I did get the file the very first time I did MCMC, but not since then. Is there some known simple mistake I could have made the last couple of times? On another note, I've had difficulty getting good Gaussian fits for two of the parameters. The last run was iterated 15 million times. Is this a reasonable number to try, or should I really increase the number of iterations to, say, something like 30+ million?

Thanks,

bmorris3 commented 11 years ago

Could you show me the path that you entered, in case there's an error with that path (just like there was with the \0 stuff before)? Why don't you try running a few short fits (Nsteps <= 1000) and saving the outputs to a few different paths to see if any of them save appropriately?
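Separately from the MCMC runs, a quick way to check whether a given location can be written to at all is a sketch like the following (Python 3 syntax). The plain text write is a stand-in, not OSCAAR's own save routine, and try_save and the example paths are hypothetical:

```python
import os
import tempfile


def try_save(path, text="test\n"):
    """Write a small text file and report whether it exists afterwards."""
    try:
        with open(path, "w") as handle:
            handle.write(text)
    except (OSError, ValueError) as err:
        # OSError: missing folder, no permission; ValueError: NULL in path.
        return False, str(err)
    return os.path.isfile(path), ""


base = tempfile.mkdtemp()  # replace with the folders you actually use
candidates = [
    os.path.join(base, "chain.txt"),
    os.path.join(base, "chain"),                     # no extension
    os.path.join(base, "missing_dir", "chain.txt"),  # folder does not exist
]
for path in candidates:
    ok, message = try_save(path)
    print(path, "->", "saved" if ok else "FAILED: " + message)
```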

tamorris commented 11 years ago

I played around a bit with it and tried a location that definitely didn't have a \0 in it, but it still didn't save right. Then I did the same thing without adding ".txt" to the location, and that worked fine. So now I'm doing a full trial. If that goes according to plan, it will seem the problem has been isolated.

bmorris3 commented 11 years ago

That's strange, there's a check built in to protect you from that problem but it seems that you're still having it. I don't have that problem on my machine.

Was it creating files without the ".txt" extension, or not creating files at all? Can you do a small test run again without putting ".txt" at the end of the new text output file name and see if it creates any file, and also copy and paste the output in your command prompt after the MCMC stuff has run and the file should be written?

tamorris commented 10 years ago

I will test that out tonight and let you know what happens. In other news, it seems that full trial I mentioned in my last comment did not work; I saw no file where it was supposed to be. Looking at the command prompt, it seems to have given an error at some point during the run(?). Here is a picture of what it said when I looked at it.

[screenshot: mcmcerror3]

tamorris commented 10 years ago

Okay. After running some more trials, it seems I might have had it backwards. Not having the ".txt" extension is no good; one must put the ".txt" extension in to make a text file. However, after the first run with the .txt extension, something strange started happening. A given text file would not appear where it was supposed to until I completed another MCMC run. Quite odd. It's also possible that the earlier trial, where I did not put ".txt" but a file was still made, really did happen that way, and that the code is behaving inconsistently. Here is the command prompt text after my last run, when the text file from the previous run appeared.

[screenshot]

Then below that just a lot more output numbers and the line warning about potentially overwriting the last file.

tamorris commented 10 years ago

I was able to confirm my previous suspicions. After doing a full run with 20 million steps, I was able to get the .txt output file to appear after a smaller "dummy run" afterwards. There seems to be a one-run delay in when output is produced. This is not a huge hassle, but it is interesting to consider what might be causing this. Let me know what you guys think about where to go from here on this problem.

bmorris3 commented 10 years ago

@dmikkili and I have seen this kind of behavior before. I'm kind of impressed that you diagnosed the problem so well; it's a tricky one. When was the last time that you downloaded OSCAAR? If it was a month or so ago, then you should try downloading the latest version again and seeing if the behavior continues. We noticed this behavior too and I think we fixed it. It's possible that we didn't fully fix it and that there's still work to do, but try updating OSCAAR and seeing if it still behaves the same way.

bmorris3 commented 10 years ago

@tamorris : Have you tried reinstalling OSCAAR and seeing if the error persists? I talked about it with @dmikkili and we think it's an issue that was resolved in commit SHA 2e8ff4810cb898790872c727b34c6198adcfde09.

tamorris commented 10 years ago

Sorry for the delay. I will be able to reinstall and rerun tomorrow, and will let you know what happens!

bmorris3 commented 10 years ago

Great, I look forward to seeing how it goes.

tamorris commented 10 years ago

MCMC is currently running on the pkl that's been giving me issues. No problems so far (about 20% done as of typing this). I should be able to give a final report in a few hours.