mathematicalmichael commented 4 years ago

tested on multiple processors
unlocks the scipy version
uses pickling instead of scipy.io
old .mat files aren't quite loaded properly, need to change their extensions to have none (if your scripts coded the saving/loading paths like mine did).
- this can be addressed with a simple check for file ending when loading. I think there's existing logic to fall back on no extension if .mat is not found, but we need to add in the vice-versa logic. if .mat is missing, try loading without. should open new issue for it. may not even be a real problem
some hotfixes I found in the comparison module got thrown in as well. sorry they're not separated out. nothing functional changed, just efficiency. I noticed that computing metrics took too long when loading files, and it was because the pointer-loading was mis-coded. tests didn't catch it because the pointers did get built, it just took more time. I caught it when expecting blazing fast metric computations on pre-computed pointers for saved files.
sample-based methods!
I've been building up examples in a separate repo, here: https://github.com/mathematicalmichael/thesis/tree/master/examples
- uses BOTH set and sample-based.
- command-line modules with documentation, argument parsing
- model agnostic. swap them out.
- still work in progress, so data-driven methods examples haven't been included yet
- that last point is moot for this PR, since these examples focus on comparing set/sample-based methods for the same inverse problems. they show that I've validated the comparisons (with lots of plotting), caught/fixed bugs as I was running diverse examples, and threw the fixes right into the sample branch that got merged into develop
- that's all to say.. this PR's content has been subject to extensive functional testing
docstrings still somewhat weak at documenting the arguments, but that's okay for the time being. I'll be adding to them. Really check out the thesis examples, they'll demonstrate usage in the cleanest way.
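
For the extension issue mentioned above, here is a minimal sketch of a symmetric fallback, assuming the saved files are plain pickles. The helper names (`resolve_path`, `save_object`, `load_object`) are hypothetical and are not BET functions. It only addresses which path gets opened and would not read genuine MATLAB-format files.

```python
import os
import pickle
import tempfile


def resolve_path(path, ext=".mat"):
    """Return the first existing file among `path` and its variant
    with `ext` added (or stripped, if `path` already ends with it)."""
    if path.endswith(ext):
        other = path[:-len(ext)]
    else:
        other = path + ext
    for candidate in (path, other):
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError("neither %r nor %r exists" % (path, other))


def save_object(obj, path):
    """Pickle `obj` to `path` exactly as given."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Unpickle from `path`, falling back to the other extension variant."""
    with open(resolve_path(path), "rb") as handle:
        return pickle.load(handle)


# Usage: save without an extension, then load with and without ".mat".
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_disc")
    save_object({"dim": 2, "values": [0.1, 0.9]}, target)
    loaded_plain = load_object(target)
    loaded_with_ext = load_object(target + ".mat")
```

Checking both directions in one place means scripts that hard-coded either form of the path keep working.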

See https://github.com/mathematicalmichael/BET/pull/30 Squash merge

adding TEMPORARY files for CI.
adding support for more types of kwd args.
added support for tuple and list.
linting and io ptr setting.
resiliency to overwriting observed distribution.
choose outputs method
linting.
whitespace.
fix tests with sample.
SET_IO PTR REMOVED FROM PROB
clean up lines.
scaffolding for distributions
Revert "clean up lines."

This reverts commit 4db634773ff61b971af199e3acf623d6e4cb59de.

Revert "SET_IO PTR REMOVED FROM PROB"

This reverts commit 40a7267c0f994dbf52533017f3e4968ef001216e.

Revert "Revert "SET_IO PTR REMOVED FROM PROB""

This reverts commit 862f70b87310339027dd5aec97449996d9121a41.

fixed test.
Revert "Revert "clean up lines.""

This reverts commit 845e77f3f4ce81857e69b1cdd293993c031ccfa5.

domain inference and rvs support
support for kde rvs
typo.
self missing
whitespace.
ANSATZ CHANGE IN PROB
hasattr fix.
typo caused tests to fail.
pdf and cdf now handle KDE object.
MC estimate for probabilities.
TESTS PASS FOR EMULATED PROBS.
silence the hounds.
travis simplified down.
passing tests and several added functions.
tests passing, WIP
WIP
tests passing and whole ton of code added.
linting
BUGS FIXED FOR SAMPLING.
linting.
minor bugfixes.
linting.
bug fixes.
bug fixes and architecture changes
scipy version lock.
sampling tests begun.
linting.
manual linting.
docstrings.
local overwriting tests now pass.
WIP. NEEDS REFACTOR.
TESTS PASSING FINALLY.
removed redundant method.
test passing and inverse problem being solved correctly.
changes so tests pass in higher dimensions!
no longer does setting the observed overwrite the reference value. must set it explicitly.
tests failing for dimensions both == 1.
tests now pass in all the dimensions.
nonrepeated data-driven mode first passing test.
caught typo.
tests now passing, likelihood working properly.
travis file matplotlib re-added for 2.7 to pass, re-aligned setup file for comparison class that is being added to master.
cover.
linting.
scipy import for (still untested) noise-setting.
resolving some hound-found issues. need more tests for loss function.
moved line incorrectly.
linting.
made std test more robust to sizes.
removed non-ASCII character from test docs so that tests would pass in python 2.7
better coverage and tests.
coverage up to 77
linting.
to-do note.
linting.
fix bug and more linting.
fixed failing tests due to typos.
fixed broken tests and improved coverage ever so slightly.
repeated tests pass.
copyright notice.
addressing some linting issues.
line overflows.
info on data driven status from predicted.
linting.
linting.
fix unused variables.
typo.
fixed typo and some line overflows.
adding reference value tests.
autopep linting.
dimensions fix in basicSampling test.
WIP
added successful test for add_data
fix inheritance tests.
passing tests for adding data with reference parameter
passing tests.
fix typo.
attempting to fix broken test.
setup incremented and re-styled to match comparison branch.
fixed some coverage and deprecation.
linting.
fixed behavior of set_data
linting.
adding data tests.
another test for calcP.
attempt to address coverage in calculateError.
added methods to aid compatibility with old approach.
coverage for slicing functions.
more tests.
linting.
attempt to fix broken build.
likelihood test revision.
set predicted for likelihood.
refactor setting data.
removed test for calculateError
fixed parallel generation of samples.
remove pushforward approximation in test.
revert test.
serial now loads distribution information.
distribution loading and saving now works in mat files both in parallel and serial
refactored set_initial
linting and cleanup.
fixed test by using appropriate default value.
fixed backward incompatible change.
vim files added to gitignore.
fixed repeated sampling test to take any length data.
linting.
more aggressive linting.
linting (manually)
linting (autopep)
sed replacement for logging.warn into warning.
linting after warn.
full sweep autolinting.
autolint
re-added disc.local_to_global. all tests except sample.py pass
more tests pass but some still fail.
copy setup.
overwrite den with None when generating samples.
accept/reject WIP.
typo.
linting.
travis testing temporarily.
stylistic changes.
WIP on parallel pdf.
adding test file.
small changes.
kde tests WIP
fixed tests.
autopep linting.
travis.
test commit
linting, comments.
STABLE. Tests Pass in parallel. needs cleanup.
reincluded test.
removed extraneous file added by accident.
autolinting.
whitespace.
line indent.
whitespacing and typos
silence the hounds. manual linting.
manual linting.
whitespace.
autopep linting.
Sample pickle (#31)
pickling.
unlocked scipy version once pickle replaced sio.load/savemat
adding back in passing tests.
cleanup.
don't skip test now.
manual linting.
manual linting.
manual linting.
manual linting.
drop 2.7 tests.
try to get travis passing.
fix some testing bugs.
found random sampling ignored domain in bsam.
typo.
hotfix for compare
bugfix comparison.
hotfix.

codecov[bot] commented 4 years ago

Codecov Report

No coverage uploaded for pull request base (develop@2eff2b8). The diff coverage is 77.01%.

@@            Coverage Diff             @@
##             develop     #369   +/-   ##
==========================================
  Coverage           ?   79.82%           
==========================================
  Files              ?       23           
  Lines              ?     5339           
  Branches           ?        0           
==========================================
  Hits               ?     4262           
  Misses             ?     1077           
  Partials           ?        0

| Impacted Files | Coverage Δ |
|---|---|
| bet/util.py | `71.6% <ø> (ø)` |
| bet/postProcess/compareP.py | `95.41% <100%> (ø)` |
| bet/calculateP/simpleFunP.py | `67.45% <100%> (ø)` |
| bet/sampling/adaptiveSampling.py | `83.87% <50%> (ø)` |
| bet/sampling/basicSampling.py | `80.95% <72.22%> (ø)` |
| bet/sample.py | `79.58% <77.24%> (ø)` |
| bet/calculateP/calculateP.py | `88.27% <85.71%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 2eff2b8...bfa9c34.

mathematicalmichael commented 4 years ago

here is a docker image that builds my sample branch (which is currently in-line with my develop) from source on top of fenics: docker pull mathematicalmichael/python:thesis (2.03GB)

smattis commented 4 years ago

@mathematicalmichael FYI. I'm still trying to wrap my head around everything that is happening here, but will try to be done with a full initial review tomorrow.

smattis commented 4 years ago

There needs to be a pretty major restructuring. Right now all of the data-consistent and iterated stuff is included directly in sample.py. That module is for defining the basic data structures of sample_set and discretization. The methods of defining distributions on these objects should be here.

However, right now all of the data-consistent and iterated algorithms are part of the discretization object here. These should be in their own modules in the calculateP directory and take discretization objects as inputs. That is what we have done with the other methods. sample.py is way too hard to deal with in this form, and it is not how it is intended to be used. This restructuring will also have a big effect on the tests as well.
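
A rough sketch of the separation being described, with a plain container plus a solver function that takes it as input. Every name here (`Discretization`, `data_consistent_weights`, the attributes) is a hypothetical stand-in rather than BET's actual API, and the ratio-of-densities weighting is only a simple illustration of a data-consistent update.

```python
class Discretization:
    """Hypothetical stand-in for the container defined in sample.py:
    it only stores paired input and output samples."""

    def __init__(self, inputs, outputs):
        if len(inputs) != len(outputs):
            raise ValueError("inputs and outputs must have the same length")
        self.inputs = list(inputs)
        self.outputs = list(outputs)
        self.weights = None  # filled in by a solver, not by the container


def data_consistent_weights(disc, observed_pdf, predicted_pdf):
    """Hypothetical solver that would live in its own module.

    Weights each sample by observed_pdf(q) / predicted_pdf(q) and
    normalizes the weights so they sum to one.
    """
    ratios = [observed_pdf(q) / predicted_pdf(q) for q in disc.outputs]
    total = sum(ratios)
    disc.weights = [r / total for r in ratios]
    return disc


# Usage: the container knows nothing about the algorithm applied to it.
disc = Discretization(inputs=[0.1, 0.9], outputs=[0.25, 0.75])
data_consistent_weights(disc,
                        observed_pdf=lambda q: 2.0 * q,
                        predicted_pdf=lambda q: 1.0)
```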

mathematicalmichael commented 4 years ago

I do understand what you're saying, and I made the decision to include solution methods here purposefully. Especially when it came to evaluating the updated density, this architecture dramatically simplifies things. This makes iteration and defining observed densities part of the attributes that belong to the discretization object, rather than having a separate method act on them and adding attributes to the class. When I sketched out how it could possibly work through calculateP, it was far less flexible. It's been a while since I made those design decisions, I'll admit... but I definitely struggled through them before going this route. It was my second go at it, after all. I more or less completely refactored consistentbayes in making these changes, and that repo did actually segment out the solver into a separate module much like you're proposing.

Do you really think it's that abysmal a change?

smattis commented 4 years ago

I think including everything in the discretization object in sample.py kind of goes against the object-oriented framework that we are using. Why not have the "old" discretization be a base class that all of this new stuff inherits from? So, I'm not saying you have to put everything in functions that act on a discretization object, but at the very least most of this could go in another file that builds upon the discretization base class defined here. Right now, opening up the sample.py file is incredibly daunting. There's simply too much in this file.

smattis commented 4 years ago

@mathematicalmichael I am getting on an airplane soon and will be in Austin working with Troy and Clint this week. If we can arrange a call sometime then that would be good. If you look at my above comment, it might not be as much restructuring as you are assuming.

mathematicalmichael commented 4 years ago

that's a good point. Are you thinking that I can keep the distributions attribute of the sample_set class, but move most of the diff of discretization into a separate module? That seems reasonable. Is the idea that all I would have to do is swap my import statements in any new examples... like import bet.sample as samp to import bet.directsample as samp?
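
A minimal sketch of that layout, assuming hypothetical class and attribute names (none of them are taken from BET). The base class stays a small container, and the subclass, which would live in the separate module, carries the solver-related state and methods.

```python
class Discretization:
    """Hypothetical base class, standing in for the one in sample.py."""

    def __init__(self, inputs, outputs):
        self.inputs = list(inputs)
        self.outputs = list(outputs)

    def num_samples(self):
        return len(self.inputs)


class DataConsistentDiscretization(Discretization):
    """Hypothetical subclass that would live in a separate module."""

    def __init__(self, inputs, outputs):
        super().__init__(inputs, outputs)
        self.observed_pdf = None  # solver-related state lives here
        self.iteration = 0

    def set_observed(self, pdf):
        """Attach an observed density; returns self to allow chaining."""
        self.observed_pdf = pdf
        return self

    def iterate(self):
        """Advance the iteration counter (placeholder for a real update)."""
        self.iteration += 1
        return self.iteration


# Usage: code written against the base class still works on the subclass.
dc = DataConsistentDiscretization([0.1, 0.9], [0.25, 0.75])
dc.set_observed(lambda q: 2.0 * q).iterate()
```

If the new module also re-exported the base names, an existing script would only need to change its import line, which is the kind of swap asked about above.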

smattis commented 4 years ago

Yes. Something like that sounds great.

mathematicalmichael commented 4 years ago

closing since most of this is implemented in #382. will build on top of that when it's merged to re-add functionality.

UT-CHG / BET

Sample-based methods #369
