Closed: tvandera closed this issue 7 years ago.
There are quite a few branches in the smurff project. Can we get rid of them, or what should we do with them? Should we merge them into smurff/master? Currently we are thinking of getting the latest available changes from macau/master into smurff/master, then merging the smurff/smurff branch into smurff/master, and that will be it. We'll be left with the master branch only.
That seems like a good plan. It won't be easy, though, since much of the tensor support that is in jaak-s/macau/master is not in ExaScience/smurff/smurff.
But if you do this, you will also have done #13 and #6 :)
Cheers, Tom
There are several branches that were not merged into any other branch:
What should we do with them? Merge them into smurff or master?
Macau_vb and probit are feature branches from the original jaak-s/macau repo. These can be removed.
@jaak-s Great, got it, thanks
@tvandera And what about the branches that were merged? Can we delete the smurff/tensor, smurff/precision_sampling and smurff/gfa branches?
smurff/tensor and smurff/precision_sampling are also branches from the original git repo. Feel free to delete them. I think @tvandera can comment on whether smurff/gfa is needed.
smurff/gfa has been merged into smurff/smurff. You can delete smurff/gfa.
Just a current status update.
We have removed all the branches that you said are safe to remove. So currently we have only 4 branches left:
master
smurff
stable
master-smurff-merge
macau/master is now merged into smurff/master. smurff/master is now merged into smurff/smurff. The merge commit itself and all adjustments are in the temporary master-smurff-merge branch, so technically the smurff/master changes are not in the smurff/smurff branch. All changes from smurff/master and smurff/smurff are merged into master-smurff-merge. We will then merge the master-smurff-merge branch back into smurff and master.
We merged it without all the new functionality, because there were lots of conflicts and modifications. We just left the smurff code as it is and placed the macau code beside it in comments. Now we need to think about how to move all of Jaak's modifications from macau to smurff.
We have several problem files. Some of them were treated as new by git; some of them just had merge conflicts. So we wrote them down and found all of Jaak's commits that modified these files.
latentprior.h da9f505c3fe9368629ac6a0665add7786cc1a1e7 03e5cd81d26662f95a5f57fbc0e24b9707fb7415 bd469d0a79fda2a0d223530b659b8adc08c8da46 2c3a461d9c5c349d6246b2c3dc89fd5d53d08e11 61ccc734d274a1cb54f3426b9fbe630d763e91fc 6446512a44a332d8c3913b9bba53100c58965bbf
macauoneprior.h bd469d0a79fda2a0d223530b659b8adc08c8da46 2c3a461d9c5c349d6246b2c3dc89fd5d53d08e11
macauoneprior.cpp bd469d0a79fda2a0d223530b659b8adc08c8da46 2c3a461d9c5c349d6246b2c3dc89fd5d53d08e11 5207e763c465f69fd65e962aa59f6c6779946e89 0f077847efc603b294d7c3ea74611510c2b5a066
mvnormal.cpp b9cf1d8faa3ac807332d7c768dee5ec459921845 0a90e37298a3ff55e555cf6eb4032e4739d9efa5
noisemodels.h, noisemodels.cpp da9f505c3fe9368629ac6a0665add7786cc1a1e7 48757efe50cfdc58a2b4d8a11cab318f9efb1707 2a2e120492f8ea32b8f948be5a6eba46e3c454d3 03e5cd81d26662f95a5f57fbc0e24b9707fb7415 bd469d0a79fda2a0d223530b659b8adc08c8da46 2c3a461d9c5c349d6246b2c3dc89fd5d53d08e11 233b55c903335997296f6635911c66961dd8ba94 61ccc734d274a1cb54f3426b9fbe630d763e91fc
tests.cpp db8bec003a084f0122ac6dd59dec140518ff440d d9e039c8e72e097283a8c411c5fe72a8a9d9a832 a5d017e8ae5359942642d07219589689205ee672 a6ef8a00a6f52a17aef8121dff38f932e6e0afc2 6d9eef16345422d0b66ce196dc33a54cdc7b398f 2c3a461d9c5c349d6246b2c3dc89fd5d53d08e11 3984a2e3282b0532a082faa28866fd00805df621 4624ece170f6540713ec7dace8772b9431772700 af8106ad3258017697bb6c8ec6935f494b00b564 f3167505401f802de780dd92860864456859927b 494e9b133bcb9ce113416e0b5bdf509ef2bb28f3 5207e763c465f69fd65e962aa59f6c6779946e89 29adccb1f5cc419ed411d06a8682898d513cac88 ef390c9b8c9ea6828af527c4de4e565cf5e1fb1f 6446512a44a332d8c3913b9bba53100c58965bbf c2a67439879268714594aa12b5dc7dd178b29c2b 0a90e37298a3ff55e555cf6eb4032e4739d9efa5
Here's a first go on how these commits should be handled:
These are related to #6 on tensors and should be ported manually to smurff: da9f505 03e5cd8 bd469d0 2c3a461 61ccc73 5207e76 0f07784 494e9b1 48757ef 2a2e120 233b55c
@ipasechnikov commit: 47cd8d7
Dense side info python interface, probably can be categorized under #7: 6446512
I will create an issue for these: b9cf1d8 (wishart fix) 0a90e37 (trunc normal)
b9cf1d8 (wishart fix): #14
0a90e37 (trunc normal): #15
@tvandera Great, thank you
I'm sorry, I must have been mistaken. That's my commit. My bad. I should not have included it in the list.
@ipasechnikov commit: 47cd8d7
These are files that were created after the merge. Some of them must be merged; others may be irrelevant. For example, latentprior.cpp functionality moved to different files after the fork.
bpmfutils.cpp
bpmfutils.h
latentprior.cpp
macauoneprior.cpp
sparsetensor.cpp
sparsetensor.h
We should also consider moving files from latest commits
truncnorm.h
truncnorm.cpp
inv_norm_cdf.h
inv_norm_cdf.cpp
These files were moved in commit c50954da07075c41e94c1e2420b4711a5d1f1471 (issue #15).
@jaak-s, do I understand correctly that sample_latent_blas is a newer or better version of sample_latent? Because both functions are similar except for these lines:
sample_latent_blas:
MM.triangularView<Eigen::Lower>() += alpha * col * col.transpose();
sample_latent:
MM.noalias() += col * col.transpose();
@tvandera, if that's the only difference, should we implement this line in Data::get_pnm? ScarceBinaryMatrixData::get_pnm is the same as part of sample_latent, so at least its method can be updated with this single line from the sample_latent_blas method.
Yes, this is the biggest difference and also gives good performance improvement. So this line should be incorporated into the smurff branch.
Note that there are some other minor differences:
MM is immediately initialized to Lambda_u.
Eigen::LLT<MatrixXd> chol = MM.llt(); is used, because alpha is already included in MM.
@jaak-s, what is more, it seems that the ProbitNoise noise variable is not used in sample_latents. It seems to be used only to distinguish methods based on the noise type. For the other types of noise, the noise object is used to get its alpha variable.
Do you mean the function sample_latent_blas_probit? The sampling procedure is different for the ProbitNoise model, and yes, in the case of ProbitNoise there is no alpha parameter.
@jaak-s, yeah, I mean sample_latent_blas_probit, but it's invoked in BPMFPrior::sample_latents and MacauPrior<FType>::sample_latents. Both of them have a noise argument, but it's not used to pass any parameters from noise to sample_latent_blas_probit.
Yes, that is the case: the ProbitNoise model does not have parameters, contrary to the other noise models.
@tvandera, as for smurff, it seems that noise is only used in SpikeAndSlabPrior and MacauOnePrior. As for macau, it seems that noise is used in every sample_latents method except the ones that use ProbitNoise. Do I understand everything correctly?
We will use the latest version of noisemodels.h and noisemodels.cpp for the merge. Classes to merge: INoiseModel, FixedGaussianNoise, AdaptiveGaussianNoise, ProbitNoise.
We will consider the latest version of latentprior.h for the merge.
We need to remove latentprior.cpp because its functionality has moved to different classes. We will use the latest version of latentprior.cpp for the merge. latentprior.cpp is the only file where we don't have comments with a diff.
Need to check how noise is passed.
Classes to merge: ILatentPrior, NormalPrior, MacauOnePrior, MacauPrior.
We have new implementations that will not be considered during the merge: SpikeAndSlabPrior, MPIMacauPrior.
We will use the latest version of macauoneprior.h and macauoneprior.cpp for the merge.
We will use the latest version of bpmfutils.h and bpmfutils.cpp for the merge with utils.h.
We don't have this functionality in smurff; we need to add it completely.
We will use the latest version of tests.cpp for the merge.
Mostly a note for @motoharu-yano and me, so we don't forget anything. But please feel free to correct me if I'm missing or misunderstanding something.
AdaptiveGaussianNoise::init in macau is now split in smurff: part of it is still in AdaptiveGaussianNoise::init and part of it is in Data::var_total. At least MatrixDataTempl<SparseMatrixD>::var_total is the same as in macau. Mostly the same, though for some reason there is no mean_value; maybe it has already been included somewhere.
Same story for AdaptiveGaussianNoise::update: part of it is still in AdaptiveGaussianNoise::update and part of it is in Data::sumsq. MatrixDataTempl<SparseMatrixD>::sumsq looks mostly the same as the corresponding code in AdaptiveGaussianNoise::update in macau.
// macau
double Yhat = Vj.dot( U.col(it.row()) ) + mean_value;
// smurff
double Yhat = model.dot({i,j}) + offset_to_mean({i,j});
So it seems that smurff uses offset_to_mean and not mean_value. I suppose these are the same thing. There is also a minor difference in calculating the dot product, but I don't think that matters.
This is the only method that is in macau but not in smurff; it has totally changed in smurff. We may consider smurff's Session::run as macau's INoiseModel::evalModel, or at least consider INoiseModel::evalModel to be a part of Session::run, because Session::run is macau's MacauX<DType, NType>::run. All that eval_rmse part in AdaptiveGaussianNoise::evalModel is smurff's Result::update, which is invoked in Session::printStatus. And Session::printStatus itself is invoked at every iteration in Session::step.
It differs a lot from the other evalModel methods, and I can't find any corresponding code in smurff. Where has it moved? Is it just handled like any other noise now, like AdaptiveGaussianNoise and FixedGaussianNoise?
Summing everything up, we may consider that all noise functionality from macau is in smurff, except for tensors of course, because all these methods are overloaded for TensorData in macau, and as for smurff, we don't have TensorData yet.
And there is also the ProbitNoise::evalModel issue stated above. What should we do with it? Can we consider that all its functionality is in smurff?
I have partially fixed step 3 of the merge plan in commit ee404a2ed299f28a359122b3c590b82ef808b914. This is not fully done yet. See below:
Moved to utils (we already have this code and it is the same): tick, clamp, getMinMax, split_work_mpi, square, row_mean_var, to_string_with_precision.
Moved to matrix_io (we already have this code and it is the same): writeToCSVfile.
Moved to sparsetensor.cpp (this code still has to be integrated): sparseFromIJV, sparseFromIJV.
Moved to result: eval_rmse became the Result::update method and looks nearly identical; eval_rmse_tensor still has to be integrated since we do not have tensors; auc became the update_auc method, but there are some differences we need to verify.
In the end, we still have to integrate tensors, so I moved the corresponding functions to sparsetensor.cpp. There are also some inconsistencies with the auc update; also, the Result::update tensor code was moved to result.cpp to be integrated later.
I have partially fixed step 1 of the merge plan in commit 0ce1ba44714cac4e1205a15ffaa85d0e1c55bce7. Classes were split into separate files, and some code moving happened. This is not fully done yet. See below:
data.cpp: the var_total and sumsq code that was nearly similar was moved out of the noise models for comparison; the var_total and sumsq code for tensors was moved for further integration.
result.cpp: probit noise has a specific implementation of eval_rmse that was moved for further comparison/integration.
Here are some further questions for discussion about Probit (copied from Skype).
I was looking at the noise models and how they have changed from macau to smurff, and found a couple of things I want to ask about.
Previously, noise models had an evalModel method (which was doing eval_rmse). That method has now moved to Result::update. One of the evalModel implementations was specific to ProbitNoise: it was using/calculating the normal cdf, and it was also calculating AUC.
Looking at the implementation in Result::update, I see no references to the cdf. Is this OK? There is also an update_auc method that is called, but it always calculates AUC without checking the type of noise. Previously this was Probit-specific. Is this also OK?
Regarding AUC itself: I have compared the code, and to me it looks like the old and new implementations of AUC are different, but I am not sure. Are we OK with the new implementation? It also looks like the old implementation of AUC was using some permutation algorithm while the new one is not. Are we OK with that?
In the latest commit c951916118ecb06f935566a2e176897962fc8ee3 I split all prior implementations into separate h/cpp files and added the old version of the code from macau for comparison (according to step 2). Now it will be much easier to review, merge and work with the code.
I have also found these inconsistencies:
class ILatentPrior: void add(BaseSession &b); - missing implementation
class MPIMacauPrior: MPIMacauPrior(BaseSession &m, int p); - missing implementation; std::ostream &info(std::ostream &os, std::string indent) override; - missing implementation; MPIMacauPrior(ScarceMatrixData &m, int p, INoiseModel &n); - missing declaration
I have reviewed the priors code and here is the summary:
GlobalPrior: I have (temporarily) moved 3 old methods here: sample_latent_blas_probit, sample_latent_tensor, make_dense_prior.
ILatentPrior: I would suggest we keep the update_prior method. As for the update_pnm call in sample_latents, it is not clear where it came from.
NormalPrior: keep the update_prior method; review the init method (there is some new initialization there); in sample_latents, CondNormalWishart takes more args; compare sample_latent to the old sample_latent and sample_latent_blas.
MacauOnePrior: keep the update_prior method; use an init method instead of the constructor, as is done in all other priors; compare sample_latent to the old sample_latents. I have concerns about the sample_latent implementation.
MacauPrior: maybe make the sample_beta method virtual? Keep the update_prior method. I have concerns about sample_latents, to be precise about the update_prior functionality: it looks like it is called two times because of inheritance. This did not happen in the old code. Is this a bug?
MPIMacauPrior: use an init method? Keep the update_prior method (if needed).
Pseudocode explanation about possible update_prior issue:
new code
void MacauPrior
old code
void MacauPrior
void MacauPrior
The merge is finished. Everything else that still has to be done was moved to separate issues.
After 0.7.0 release