Closed atuldeshpande closed 5 years ago
One change is that the intermediate .mat files are now stored in the HDF5 file format (MATLAB v7.3). compare_adj_matrices.py will need to be updated to support this new format. Then this optimized code should still produce the same sparse matrices as the prior code.
Once we have the updated test code, we can recompile the MATLAB code, update the Docker image, and confirm the test cases pass.
Before merging, we'll create a copy of the input .mat file that is used to track intermediate state on disk. Then we can add these to .gitignore so that we do not track them in the repository.
@atuldeshpande will send instructions for re-compiling glmnet in the MATLAB 2018a environment and the new MATLAB compile command that excludes unnecessary toolboxes.
https://pythonhosted.org/hdf5storage/introduction.html#convenience-functions-for-matlab-mat-files looks like the perfect solution for loading the new mat files. I'll update the test code as part of this pull request.
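As a rough sketch of what the updated loader could look like (not the actual change to compare_adj_matrices.py), the snippet below picks a reader from the file header. The helper names `is_mat_v73` and `load_mat` are hypothetical. It assumes v7.3 files begin with the text "MATLAB 7.3 MAT-file" and that `hdf5storage` and `scipy` are installed; how the sparse matrices come back from either reader is not covered here.

```python
# Header text that MATLAB writes at the start of v7.3 (HDF5-based) MAT-files.
V73_MAGIC = b"MATLAB 7.3 MAT-file"


def is_mat_v73(path):
    """Return True if the file header identifies a MATLAB v7.3 MAT-file."""
    with open(path, "rb") as handle:
        return handle.read(len(V73_MAGIC)) == V73_MAGIC


def load_mat(path):
    """Load a .mat file into a dict, choosing the reader from the header.

    Hypothetical helper: the imports are deferred so that each
    third-party package is only required for the format it reads.
    """
    if is_mat_v73(path):
        import hdf5storage  # third-party reader for HDF5-based MAT-files
        return hdf5storage.loadmat(str(path))
    import scipy.io  # reads MAT-file versions older than 7.3
    return scipy.io.loadmat(str(path))
```

Checking the header first keeps the older files from the prior code readable, which may be useful when comparing old and new outputs side by side.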
@atuldeshpande did the filenames of the intermediate sparse matrices change as well? I now see AdjMatrix_data1_X_SCODE_datapmat_ID_541_lambda_0p01_replicate_1.mat and similar files in the output from the latest version.
Yes. This was to ensure that we create multiple outputs when multiple lambdas are being used.
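For the test code, the per-lambda filenames could be parsed along these lines. This is only a sketch based on the single filename quoted above: the function name `parse_output_name` is hypothetical, and it assumes that "0p01" encodes the lambda value 0.01 with "p" standing in for the decimal point, which the thread does not state explicitly.

```python
import re

# Matches the trailing "_lambda_<value>_replicate_<n>.mat" part of a filename.
# Assumption: "p" replaces the decimal point in the lambda value.
OUTPUT_NAME = re.compile(
    r"_lambda_(?P<lam>[0-9]+(?:p[0-9]+)?)_replicate_(?P<rep>[0-9]+)\.mat$"
)


def parse_output_name(filename):
    """Return (lambda value, replicate number) parsed from an output filename."""
    match = OUTPUT_NAME.search(filename)
    if match is None:
        raise ValueError(f"unrecognized output filename: {filename}")
    lam = float(match.group("lam").replace("p", "."))
    return lam, int(match.group("rep"))


name = "AdjMatrix_data1_X_SCODE_datapmat_ID_541_lambda_0p01_replicate_1.mat"
print(parse_output_name(name))  # prints (0.01, 1)
```

Grouping output files by the parsed lambda would let the comparison script check each lambda's matrices separately.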
The tests pass and this looks good to me. @atuldeshpande do you want to review one last time and then we can merge?
Looks good to me.
First round of optimizations. Includes: new input data formats for lower disk usage; the ability to input multiple lambdas and save multiple output files, one for each lambda; precomputing the full kernel (saves computational resources); and some code to allow for later incorporation of regulators identified as a subset of all genes.
Note: These changes do not affect the computation of the intermediate outputs or the aggregate SCINGE scores. The changes to the output .mat files may make it necessary to revisit the Python script's read function used during the Travis tests.