gitter-lab / SINGE

Gene regulatory network reconstruction from pseudotemporal single-cell gene expression data
MIT License

Optimize #28

Closed atuldeshpande closed 5 years ago

atuldeshpande commented 5 years ago

First round of optimizations, including: new input data formats for lower disk usage; the ability to input multiple lambdas and save one output file per lambda; precomputing the full kernel, which saves computational resources; and some code to allow later incorporation of regulators identified as a subset of all genes.

Note: These changes do not affect the computation of the intermediate outputs or the aggregate SCINGE scores. The changes to the output .mat files may require revisiting the read function in the Python scripts used during the Travis tests.

agitter commented 5 years ago

One change is that the intermediate .mat files are now stored in the HDF5 file format (MATLAB v7.3). compare_adj_matrices.py will need to be updated to support this new format. Then this optimized code should still produce the same sparse matrices as the prior code.
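To illustrate the "same sparse matrices" check, here is a minimal sketch, not the actual logic of compare_adj_matrices.py. It assumes each matrix has already been read into a dict mapping (row, col) to a value, treats missing entries as zeros, and uses a hypothetical function name and tolerances.

```python
import math


def sparse_equal(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two sparse matrices stored as {(row, col): value} dicts.

    Entries absent from a dict are treated as zeros, so an explicit 0.0 in
    one matrix matches a missing entry in the other. The tolerances are
    illustrative assumptions, not values taken from the SINGE tests.
    """
    for key in a.keys() | b.keys():
        if not math.isclose(a.get(key, 0.0), b.get(key, 0.0),
                            rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```

Comparing with a tolerance rather than exact equality leaves room for small floating point differences between the old and new code paths.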

Once we have the updated test code, we can recompile the MATLAB code, update the Docker image, and confirm the test cases pass.

Before merging, we'll create a copy of the input .mat file that is used to track intermediate state on disk. Then we can add these to .gitignore so that we do not track them in the repository.

@atuldeshpande will send instructions for re-compiling glmnet in the MATLAB 2018a environment and the new MATLAB compile command that excludes unnecessary toolboxes.

agitter commented 5 years ago

https://pythonhosted.org/hdf5storage/introduction.html#convenience-functions-for-matlab-mat-files looks like the perfect solution for loading the new mat files. I'll update test code as part of this pull request.
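A rough sketch of how test code could handle both the old and the new files, assuming v7.3 MAT-files can be recognized by the HDF5 signature that follows their 512-byte header. The helper names `is_v73_mat` and `load_mat` are hypothetical and are not part of the SINGE repository; `hdf5storage.loadmat` and `scipy.io.loadmat` are the third-party readers, imported lazily so the format check itself only needs the standard library.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # standard 8-byte HDF5 magic number
MAT_HEADER_SIZE = 512  # v7.3 MAT-files place the HDF5 data after a 512-byte header


def is_v73_mat(path):
    """Return True if the file looks like a MATLAB v7.3 (HDF5-based) MAT-file."""
    with open(path, "rb") as handle:
        header = handle.read(MAT_HEADER_SIZE + len(HDF5_SIGNATURE))
    # Older MAT-file versions have no HDF5 signature at this offset.
    return header[MAT_HEADER_SIZE:] == HDF5_SIGNATURE


def load_mat(path):
    """Load a MAT-file into a dict, choosing the reader from the file format."""
    if is_v73_mat(path):
        import hdf5storage  # third-party reader for v7.3 files
        return hdf5storage.loadmat(str(path))
    import scipy.io  # third-party reader for the older MAT-file formats
    return scipy.io.loadmat(str(path))
```

Dispatching on the file contents rather than the filename would let the same test script read outputs from both the prior and the optimized code.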

agitter commented 5 years ago

@atuldeshpande did the filenames of the intermediate sparse matrices change as well? I now see AdjMatrix_data1_X_SCODE_datapmat_ID_541_lambda_0p01_replicate_1.mat and similar files in the output from the latest version.

atuldeshpande commented 5 years ago

> @atuldeshpande did the filenames of the intermediate sparse matrices change as well? I now see AdjMatrix_data1_X_SCODE_datapmat_ID_541_lambda_0p01_replicate_1.mat and similar files in the output from the latest version.

Yes. This was to ensure that we create multiple outputs when multiple lambdas are being used.
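For scripts that need to pick out the per-lambda outputs, a sketch like the following could parse the new filenames. The pattern is inferred only from the single filename quoted above, and it assumes that `0p01` encodes the lambda value 0.01 with `p` standing in for the decimal point. The function name and the dictionary keys are hypothetical.

```python
import re

# Pattern inferred from one example filename in this thread; other outputs
# may follow a different layout.
ADJ_NAME = re.compile(
    r"^AdjMatrix_(?P<prefix>.+)_ID_(?P<run_id>\d+)"
    r"_lambda_(?P<lam>\d+(?:p\d+)?)_replicate_(?P<replicate>\d+)\.mat$"
)


def parse_adj_filename(name):
    """Split an intermediate output filename into its labeled parts."""
    match = ADJ_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized filename: {name!r}")
    return {
        "prefix": match["prefix"],
        "run_id": int(match["run_id"]),
        # Assumption: 'p' replaces the decimal point, so '0p01' means 0.01.
        "lambda": float(match["lam"].replace("p", ".")),
        "replicate": int(match["replicate"]),
    }
```

Grouping files by the parsed lambda would make it straightforward to compare each new output against the matching result from the prior code.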

atuldeshpande commented 5 years ago

> The tests pass and this looks good to me. @atuldeshpande do you want to review one last time and then we can merge?

Looks good to me.