SATAY-LL / LaanLab-SATAY-DataAnalysis

This repository contains code and workflows for data analysis of SATAY experiments.
Apache License 2.0

Convert Matlab code provided by the Kornmann lab for SATAY analysis to Python. #14

Closed Gregory94 closed 4 years ago

Gregory94 commented 4 years ago

Using Python makes it easier to integrate the code into the rest of the workflow. It also makes documentation easier, for example with Jupyter Notebooks or Jupyter Book.

Gregory94 commented 4 years ago

The Matlab code from the Kornmann lab that counts the transposons and reads for each possible insertion site relies on Matlab-specific functions, not all of which have a Python equivalent. For example, Matlab uses BioMap to read a .bam file, and implementing a similar function in Python does not seem to be trivial. There is a package called pysam that might be able to read .bam files, but it does not work on Windows machines. I will check whether it works on Linux (where the code should run in the end anyway) and whether it gives the desired results. I will also check whether it would be feasible to fall back on another programming language to analyse the .bam file.
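As a rough illustration of what such a pysam-based replacement could look like, here is a minimal sketch. The names `Alignment`, `read_alignments` and `count_per_site` are hypothetical and not taken from the repository. The rule that a reverse-strand read marks its insertion site at the end of the alignment is an assumption about the Matlab logic, not something confirmed in this thread. pysam is imported inside the function so the counting part can still be tried on a machine where pysam is not installed.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Alignment(NamedTuple):
    chrom: str
    start: int        # 0-based leftmost aligned position
    end: int          # 0-based position one past the last aligned base
    is_reverse: bool  # True when the read maps to the reverse strand


def read_alignments(bam_path: str) -> Iterator[Alignment]:
    """Yield the mapped reads of a .bam file using pysam."""
    import pysam  # imported lazily; reported above as not working on Windows

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True walks the whole file and does not need an index.
        for read in bam.fetch(until_eof=True):
            if read.is_unmapped:
                continue
            yield Alignment(read.reference_name, read.reference_start,
                            read.reference_end, read.is_reverse)


def count_per_site(alignments: Iterable[Alignment]) -> Counter:
    """Count reads per (chrom, site, is_reverse) insertion site."""
    counts = Counter()
    for aln in alignments:
        # Assumption: forward reads mark the site at their start,
        # reverse reads at their end.
        site = aln.end if aln.is_reverse else aln.start
        counts[(aln.chrom, site, aln.is_reverse)] += 1
    return counts


# Made-up reads for illustration only (no .bam file needed).
demo = [
    Alignment("chrI", 100, 150, False),
    Alignment("chrI", 100, 150, False),
    Alignment("chrI", 60, 110, True),
]
counts = count_per_site(demo)
n_transposons = len(counts)        # distinct insertion sites
n_reads = sum(counts.values())     # total reads over all sites
```

With a real file, `count_per_site(read_alignments("sample.bam"))` would give the same kind of table, where `sample.bam` stands for whatever alignment file the workflow produces.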

Gregory94 commented 4 years ago

The progress on translating the Matlab code from the Kornmann lab to Python can be found here. pysam seems to work well, and the Python code gives results very similar to those of the Matlab code. There are some differences, for example because some Python functions work slightly differently from their Matlab equivalents. Also, the Python code uses different files for finding (essential) genes, which are not perfectly identical to the ones used by the Kornmann lab. Tests need to show whether these differences are acceptable. If they are, the code will be reorganized and improved in terms of efficiency and speed.
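One possible way to check the two implementations against each other is to compare their per-site read counts directly. The sketch below assumes both outputs have already been loaded into dictionaries keyed by (chromosome, position). The function name `compare_site_counts` and the example numbers are made up for illustration and do not come from either code base.

```python
def compare_site_counts(python_counts, matlab_counts):
    """Return sites missing from either result and sites whose read counts differ."""
    python_sites = set(python_counts)
    matlab_sites = set(matlab_counts)
    only_python = sorted(python_sites - matlab_sites)
    only_matlab = sorted(matlab_sites - python_sites)
    differing = {
        site: (python_counts[site], matlab_counts[site])
        for site in python_sites & matlab_sites
        if python_counts[site] != matlab_counts[site]
    }
    return only_python, only_matlab, differing


# Hypothetical per-site read counts from the two implementations.
python_result = {("chrI", 100): 2, ("chrI", 110): 1, ("chrII", 500): 7}
matlab_result = {("chrI", 100): 2, ("chrI", 110): 3, ("chrII", 900): 4}

only_python, only_matlab, differing = compare_site_counts(python_result, matlab_result)
```

Sites that appear in only one result point to differences in how reads are assigned to positions, while sites with differing counts point to differences in filtering or counting.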

Gregory94 commented 4 years ago

The Python code for transposon mapping is finished and integrated into the workflow for SATAY analysis. The code can be found here and is now also present in the virtual machine. I solved all the issues that were present in the Matlab code by Benoit, and the code creates some additional files that might be helpful for our research.