Closed Gregory94 closed 4 years ago
The MATLAB code from the Kornmann lab, which determines the number of transposons and reads for each possible insertion site, relies on functions that are not all available in Python. For example, MATLAB uses the BioMap function to read a .bam file, and implementing a similar function in Python does not seem to be trivial. There is a package called pysam that might be able to read .bam files, but it does not work on Windows machines. I will check whether it works on Linux (where the code should run in the end anyway) and whether it gives the desired results.
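As a rough illustration of what reading a .bam file with pysam could look like, here is a minimal sketch. It is not the code from this repository: the function names `count_insertions` and `count_insertions_in_bam` are hypothetical, and the choice to take the last aligned base as the insertion site for reverse-strand reads is an assumption that may differ from what the MATLAB code does.

```python
from collections import Counter


def count_insertions(reads):
    """Count reads per (chromosome, position, strand) insertion site.

    `reads` is any iterable of objects exposing the pysam.AlignedSegment
    attributes used below (is_unmapped, is_reverse, reference_name,
    reference_start, reference_end).
    """
    counts = Counter()
    for read in reads:
        if read.is_unmapped:
            continue  # unmapped reads carry no insertion site
        if read.is_reverse:
            # Assumption: for reverse-strand reads the site is the last
            # aligned base. reference_end is exclusive in pysam, hence -1.
            site = (read.reference_name, read.reference_end - 1, "-")
        else:
            # reference_start is the 0-based leftmost aligned position.
            site = (read.reference_name, read.reference_start, "+")
        counts[site] += 1
    return counts


def count_insertions_in_bam(bam_path):
    """Open a .bam file with pysam and count reads per insertion site."""
    import pysam  # imported here so count_insertions works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True iterates over all reads without needing an index.
        return count_insertions(bam.fetch(until_eof=True))
```

Keeping the counting logic separate from the file handling means it can be checked on any machine, including Windows, with simple stand-in objects instead of a real .bam file.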
I will also check whether it would be feasible to switch to another programming language that can analyze the .bam file.
The progress on translating the MATLAB code from the Kornmann lab to Python can be found here. The pysam package seems to work well, and the Python code gives results very similar to those of the MATLAB code. There are some differences, for example because some Python functions behave slightly differently from their MATLAB equivalents. Also, the files the Python code uses for finding (essential) genes are not perfectly identical to the ones used by the Kornmann lab. Tests need to show whether these differences are acceptable. Once the tests are successful, the code will be reorganized and improved in terms of efficiency and speed.
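One possible way to check such differences is to compare the per-gene (or per-site) counts from both implementations and list only the entries that disagree. The sketch below assumes both outputs have already been loaded into dictionaries; the function name `compare_counts` and the `tolerance` parameter are hypothetical and not part of the existing code.

```python
def compare_counts(python_counts, matlab_counts, tolerance=0):
    """Return entries whose counts differ by more than `tolerance`.

    Both arguments map a key (e.g. a gene name or insertion site) to a count.
    A key missing from one side is treated as a count of 0.
    The result maps each differing key to (python_value, matlab_value).
    """
    differences = {}
    for key in sorted(set(python_counts) | set(matlab_counts)):
        p = python_counts.get(key, 0)
        m = matlab_counts.get(key, 0)
        if abs(p - m) > tolerance:
            differences[key] = (p, m)
    return differences
```

For example, with made-up values, `compare_counts({"geneA": 10, "geneB": 5}, {"geneA": 10, "geneB": 7})` reports only `geneB`, which makes it easier to judge whether the remaining differences are acceptable.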
The Python code for transposon mapping is finished and integrated into the workflow for SATAY analysis. The code can be found here and is now also available in the virtual machine. I solved all the issues that were present in the MATLAB code by Benoit, and the code creates some additional files that might be helpful for our research.
Using Python makes it easier to integrate the code into the rest of the workflow. It also makes documentation easier, for example with Jupyter Notebooks or Jupyter Books.