SATAY-LL / LaanLab-SATAY-DataAnalysis

This contains codes and workflows for data analysis regarding SATAY experiments.
Apache License 2.0

Improving the processing steps for the sequencing data. #9

Closed Gregory94 closed 4 years ago

Gregory94 commented 4 years ago

So far I have only used my Python scripts with a dataset that was provided and processed by the Kornmann lab. These scripts allow me to do some initial checks of how well the datasets agree with our current knowledge (e.g. which genes are annotated as essential) and how well the scripts work. Now I want to run these scripts on the same Kornmann lab dataset after processing it myself, to see how much their processing routine and mine differ. Using this information I can tweak and fine-tune my processing steps.

Gregory94 commented 4 years ago

I made some Python scripts that generate figures and a text file for a quick visual comparison of two datasets (see the issue 'Create visual and numerical comparison for two datasets.'). Currently I am reprocessing datasets and using these figures to look for any improvements.
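A numerical comparison of two processing runs could be sketched roughly as below. This is only an illustration, not the script referred to above: the input format (a dict mapping gene name to insertion count), the function names, and the example values are all assumptions made for the sketch.

```python
def compare_insertion_counts(counts_a, counts_b):
    """Return (gene, count_a, count_b, count_b - count_a) for every gene
    that appears in either dataset. A gene missing from one dataset is
    counted as 0 there."""
    rows = []
    for gene in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(gene, 0)
        b = counts_b.get(gene, 0)
        rows.append((gene, a, b, b - a))
    return rows


def summarize(rows):
    """Collect a few totals that give a quick feel for the differences."""
    return {
        "genes": len(rows),
        "genes_changed": sum(1 for _, a, b, _ in rows if a != b),
        "total_a": sum(a for _, a, _, _ in rows),
        "total_b": sum(b for _, _, b, _ in rows),
    }


# Made-up example values, purely to show the call pattern.
reference_run = {"GENE_A": 5, "GENE_B": 0}
own_run = {"GENE_A": 3, "GENE_C": 2}

rows = compare_insertion_counts(reference_run, own_run)
for gene, a, b, diff in rows:
    print(f"{gene}\t{a}\t{b}\t{diff:+d}")
print(summarize(rows))
```

Writing the rows to a tab-separated text file instead of printing them would give a file that can be inspected next to the figures.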

Gregory94 commented 4 years ago

One proposed method is to look at the transposon insertions near the telomeres of each chromosome.
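Counting insertions near the chromosome ends could look something like the sketch below. The data layout (insertion positions per chromosome as 1-based coordinates), the function name, and the 10 kb window are assumptions chosen for illustration; the thread does not specify any of them.

```python
def count_near_telomeres(insertions, chrom_lengths, window=10_000):
    """For each chromosome, count insertions within `window` bp of the
    left end and of the right end.

    insertions    : dict, chromosome name -> list of 1-based positions
    chrom_lengths : dict, chromosome name -> chromosome length in bp

    Note: if a chromosome is shorter than 2 * window, the two end regions
    overlap and an insertion can be counted on both sides.
    """
    result = {}
    for chrom, length in chrom_lengths.items():
        positions = insertions.get(chrom, [])
        left = sum(1 for p in positions if 1 <= p <= window)
        right = sum(1 for p in positions if length - window < p <= length)
        result[chrom] = (left, right)
    return result


# Toy chromosome with made-up positions, only to show the call pattern.
toy_insertions = {"chrToy": [50, 10_000, 10_001, 50_000, 90_000, 90_001, 100_000]}
toy_lengths = {"chrToy": 100_000}
print(count_near_telomeres(toy_insertions, toy_lengths))
```

Running the same count on two differently processed versions of one dataset would give a per-chromosome number to compare.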

Gregory94 commented 4 years ago

One of the main steps in the processing is trimming the sequences, for example removing adapter sequences and low-quality reads. Until now I have been using Trimmomatic for this, but it seems a bit limited in terms of settings and options. A better option might be BBDuk (part of the BBMap suite), which is claimed to be faster and more reliable. At the very least it seems to have more options, which allow finer tuning of the settings.
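To give an idea of the kind of options involved, here is a minimal Python sketch that only assembles a BBDuk command line and prints it, without running anything. The file names are placeholders and the parameter values are illustrative assumptions, not the settings used in this workflow; the flags themselves (`ktrim`, `k`, `mink`, `hdist`, `qtrim`, `trimq`, `minlength`) are taken from the BBDuk usage guide.

```python
import shlex


def build_bbduk_command(reads_in, reads_out, adapters_fasta):
    """Assemble a bbduk.sh call for adapter and quality trimming.

    All values below are example settings, to be tuned per dataset.
    """
    return [
        "bbduk.sh",
        f"in={reads_in}",          # input reads (placeholder name)
        f"out={reads_out}",        # trimmed output reads (placeholder name)
        f"ref={adapters_fasta}",   # FASTA file with adapter sequences
        "ktrim=r",                 # trim adapters found at the right (3') end
        "k=23",                    # k-mer length used for adapter matching
        "mink=11",                 # allow shorter k-mers at the read ends
        "hdist=1",                 # allow one mismatch per k-mer
        "qtrim=rl",                # quality-trim both read ends
        "trimq=10",                # quality threshold for trimming
        "minlength=30",            # discard reads shorter than this afterwards
    ]


command = build_bbduk_command("reads.fastq", "reads_trimmed.fastq", "adapters.fa")
print(shlex.join(command))
```

Once BBDuk is installed, the same list could be passed to `subprocess.run(command, check=True)` to actually run the trimming.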

leilaicruz commented 4 years ago

Are you able to use this new tool (BBduk?) for free?

Gregory94 commented 4 years ago

Yes, this is a free and open-source software package. Apparently you can also do sequence alignment with it, but I still have to figure that out.

Gregory94 commented 4 years ago

  1. Data trimming using BBDuk now works. I still have to optimize the steps for trimming and alignment, but I seem to have some more options to play with.

  2. I made the processing workflow more efficient. Everything is now performed in Linux using a virtual machine (except some MATLAB code at the end of the processing workflow). This might allow other users to simply copy the virtual hard drive of my virtual machine and use it to set up their own. If this works, other users don't have to set up the virtual machine from scratch or download and install any software. Simply copying my virtual machine would be enough to get everything working, making the process much quicker and more user-friendly. (This still has to be tested, though.)

  3. The documentation (notes and installation guide) for the processing is updated according to the updated workflow.

leilaicruz commented 4 years ago

Nice! We should do a test session before you leave, for sure!

Gregory94 commented 4 years ago

Creating a bash workflow for automating processing steps in the command line.

Gregory94 commented 4 years ago

Created a bash workflow for automatic processing of the SATAY data. The file is named 'processing_workflow.sh'; it is stored in the virtual machine, which will be uploaded to the N-drive soon, and it can also be downloaded from my repo. It still needs to be tested, though.