SchulzLab / TEPIC

Annotation of genomic regions using transcription factor binding sites and epigenetic data
MIT License

TRAP runs forever for merged-human and -vertebrates PSEMs #35

Closed moritzschaefer closed 3 years ago

moritzschaefer commented 3 years ago

Thanks again for the great work with the TEPIC package!

I used to run TEPIC.sh with the provided merged mouse PSEMs and everything ran smoothly. However, one transcription factor of interest (TFAP4) was only available in the vertebrate and human PSEMs, so I copied that entry and appended it to the merged mouse PSEMs (I also added the corresponding length entry). Running with the new list of PSEMs did not produce an error; however, the command never finished and showed continuous CPU usage (100%). The last command-line message was "Starting TRAP". Without this additional entry, TRAP ran in under an hour.

Assuming that I had made some kind of formatting mistake while adding the entry to the PSEM file, I tried to run TEPIC with the whole set of merged human PSEMs, yet TRAP still never finished. Same problem for the merged vertebrates set.

Here is the command I am running.

./libs/TEPIC/Code/TEPIC.sh -g mm10.fa -a grcm38.gtf -b peaks.narrowPeak -o output/dir/ -c 2 -n 7 -p libs/TEPIC/PWMs/2.1/Merged_PSEMs/Merged_JASPAR_HOCOMOCO_KELLIS_{organism}.PSEM -m libs/TEPIC/PWMs/2.1/Merged_PSEMs/Merged_JASPAR_HOCOMOCO_KELLIS_{organism}_length.txt -r mm10.2bit -w 1000 -v 0.01

I really see no reason why the mouse PSEM list should work while the other two don't. Am I overlooking something? Do you have an idea what kind of bug this might be?

Thanks for the help

Florian411 commented 3 years ago

Mmh, my number one concern would be that there is a bug in the line endings of the PWM file which is causing the system to behave weirdly. I am currently not able to check, but could you send me your manipulated PWM together with the annotation file and motif length file that you are using? I will try to have a look at the end of next week.
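
A quick way to test the line-ending hypothesis is to count the line-ending styles in the edited file directly. The snippet below is only a minimal sketch and not part of TEPIC; the function name `line_ending_report` and the file path in the usage note are made up for illustration.

```python
from pathlib import Path


def line_ending_report(path):
    """Count line-ending styles in a file, reading it as raw bytes."""
    data = Path(path).read_bytes()
    crlf = data.count(b"\r\n")             # Windows-style endings
    cr_only = data.count(b"\r") - crlf     # bare carriage returns
    lf_only = data.count(b"\n") - crlf     # Unix-style endings
    return {
        "crlf": crlf,
        "cr_only": cr_only,
        "lf_only": lf_only,
        # False if the last line has no terminator at all
        "ends_with_newline": data.endswith((b"\n", b"\r")),
    }
```

Calling `line_ending_report("my_modified.PSEM")` (a hypothetical file name) on both the original and the edited PSEM file and comparing the two results would show whether the edit introduced `\r` characters or dropped the final newline. Whether either of these actually makes TRAP hang is an assumption that this check alone cannot confirm.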

Florian411 commented 3 years ago

In light of your other post, do you get the same behavior with any of the 2.0 PWMs when you are not modifying the files?

moritzschaefer commented 3 years ago

It seems to be unrelated to the other issue: I generated my own PSEM-lengths (with the corresponding python script) and it ran well for mouse.

Then I again added TFAP4 from human to the mouse-PSEM, recomputed the lengths, and ended up with the same issue.

To check whether the TFAP4 PSEM values are the issue, I duplicated a PSEM in the mouse PSEMs and called it TEST. This STILL led to the same problem. I'm really out of ideas as to how this can happen.

Also, I noticed that TRAPmulti only uses one processor core when it's not working. When it's working (i.e. it finishes after some time), it uses the assigned number of cores with 100% CPU usage.

I'm also quickly checking if the 2.0-merged-human PSEMs work

Florian411 commented 3 years ago

Which OS and editor are you using to manipulate the files?

moritzschaefer commented 3 years ago

Emacs/Spacemacs on Linux. I made sure the files are in Unix format. Also, after deleting my TEST entry, it works again.

The 2.0/human-merged.PSEM does work!

Florian411 commented 3 years ago

Okay... so somewhere in 2.1 something broke. Let me have a look asap. Thanks for patiently trying different things!

moritzschaefer commented 3 years ago

Is there any advantage when using 2.1 as compared to 2.0 (except that a small number of TFs was added)?

Also, I assume that the available files in 2.0 correspond to the 'merged' files in 2.1.

Thank you so much for your support!

Another note: when manually adding a PSEM from human to the mouse PSEM list in 2.0, I run into the same issue again.

Florian411 commented 3 years ago

There is no difference except the few added TFs.

I will have a look at the problem with the motif adding next week.

Florian411 commented 3 years ago

I have tried it on my own system and it works. My guess is that something is wrong with the file endings. I am closing this because I can't reproduce it.