WGLab / lncScore

A python package for the identification of lncRNA from the assembled novel transcripts

Problem with result file when multithreading #7

Open kvittingseerup opened 7 years ago

kvittingseerup commented 7 years ago

I tried running lncScore with the -p 12 option and found a problem in the result file that does not occur when I run lncScore with -p 1. The problem seems to be that the header is included multiple times (without adding a \n). See this example:

TCONS_00006551  noncoding   0.00236957180796
TCONS_00006553  coding  0.5Transcript_id    Index   Coding_score
TCONS_00042551  noncoding   0.00733343925564

This also fits with the fact that I find this problem 11 times, i.e. once for each thread beyond the first.
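A quick way to locate such lines is to flag every line that does not split into exactly three fields. This is only a rough sketch: the function name `find_malformed_lines` is made up for illustration, it is written for Python 3, and it assumes the result file has three whitespace-separated columns as in the example above.

```python
def find_malformed_lines(lines, expected_fields=3):
    """Return (line_number, text) for every non-empty line whose field count is off.

    Hypothetical helper, not part of lncScore. A header fused onto a record
    line shows up as a line with more than three fields.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if len(stripped.split()) != expected_fields:
            bad.append((number, stripped))
    return bad


# Example input modelled on the excerpt above (tab-separated).
example = [
    "Transcript_id\tIndex\tCoding_score",
    "TCONS_00006551\tnoncoding\t0.00236957180796",
    "TCONS_00006553\tcoding\t0.5Transcript_id\tIndex\tCoding_score",
    "TCONS_00042551\tnoncoding\t0.00733343925564",
]
malformed = find_malformed_lines(example)
```

To check a real result file, pass an open file handle in place of the list; the length of the returned list is the number of affected lines.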

Hope you will fix this soon as the multithreading is a very nice option to have.

/Kristoffer

zhaodoctor commented 7 years ago

Thanks for your reminder, I have fixed this error.

kiefer-ch commented 7 years ago

This error persists for me.

zhaodoctor commented 7 years ago

Did you use the latest version of lncScore? And can you provide more details about this error? I checked the code again, and I believe the header 'Transcript_id Index Coding_score' should no longer occur multiple times in the final result file.

kiefer-ch commented 7 years ago

Sorry, I didn't read the first post correctly. My error also occurs only with multithreading; it looks similar at first glance but is different.

Here are lines 100 - 110 of the output from running your test dataset with -p 1:

ENST00000310991.7 coding 0.999386315261
ENST00000378585.5 coding 0.999999998363
ENST00000378567.7 coding 0.999999999689
ENST00000400921.6 coding 0.999999938691
ENST00000461106.6 coding 0.999999983317
ENST00000400918.7 coding 0.98522181382
ENST00000378543.2 coding 0.865831901134
ENST00000378546.8 coding 0.992747033073
ENST00000378536.4 coding 1.0
ENST00000378531.7 coding 0.999999340798
ENST00000378518.5 coding 0.825609846765

and with -p 4:

ENST00000310991.7 coding 0.999386315261
ENST00000378585.5 coding 0.999999998363
ENST00000378567.7 coding 0.999999999689
ENST00000400921.6 coding 0.999999938691
ENST00000461106.6 coding 0.999999983317
ENST00000400918.7 coding 0.98522181382
ENST00000378543.2 coding 0.865831901134
ENST00000378546.8 coding 0.992747033073
ENST00000378536.4 coding 1.0
ENST00000378531.7 coding 0.9999993407ENST00000513143.5 coding 0.987860305358
ENST00000487038.5 coding 0.983239279772

I ran it on Ubuntu 17.04 with Python 2.7.13.

There also seems to be a problem with the handling of multi-line FASTA files; it took me a while to figure out why my data would throw an error.
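For reference, this is roughly what I mean by multi-line handling: sequence lines that wrap have to be joined until the next '>' header. The sketch below is not lncScore's parser; the function name `read_fasta` is made up for illustration, and it is written for Python 3.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining sequence lines that wrap.

    Hypothetical helper for illustration only.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # one wrapped piece of the current sequence
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


# Two records; the first sequence is wrapped over two lines.
example = io.StringIO(">t1 example\nACGT\nTTGA\n>t2\nGGCC\n")
records = list(read_fasta(example))
```

A single-line FASTA file goes through the same code path, so both layouts yield one sequence string per transcript.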

Best

zhaodoctor commented 7 years ago

Although this error didn't occur on my computer, I think I have found the problem from the details you provided. 'ENST00000513143.5 coding 0.987860305358' should be the first line of the temporary result file produced by the second thread, and it should directly follow 'ENST00000376061.8 coding 0.984397833351', which is the last line of the temporary result file produced by the first thread. So the problem is that the temporary files were combined before the first temporary file had been completely written. The correct order is to combine the temporary files only after every thread has finished writing its own temporary file. Thanks very much for your attention; I will fix this in the next few days.
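The intended order can be sketched as follows. This is a simplified illustration and not the actual lncScore code: the names `score_chunk` and `run_and_merge` are made up, it uses `threading` from the standard library (the same start-then-join pattern applies to `multiprocessing.Process`), and it is written for Python 3.

```python
import os
import tempfile
import threading


def score_chunk(records, tmp_path):
    """Hypothetical worker: write one tab-separated line per record."""
    with open(tmp_path, "w") as handle:
        for transcript_id, label, score in records:
            handle.write("%s\t%s\t%s\n" % (transcript_id, label, score))


def run_and_merge(chunks, out_path):
    """Process each chunk in its own thread, then merge the temporary files in order."""
    tmp_dir = tempfile.mkdtemp()
    tmp_paths = [os.path.join(tmp_dir, "part_%d.tmp" % i) for i in range(len(chunks))]
    workers = [
        threading.Thread(target=score_chunk, args=(chunk, path))
        for chunk, path in zip(chunks, tmp_paths)
    ]
    for worker in workers:
        worker.start()
    # Wait for every worker before reading any temporary file.
    for worker in workers:
        worker.join()
    with open(out_path, "w") as out:
        out.write("Transcript_id\tIndex\tCoding_score\n")  # header written once
        for path in tmp_paths:
            with open(path) as part:
                for line in part:
                    out.write(line)
            os.remove(path)
    os.rmdir(tmp_dir)


# Small example with made-up records split into two chunks.
chunks = [
    [("T1", "coding", "0.9"), ("T2", "noncoding", "0.1")],
    [("T3", "coding", "0.8")],
]
out_path = os.path.join(tempfile.mkdtemp(), "result.txt")
run_and_merge(chunks, out_path)
with open(out_path) as handle:
    merged = handle.read().splitlines()
```

Because each temporary file is closed by its worker before `join()` returns, the merge step can only see complete lines, and the header is written exactly once by the merging code.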