EI-CoreBioinformatics / mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data * and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0
98 stars 18 forks source link

an error at the moment of merging: #241

Closed Xiaofei-git closed 5 years ago

Xiaofei-git commented 5 years ago

I am running on my real data, but I got an error from "mikado pick" step, which mentioned in this issue. I see you mentioned that it looks like a proper bug? How should I figure out? I did not meet this running on test data.

Here is my error message: ...

2019-10-16 16:34:38,323 - listener - picker.py:743 - INFO - submit_multi_threading - MainProcess - Joined children processes; starting to merge partial files 2019-10-16 16:34:38,680 - main - init.py:120 - ERROR - main - MainProcess - Mikado crashed, cause: 2019-10-16 16:34:38,680 - main - init__.py:121 - ERROR - main - MainProcess - (2, 7061, 7062, 7061, {4238}, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,...

....

File "...../Mikado/picking/_merge_lociutils.py", line 134, in _create_gene_counters chrom_tots[chrom])) AssertionError: (2, .....)

Dear @lucventurini , I started two approaches to overcome this issue and test whether the problem is a complex locus that stalls mikado pick. I have launched two different jobs, one with the entire dataset and 27 allocated cores. For the other one, I split the original dataset on scaffolds (30 jobs) and launched mikado pick with 3 cores for each of these subsets, which appeared to work well before, in one of the tests we did yesterday. The second approach is running, and managed to process the entire Scaffold 1 quite quickly, but gave an error at the moment of merging:

===== LOG ====

2019-10-10 10:41:58,907 - main_logger - picker.py:350 - INFO - setup_logger - MainProcess - Begun analysis of mikado_prepared_Scaffold1.gtf
2019-10-10 10:41:58,907 - main_logger - picker.py:352 - INFO - setup_logger - MainProcess - Command line: /data/home/btx333/.conda/envs/mikado2.1/bin/mikado pick --procs 3 --json-conf configuration.yaml --output-dir Scaffold1 mikado_prepared_Scaffold1.gtf
2019-10-10 10:41:58,908 - listener - picker.py:369 - WARNING - setup_logger - MainProcess - Current level for queue: INFO
2019-10-10 10:41:59,044 - listener - picker.py:105 - INFO - __init__ - MainProcess - Random seed: 2834569384
2019-10-10 10:41:59,442 - listener - picker.py:872 - INFO - __parse_multithreaded - MainProcess - Starting chromosome Scaffold01
2019-10-10 10:42:09,653 - listener - picker.py:913 - INFO - __parse_multithreaded - MainProcess - Finished chromosome Scaffold01
2019-10-10 10:42:10,956 - Scaffold01:21107-135422 - locus.py:1286 - INFO - expand_transcript - LociProcesser-3 - gmap1_TRINITY_DN103619_c1_g2_i21.path1 has now start 93847, end 132507.
2019-10-10 10:42:10,959 - Scaffold01:21107-135422 - locus.py:195 - WARNING - finalize_alternative_splicing - LociProcesser-3 - gmap1_TRINITY_DN103619_c1_g2_i21.path1 is now padded
2019-10-10 10:45:00,606 - listener - picker.py:743 - INFO - __submit_multi_threading - MainProcess - Joined children processes; starting to merge partial files
2019-10-10 10:45:00,613 - main - __init__.py:120 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-10-10 10:45:00,613 - main - __init__.py:121 - ERROR - main - MainProcess - {10}
Traceback (most recent call last):
  File "/data/home/btx333/.conda/envs/mikado2.1/lib/python3.7/site-packages/Mikado/__init__.py", line 106, in main
    args.func(args)
  File "/data/home/btx333/.conda/envs/mikado2.1/lib/python3.7/site-packages/Mikado/subprograms/pick.py", line 199, in pick
    creator()
  File "/data/home/btx333/.conda/envs/mikado2.1/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1078, in __call__
    self._parse_and_submit_input()
  File "/data/home/btx333/.conda/envs/mikado2.1/lib/python3.7/site-packages/Mikado/picking/picker.py", line 1044, in _parse_and_submit_input
    self.__submit_multi_threading()
  File "/data/home/btx333/.conda/envs/mikado2.1/lib/python3.7/site-packages/Mikado/picking/picker.py", line 750, in __submit_multi_threading
    tempdir=tempdir)
  File "/data/home/btx333/.conda/envs/mikado2.1/lib/python3.7/site-packages/Mikado/picking/loci_processer.py", line 79, in merge_loci
    set.difference(set(range(1, max(common_index.keys()) + 1)), set(common_index.keys()))
AssertionError: {10}

Originally posted by @ChemaMD in https://github.com/EI-CoreBioinformatics/mikado/issues/229#issuecomment-540495512

Xiaofei-git commented 5 years ago

Thanks a lot for quick response! Looking forward to solution, thank you again.

lucventurini commented 5 years ago

No problem @Xiaofei-git. The bug has a non-obvious origin - I have been looking at the code and so far I am struggling to see where such an error could slip through (the SQLite databases should crash immediately when trying to insert a double key). I will dig more into the code and, hopefully, I will be able to localise and remove the cause of the bug soon.

lucventurini commented 5 years ago

PS: for the record, this is a different error from what @ChemaMD experienced. In his case, the problem was that a locus had crashed and had not been analysed properly. In this case, what happened was the mirror image .. a locus was analysed twice.

I am really not clear about how that would have happened (the code is pretty tight in this sense) but I am investigating. A key problem is that I am not sure about how to reproduce the bug at all.

Xiaofei-git commented 5 years ago

Thanks a lot for the effort! But, I am wondering if I can use an older version to avoid this error?

lucventurini commented 5 years ago

Release 2.0rc4 should not have the issue. However, in 2.0rc5 I did solve a pretty big problem with pick (#217) so I would not advise going back to it.

I will try to fix this before the weekend. If it proves more recalcitrant, we can try to create a patched 2.0rc4 that solves the biggest issues present there.

lucventurini commented 5 years ago

Dear @Xiaofei-git , I have found the origin of the bug. Conveniently, it's in a function that can be tested in isolation. It should not too difficult to fix, but it might require me a little bit of time. If you could have patience until next week, I should be able to provide a fixed version. Many thanks for reporting!

lucventurini commented 5 years ago

Dear @Xiaofei-git, I think I solved the bug in a8ad26a, by rewriting large parts of the errorring function.

I would be grateful if you could try again using this commit.

Xiaofei-git commented 5 years ago

Thank you so much, you are always so kindly and efficient.

I wonder how could I run with the new commit, I need "git clone" again and install mikado again, right?

lucventurini commented 5 years ago

Inside the mikado folder, do

git pull
git checkout issue-241
git pull

Then reinstall mikado. This will pull the code from the branch I created to solve the issue

Xiaofei-git commented 5 years ago

Sorry I have't test yet, I will let you know. Sorry about that.

Xiaofei-git commented 5 years ago

How long it will take to run "pick"? What are the factors affacting on the speed? I run it on my data, but it still running, the current cpu is cpu=00:58:05 with "--procs 10"

lucventurini commented 5 years ago

Hi @Xiaofei-git, it really depends on the amount of transcripts that are present. I can tell you that it takes about 10 minutes for ~200,000 transcripts on A. thaliana, and about 2 hours for 3 million, using 32 processors.

lucventurini commented 5 years ago

Ps the version you are running should give you updates on how far it is from finishing all loci on the log file.

Xiaofei-git commented 5 years ago

I think it is suspended because some failure. Here is the tail of my standard error file, and I greped error from it.

$ tail atlas_mikado_subSmp_pick.NAM.e62604956 2019-10-21 16:16:04,438 - listener - picker.py:781 - INFO - submit_multi_threading - MainProcess - Finished with chromosome super_96 2019-10-21 16:16:04,474 - super_99:29-4233 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - tr_TRINITY_GG_1899_c0_g1_i1.mrna1 has been split into 2 different transcripts. 2019-10-21 16:16:04,479 - super_99:29-4233 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - cuff_CUFF.28629.1 has been split into 2 different transcripts. 2019-10-21 16:16:04,483 - super_99:29-4233 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - cuff_CUFF.28629.2 has been split into 2 different transcripts. 2019-10-21 16:16:04,931 - super_99:47694-52415 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - st_transcript:OQU75655 has been split into 2 different transcripts. 2019-10-21 16:16:04,938 - super_99:47694-52415 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - cl_super_99.90588.3 has been split into 2 different transcripts. 2019-10-21 16:16:04,946 - super_99:47694-52415 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - tr_TRINITY_GG_28117_c0_g1_i2.mrna1 has been split into 3 different transcripts. 2019-10-21 16:16:04,952 - super_99:47694-52415 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - tr_TRINITY_GG_28117_c0_g1_i3.mrna1 has been split into 3 different transcripts. 2019-10-21 16:16:04,958 - super_99:47694-52415 - superlocus.py:518 - INFO - load_transcript_data - LociProcesser-7 - tr_TRINITY_GG_28117_c0_g1_i1.mrna1 has been split into 3 different transcripts. 2019-10-21 16:16:05,076 - listener - picker.py:781 - INFO - submit_multi_threading - MainProcess - Finished with chromosome super_99

$ grep -i 'error' atlas_mikado_subSmp_pick.NAM.e62604956 SystemError: returned NULL without setting an error SystemError: returned NULL without setting an error SystemError: returned NULL without setting an error 2019-10-21 15:18:07,589 - 1:17588430-17594467 - abstractlocus.py:782 - ERROR - find_retained_introns - LociProcesser-7 - Trying to find retained introns in a non-stranded locus (gene:1-:17590644-17592163) is invalid. Aborting. SystemError: returned NULL without setting an error 2019-10-21 15:18:33,892 - 1:57127117-57131987 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 15:18:33,908 - 1:57127117-57131987 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:1+:57127117-57129347 SystemError: returned NULL without setting an error SystemError: returned NULL without setting an error SystemError: returned NULL without setting an error SystemError: returned NULL without setting an error 2019-10-21 15:19:30,849 - 1:79665594-79668324 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 15:19:30,850 - 1:79665594-79668324 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:1+:79665594-79668324 SystemError: returned NULL without setting an error 2019-10-21 15:28:56,545 - 2:62235266-62237150 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 15:28:56,547 - 2:62235266-62237150 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:2+:62235266-62237150 2019-10-21 15:32:31,626 - 3:1339031-1345403 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 15:32:31,626 - 3:1339031-1345403 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:3+:1339031-1345403 2019-10-21 15:37:17,909 - 3:58815028-58822692 - abstractlocus.py:782 - ERROR - find_retained_introns - LociProcesser-7 - Trying to find retained introns in a non-stranded locus (gene:3-:58815173-58818000) is invalid. Aborting. 2019-10-21 15:41:57,460 - 4:2253067-2274538 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 15:41:57,460 - 4:2253067-2274538 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:4-:2266993-2274538 2019-10-21 15:45:17,509 - 4:50908756-50911916 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 15:45:17,509 - 4:50908756-50911916 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:4+:50909218-50911916 2019-10-21 15:49:38,111 - 4:66507149-66516973 - abstractlocus.py:782 - ERROR - find_retained_introns - LociProcesser-7 - Trying to find retained introns in a non-stranded locus (gene:4+:66507366-66510895) is invalid. Aborting. 2019-10-21 15:55:23,711 - 6:6506582-6515713 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 15:55:23,711 - 6:6506582-6515713 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:6+:6506582-6515713 2019-10-21 16:00:08,972 - 6:56967241-56986744 - abstractlocus.py:782 - ERROR - find_retained_introns - LociProcesser-7 - Trying to find retained introns in a non-stranded locus (gene:6+:56969839-56972152) is invalid. Aborting. 2019-10-21 16:05:18,642 - 7:63503658-63507954 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 16:05:18,643 - 7:63503658-63507954 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:7+:63503658-63505743 2019-10-21 16:09:53,745 - 8:62352730-62370999 - loci_processer.py:323 - ERROR - analyse_locus - LociProcesser-7 - AssertionError 2019-10-21 16:09:53,745 - 8:62352730-62370999 - loci_processer.py:324 - ERROR - analyse_locus - LociProcesser-7 - Removing failed locus superlocus:8-:62352730-62357681

Xiaofei-git commented 5 years ago

Could you give me some ideas what is going on? Also, I checked the version of my installed mikado, I don't know why it is changed to v2.0rc5. Actually, yesterday, I see it is v2.0rc6 after following the steps you advised. See?

Screen Shot 2019-10-22 at 10 07 12 PM
lucventurini commented 5 years ago

Dear @Xiaofei-git, admittedly this is really bizarre. I have never seen such an error. If you could send me the whole log (lucventurini at Gmail) I can have a look more in depth.

From a quick look around the Internet, though, I think it might be something to do with python 3.7. As a check, if you could reinstall mikado in a python 3.6 environment, that might make it work.

Xiaofei-git commented 5 years ago

Sent. I agreed. I think it might be the problem with 3.7. I will try 3.6 and see what will happen and let you know . Thanks a lot!

Xiaofei-git commented 5 years ago

With the installation with python 3.6.6, it finished run successfully. Thanks a lot!

lucventurini commented 5 years ago

Fantastic! I will close the issue, and thank you for your patience!