epruesse / SINA

SINA - Reference based multiple sequence alignment
https://sina.readthedocs.io
GNU General Public License v3.0

SINA quits #81

Closed l-yampolsky closed 4 years ago

l-yampolsky commented 4 years ago

With this last output: Processing: 14% |███████████████████████▍ | 167171/1202016 [02:02:25 / 12:37:48]

-------------------- ARB-backtrace 'received signal 11':

0 libCORE.dylib 0x0000000102895561 _ZL15sigsegv_handleri + 81

1 libsystem_platform.dylib 0x00007fff8e7dceaa _sigtramp + 26

2 ??? 0x0000000000000000 0x0 + 0

Any ideas what might have gone wrong? Is there a way to restart SINA at this point other than removing the first 167171 lines from the file?

(This is under MacOS 10.11.3)

epruesse commented 4 years ago

Hard to tell from this what went wrong. I take it, it always crashes at that line?

It's always possible that it ran out of memory. I'm not really testing with cases that take multiple hours to be honest.

You can use --fasta-idx and --fasta-block to run SINA on sections of an input fasta file. The latter gives the size in bytes of each block, the former which block to work on. It will align all sequences that begin within that range of bytes. I don't know right now if it will work on fasta.gz - this feature was from before the parallelization, to allow using xargs for running many SINA instances on one file.
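The selection rule described above (a sequence belongs to a block if it begins within that block's byte range) can be illustrated outside of SINA. The following is a minimal sketch, not SINA's own code: `fasta_block` is a hypothetical helper, block indices are assumed to count from 0, and the input is assumed to be an uncompressed ASCII FASTA file with Unix line endings.

```shell
# fasta_block FILE BLOCK_SIZE IDX
# Print every FASTA record whose '>' header starts at a byte offset in
# [IDX * BLOCK_SIZE, (IDX + 1) * BLOCK_SIZE). Hypothetical helper that only
# illustrates the selection rule; assumes ASCII text and "\n" line endings.
fasta_block() {
    LC_ALL=C awk -v size="$2" -v idx="$3" '
        BEGIN { lo = idx * size; hi = lo + size; offset = 0 }
        /^>/  { keep = (offset >= lo && offset < hi) }  # decide once per record
        keep  { print }
        { offset += length($0) + 1 }                    # +1 for the newline
    ' "$1"
}
```

For example, `fasta_block input.fasta 1000000 13 > block13.fasta` would write out the records starting in the 14th one-megabyte block, which may be a convenient way to produce a small file for a bug report.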

epruesse commented 4 years ago

I'd be curious if you can reproduce this with the first 165k lines removed. That would make it easier to debug. If you can isolate a set of sequences that trigger the bug, I can start digging.

l-yampolsky commented 4 years ago

Will do tomorrow and let you know. To remove first 165K lines - something like this? tail -n +165000 old.fasta > new.fasta ?
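A note on the command above: `tail -n +K` starts its output at line K, so `tail -n +165000` drops the first 164,999 lines. The SINA progress counter also appears to count sequences rather than lines (an assumption based on the output format), so a line-based cut can land in the middle of a record if the FASTA file wraps sequences over several lines. A minimal sketch that skips whole records instead, using a hypothetical helper `skip_records`:

```shell
# skip_records N FILE
# Print FILE without its first N FASTA records (a record starts at a '>' line).
# Hypothetical helper name; plain awk, no SINA involved.
skip_records() {
    awk -v n="$1" '/^>/ { seen++ } seen > n { print }' "$2"
}
```

Usage would look like `skip_records 165000 old.fasta > new.fasta`.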


From: Elmar Pruesse notifications@github.com Sent: Wednesday, December 4, 2019 3:24 PM To: epruesse/SINA SINA@noreply.github.com Cc: Yampolsky, Lev YAMPOLSK@mail.etsu.edu; Author author@noreply.github.com Subject: [EXTERNAL] Re: [epruesse/SINA] SINA quits (#81)


epruesse commented 4 years ago

Try this:

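# Get the file size in bytes (GNU stat; on macOS/BSD use: stat -f "%z" my.fasta)
SIZE=$(stat -c "%s" my.fasta)
# Split the file into 100 blocks of equal byte size
BLOCK_SIZE=$((SIZE / 100))
# Process only block 13 (block indices start at 0)
sina --fasta-block=$BLOCK_SIZE --fasta-idx 13 {{YOUR_OPTIONS}}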

This should just process the 14th percent of the file. If it does still break, try with --prealigned to output just that set of sequences into a fasta file.

If you use smaller blocks, you should be able to narrow it down further. Although I am very much afraid that it's not a single sequence.
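Narrowing down by block could be automated along the following lines. This is only a sketch under stated assumptions: `scan_blocks` and the `SINA_CMD` variable are hypothetical names, a crash is assumed to produce a non-zero exit status, and the remaining arguments are assumed to hold the same options (including the input file) that the original run used. Only `--fasta-block` and `--fasta-idx` are taken from this thread.

```shell
# scan_blocks FILE N_BLOCKS [SINA OPTIONS...]
# Run SINA once per block and print the index of every block whose run fails.
# Assumptions: a crash yields a non-zero exit status, and the remaining
# arguments hold whatever options the original run used. SINA_CMD can be
# overridden, e.g. for a dry run; it defaults to "sina".
scan_blocks() {
    file=$1; n_blocks=$2; shift 2
    size=$(wc -c < "$file")                        # file size in bytes
    block=$(( (size + n_blocks - 1) / n_blocks ))  # round up to cover the tail
    idx=0
    while [ "$idx" -lt "$n_blocks" ]; do
        "${SINA_CMD:-sina}" --fasta-block="$block" --fasta-idx "$idx" "$@" \
            >/dev/null 2>&1 || echo "$idx"
        idx=$((idx + 1))
    done
}
```

A call such as `scan_blocks my.fasta 100 {{YOUR_OPTIONS}}` would list the failing block indices; rerunning with a larger block count shrinks each block. The block size is rounded up here so that the last block also covers the final bytes of the file, and `wc -c` is used instead of `stat` because it behaves the same on Linux and macOS.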

If you can somehow get it down to a file size you can attach here, I'll have a look at it. I just need something to reproduce the bug to start debugging...

l-yampolsky commented 4 years ago

Dear Elmar,

Sorry, end of semester, dropped this for a while. My observations: SINA quits on particular lines. Restarting from the same line executes normally, but if the crash line is somewhere in the middle of the file, it will crash again on the same line.

Here is a file test.fasta that causes a crash on the last sequence in the file. It looks like an absolutely normal 18S sequence. It produces the same output:

20:38:23 [SINA] Processing: 98% |████████████▊| 78/80 [00:00:32 / 00:00:00]

Processing: 99% |███████████████████████████████████████████████████████████████████████████████████████████████▉ | 79/80 [00:00:32 / 00:00:00]

-------------------- ARB-backtrace 'received signal 11':

0 libCORE.dylib 0x000000010623d561 _ZL15sigsegv_handleri + 81

1 libsystem_platform.dylib 0x00007fff8abe6eaa _sigtramp + 26

2 libARBDB.dylib 0x00000001061ae6f3 _Z9GB_callocjj + 51

3 libsina.0.dylib 0x0000000105f6df51 _ZN4sina15cseq_comparatorclERKNS_14annotated_cseqES3_ + 161

4 libsina.0.dylib 0x0000000105f7d327 _ZZN4sina9famfinder4impl5matchERNSt3__16vectorINS_6search11result_itemENS2_9allocatorIS5_EEEERKNS_14annotated_cseqEENK3$_8clERKS5_ + 471

5 libsina.0.dylib 0x0000000105f784bb _ZN4sina9famfinder4impl5matchERNSt3__16vectorINS_6search11result_itemENS2_9allocatorIS5_EEEERKNS_14annotated_cseqE + 939

6 libsina.0.dylib 0x0000000105f75fe8 _ZN4sina9famfinder4implclENS_4trayE + 248

7 libsina.0.dylib 0x0000000105f75ea4 _ZN4sina9famfinderclERKNS_4trayE + 68

8 sina 0x0000000105ebaf72 _ZN3tbb4flow11interface108internal18function_body_leafIN4sina4trayES5_NS4_9famfinderEEclERKS5_ + 18

9 sina 0x0000000105eba78b _ZN3tbb4flow11interface108internal14function_inputIN4sina4trayES5_NS2_22graph_policy_namespace8queueingENS_23cache_aligned_allocatorIS5_EEE22apply_body_impl_bypassERKS5_ + 59

10 sina 0x0000000105eba744 _ZN3tbb4flow11interface108internal22apply_body_task_bypassINS2_19function_input_baseIN4sina4trayENS2_22graph_policy_namespace8queueingENS_23cache_aligned_allocatorIS6_EENS2_14function_inputIS6_S6_S8_SA_EEEES6_E7executeEv + 20

11 libtbb.dylib 0x000000010614fd05 _ZN3tbb8internal16custom_schedulerINS0_20IntelSchedulerTraitsEE18local_wait_for_allERNS_4taskEPS4_ + 1557

12 libtbb.dylib 0x00000001061473ac _ZN3tbb8internal5arena7processERNS0_17generic_schedulerE + 572

13 libtbb.dylib 0x0000000106146bcb _ZN3tbb8internal6market7processERN3rml3jobE + 75

14 libtbb.dylib 0x00000001061410a9 _ZN3tbb8internal3rml14private_worker3runEv + 201

15 libtbb.dylib 0x0000000106140fd9 _ZN3tbb8internal3rml14private_worker14thread_routineEPv + 9

16 libsystem_pthread.dylib 0x00007fff81e44c13 _pthread_body + 131

17 libsystem_pthread.dylib 0x00007fff81e44b90 _pthread_body + 0

18 libsystem_pthread.dylib 0x00007fff81e42375 thread_start + 13

-------------------- End of backtrace

20:38:44 [ARB I/O] Closing ARB database '"../silva/SILVA_132_SSURef_NR99_13_12_17_opt.arb"' ...

Processing: 99% |███████████████████████████████████████████████████████████████████████████████████████████████▉ | 79/80 [00:00:53 / 00:00:00]

Thank you very much for looking into this. It was not a big issue for me: I wrote a script that restarts a large file after a crash, beginning at the line right after the last normally processed one, and it went through without a problem.

Once again, this is happening on a Mac under MacOS 10.11.3, with 16GB memory.

-- Lev Yampolsky


From: Elmar Pruesse notifications@github.com Reply-To: epruesse/SINA reply@reply.github.com Date: Thursday, December 5, 2019 at 2:35 PM To: epruesse/SINA SINA@noreply.github.com Cc: LY YAMPOLSK@mail.etsu.edu, Author author@noreply.github.com Subject: [EXTERNAL] Re: [epruesse/SINA] SINA quits (#81)


epruesse commented 4 years ago

@l-yampolsky - Can you attach the fasta that triggers the bug here?