glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

Unable to align using runProgressiveCactus.sh #22

Closed gopin001 closed 9 years ago

gopin001 commented 9 years ago

I have been trying to align three genomes for more than a week, with no success so far. The alignment runs for more than an hour and then stops without creating a HAL file. (If I use sample genomes smaller than 10000 in size, it succeeds.) Any help is greatly appreciated.

Thanks, Arun

Command used: runProgressiveCactus.sh s.txt ./weig ./weig/eig.hal --maxThreads 40

s.txt >>
ehux /home/gopin001/fasta/ehux.fasta
iso /home/gopin001/fasta/iso.fasta
geph /home/gopin001/fasta/geph.fasta

size >>
ehux.fasta = 168859287
iso.fasta = 85360404
geph.fasta = 274180427

fasta file format for ehux.fasta >>
>ehux1
CGCTCGAGACGTTAGGAAGTGTCAGGAAGTGTCTA...
......
>ehux3
.....
>ehux4
.............

fasta file format for iso.fasta >>
>iso1
CGCTCGAGACGTTAGGAAGTGTCAGGAAGTGTCTA...
......
>iso2
.....
>iso4
.......

fasta file format for geph.fasta >>
>geph1
CGCTCGAGACGTTAGGAAGTGTCAGGAAGTGTCTA...
......
>geph2
.....
>geph3
.........

cactus.log >>

2014-08-04 22:14:40.426191: Beginning Progressive Cactus Alignment

Got message from job at time: 1407215683.71 : Before running any preprocessing on the assembly: /home/gopin001/fasta/iso.fasta got following stats (assembly may be listed as temp file if input sequences from a directory): Input-sample: /home/gopin001/fasta/iso.fasta Total-sequences: 1140 Total-length: 84296971 Proportion-repeat-masked: 0.219075 ProportionNs: 0.219075 Total-Ns: 18467385 N50: 419869 Median-sequence-length: 4491 Max-sequence-length: 4576395 Min-sequence-length: 971

Got message from job at time: 1407215685.29 : Before running any preprocessing on the assembly: /home/gopin001/fasta/ehux.fasta got following stats (assembly may be listed as temp file if input sequences from a directory): Input-sample: /home/gopin001/fasta/ehux.fasta Total-sequences: 6995 Total-length: 166019287 Proportion-repeat-masked: 0.170904 ProportionNs: 0.070425 Total-Ns: 11691911 N50: 407932 Median-sequence-length: 1682 Max-sequence-length: 3018814 Min-sequence-length: 1000

Got message from job at time: 1407215687.96 : Before running any preprocessing on the assembly: /home/gopin001/fasta/geph.fasta got following stats (assembly may be listed as temp file if input sequences from a directory): Input-sample: /home/gopin001/fasta/geph.fasta Total-sequences: 6753 Total-length: 269617066 Proportion-repeat-masked: 0.164811 ProportionNs: 0.129277 Total-Ns: 34855353 N50: 121969 Median-sequence-length: 11092 Max-sequence-length: 4985475 Min-sequence-length: 80

Got message from job at time: 1407216162.1 : After preprocessing assembly we got the following stats: Input-sample: ./weig/sequenceData/iso.fasta_0 Total-sequences: 1140 Total-length: 84296971 Proportion-repeat-masked: 0.296622 ProportionNs: 0.219075 Total-Ns: 18467385 N50: 419869 Median-sequence-length: 4491 Max-sequence-length: 4576395 Min-sequence-length: 971

Got message from job at time: 1407216800.95 : After preprocessing assembly we got the following stats: Input-sample: ./weig/sequenceData/ehux.fasta_2 Total-sequences: 6995 Total-length: 166019287 Proportion-repeat-masked: 0.353080 ProportionNs: 0.070425 Total-Ns: 11691911 N50: 407932 Median-sequence-length: 1682 Max-sequence-length: 3018814 Min-sequence-length: 1000

Got message from job at time: 1407216991.56 : After preprocessing assembly we got the following stats: Input-sample: ./weig/sequenceData/geph.fasta_1 Total-sequences: 6753 Total-length: 269617066 Proportion-repeat-masked: 0.274789 ProportionNs: 0.129277 Total-Ns: 34855353 N50: 121969 Median-sequence-length: 11092 Max-sequence-length: 4985475 Min-sequence-length: 80

Got message from job at time: 1407217002.05 : Blocking on ktserver <kyoto_tycoon database_dir="/home/gopin001/thesis/cactus/gitpc/progressiveCactus/weig/progressiveAlignment/Anc0/Anc0/Anc0_DB" database_name="Anc0.kch" in_memory="1" port="1978" snapshot="0" /> with killPath /home/gopin001/thesis/cactus/gitpc/progressiveCactus/weig/jobTree/jobs/gTD2/tmp_QWvMMwAnXA/tmp_Lo9XCX0qVz_kill.txt

Got message from job at time: 1407217040.39 : Starting caf phase target with index 0 at 1407217026.15 seconds (recursing = 1)

Got message from job at time: 1407217040.39 : Adding an oversize flower for target class <class 'cactus.pipeline.cactus_workflow.CactusCafWrapperLarge'> and stats flower name: 0 total bases: 519933324 total-ends: 29776 total-caps: 29776 max-end-degree: 1 max-adjacency-length: 4985476 total-blocks: 0 total-groups: 1 total-edges: 14888 total-free-ends: 29776 total-attached-ends: 0 total-chains: 0 total-link groups: 0

benedictpaten commented 9 years ago

Hi Arun,

Not sure what the bug is here, as there is no error message -- did it just stop? How much memory does the machine you were running on have? The genomes you are aligning are fairly small as these things go, so I would not expect difficulty.

Benedict


joelarmstrong commented 9 years ago

Thanks for the detailed bug report, the log is helpful.

I'm in the process of trying this with genomes of the same size (not the same sequence, though) to see if there is an issue. Depending on genome size, we choose between running the blast inside caf vs outside of caf. I was a little worried something might be wrong with that code, but it just started the caf phase and it looks fine so far.

Are there any "ktserver" processes left running after the alignment stops? Can you report the contents of ./weig/progressiveAlignment/Anc0/Anc0/Anc0_DB/ktout.log?
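For anyone following along, the two checks requested above can be run roughly as follows. This is a minimal sketch, not part of Progressive Cactus: the function names `ktserver_status` and `show_kt_log` are made up for illustration, and it assumes `pgrep` is installed and that the work directory is `./weig`, as in the command from the original report.

```shell
#!/bin/sh
# Hypothetical helper: report whether any ktserver processes are still alive.
# pgrep -l prints the PID and name of each match and exits non-zero if none match.
ktserver_status() {
    if pgrep -l ktserver; then
        echo "ktserver still running"
    else
        echo "no ktserver processes found"
    fi
}

# Hypothetical helper: print the last 50 lines of a log file, if it exists.
show_kt_log() {
    if [ -f "$1" ]; then
        tail -n 50 "$1"
    else
        echo "log not found: $1"
    fi
}

ktserver_status
# Path taken from the question above; adjust it if your work directory differs.
show_kt_log ./weig/progressiveAlignment/Anc0/Anc0/Anc0_DB/ktout.log
```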

gopin001 commented 9 years ago

Hello Joel,

Thank you for your email. I feel more confident using Progressive Cactus knowing that there is good support.

I figured out the issue and was just about to post my findings here. The cause was my limited experience with Linux. I am using PuTTY to access the shell, and the server closes the connection if it does not hear from the client for a while. When that happened, Progressive Cactus, which was running in the foreground, terminated without any error. When I tried to redo the alignment, I had to overwrite the work folder, perhaps because the previous session had ended abruptly.

I had to detach the job from its SSH session.

I used the command:

setsid runProgressiveCactus.sh s.txt ./weig ./weig/eig.hal --maxThreads 40 > sout
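A common alternative to `setsid` for the same purpose is `nohup`, which makes the job ignore the hangup signal sent when the SSH connection drops. The following is a minimal sketch under that assumption; the helper name `run_detached` is hypothetical and is not something shipped with Progressive Cactus.

```shell
#!/bin/sh
# run_detached LOGFILE COMMAND [ARGS...]   (hypothetical helper)
# Start COMMAND in the background so that it ignores SIGHUP, with stdout and
# stderr captured in LOGFILE and stdin detached from the terminal.
# Prints the PID of the background job.
run_detached() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 < /dev/null &
    echo $!
}

# Possible usage with the command from this thread:
#   run_detached sout runProgressiveCactus.sh s.txt ./weig ./weig/eig.hal --maxThreads 40
```

Progress can then be followed with `tail -f` on the log file from any later session.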

Now I have the hal output file.

On to the next step now.

Thanks, Arun

benedictpaten commented 9 years ago

Yes, I'd heartily recommend running your cactus processes within a screen session; that way you won't lose your progress due to disconnection.

http://www.gnu.org/software/screen/
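A minimal sketch of what that might look like, assuming GNU screen is installed. The helper name `start_in_screen` and the session name `cactus` are illustrative choices, not anything defined by Progressive Cactus.

```shell
#!/bin/sh
# start_in_screen NAME COMMAND [ARGS...]   (hypothetical helper)
# Launch COMMAND inside a new, already-detached screen session called NAME.
# Returns 1 if screen is not available on this machine.
start_in_screen() {
    if ! command -v screen >/dev/null 2>&1; then
        echo "screen is not installed" >&2
        return 1
    fi
    name=$1
    shift
    # -d -m: start the session detached; -S: give it a name.
    screen -dmS "$name" "$@"
}

# Possible usage with the command from this thread:
#   start_in_screen cactus runProgressiveCactus.sh s.txt ./weig ./weig/eig.hal --maxThreads 40
#   screen -ls          # list sessions
#   screen -r cactus    # reattach; press Ctrl-a then d to detach again
```

The session ends when the command exits, so any output worth keeping should be redirected to a file or taken from cactus.log.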
