Closed genaev closed 9 years ago
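Dear Misha, I see that you have a problem with nucmer. To be able to help you, I need to see the command line you used, the list of reference genomes, and the first ten lines of the query and reference genomes.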
Hope to be able to help, Emanuele Bosi
On Tue, Jun 23, 2015 at 6:30 AM, Misha Genaev notifications@github.com wrote:
I launched the program and, after several days of running, got this error:
4: FINISHING DATA
ERROR: The reference file may contain sequences with non-unique header Ids, please check your input files and try again
ERROR: postnuc returned non-zero
done.
Building the network...
I checked that the reference does not contain non-unique headers. How should I fix this?
[mag@smp medusa]$ cat reference_genomes/Canis_familiaris.fa | grep '>'
>10 dna_rm:chromosome chromosome:CanFam3.1:10:1:69331447:1 REF
>11 dna_rm:chromosome chromosome:CanFam3.1:11:1:74389097:1 REF
>12 dna_rm:chromosome chromosome:CanFam3.1:12:1:72498081:1 REF
>13 dna_rm:chromosome chromosome:CanFam3.1:13:1:63241923:1 REF
>14 dna_rm:chromosome chromosome:CanFam3.1:14:1:60966679:1 REF
>15 dna_rm:chromosome chromosome:CanFam3.1:15:1:64190966:1 REF
>16 dna_rm:chromosome chromosome:CanFam3.1:16:1:59632846:1 REF
>17 dna_rm:chromosome chromosome:CanFam3.1:17:1:64289059:1 REF
>18 dna_rm:chromosome chromosome:CanFam3.1:18:1:55844845:1 REF
>19 dna_rm:chromosome chromosome:CanFam3.1:19:1:53741614:1 REF
>1 dna_rm:chromosome chromosome:CanFam3.1:1:1:122678785:1 REF
>20 dna_rm:chromosome chromosome:CanFam3.1:20:1:58134056:1 REF
>21 dna_rm:chromosome chromosome:CanFam3.1:21:1:50858623:1 REF
>22 dna_rm:chromosome chromosome:CanFam3.1:22:1:61439934:1 REF
>23 dna_rm:chromosome chromosome:CanFam3.1:23:1:52294480:1 REF
>24 dna_rm:chromosome chromosome:CanFam3.1:24:1:47698779:1 REF
>25 dna_rm:chromosome chromosome:CanFam3.1:25:1:51628933:1 REF
>26 dna_rm:chromosome chromosome:CanFam3.1:26:1:38964690:1 REF
>27 dna_rm:chromosome chromosome:CanFam3.1:27:1:45876710:1 REF
>28 dna_rm:chromosome chromosome:CanFam3.1:28:1:41182112:1 REF
>29 dna_rm:chromosome chromosome:CanFam3.1:29:1:41845238:1 REF
>2 dna_rm:chromosome chromosome:CanFam3.1:2:1:85426708:1 REF
>30 dna_rm:chromosome chromosome:CanFam3.1:30:1:40214260:1 REF
>31 dna_rm:chromosome chromosome:CanFam3.1:31:1:39895921:1 REF
>32 dna_rm:chromosome chromosome:CanFam3.1:32:1:38810281:1 REF
>33 dna_rm:chromosome chromosome:CanFam3.1:33:1:31377067:1 REF
>34 dna_rm:chromosome chromosome:CanFam3.1:34:1:42124431:1 REF
>35 dna_rm:chromosome chromosome:CanFam3.1:35:1:26524999:1 REF
>36 dna_rm:chromosome chromosome:CanFam3.1:36:1:30810995:1 REF
>37 dna_rm:chromosome chromosome:CanFam3.1:37:1:30902991:1 REF
>38 dna_rm:chromosome chromosome:CanFam3.1:38:1:23914537:1 REF
>3 dna_rm:chromosome chromosome:CanFam3.1:3:1:91889043:1 REF
>4 dna_rm:chromosome chromosome:CanFam3.1:4:1:88276631:1 REF
>5 dna_rm:chromosome chromosome:CanFam3.1:5:1:88915250:1 REF
>6 dna_rm:chromosome chromosome:CanFam3.1:6:1:77573801:1 REF
>7 dna_rm:chromosome chromosome:CanFam3.1:7:1:80974532:1 REF
>8 dna_rm:chromosome chromosome:CanFam3.1:8:1:74330416:1 REF
>9 dna_rm:chromosome chromosome:CanFam3.1:9:1:61074082:1 REF
>X dna_rm:chromosome chromosome:CanFam3.1:X:1:123869142:1 REF
— Reply to this email directly or view it on GitHub https://github.com/combogenomics/medusa/issues/4.
The command line:
$ java -jar ~/prog/medusa/medusa.jar -f reference_genomes/ -i L_RNA_scaffolder_20150415-aligngraph-lastz-blat.fasta -o L_RNA_scaffolder_20150415-aligngraph-lastz-blat-medusa.fasta -v 1>medusa.out 2>medusa.err
the list of reference genomes (only one):
$ ls -l reference_genomes/
total 2311230
-rw-rw-r-- 1 mag mag 2366430334 Jun 23 11:28 Canis_familiaris_rename.fa
the first ten lines of query genome and reference genomes:
$ head -10 L_RNA_scaffolder_20150415-aligngraph-lastz-blat.fasta
>AlignGraph0 @ chr1 : Contig56154 ; Contig112286 ;
CCTGGGCTGGCCTGGCCCCGGGCTCTGTGGGGGCAGGGGCGACTCGAGCGCCTGCCCCAG
GCCCAGCGCTAATGGGATCCAGCGCCCCGGGGCCCTGGCCAGGCCTGCGTTCTGGCCCCG
GGCTGAGCGAGCCCTCTCACAGCAGGATGTGGCCTCCTGGGCCCTGGGCGTGCCCCACAG
ACCCAGGTCGCTGACCTAGGGCAGCCCTGGGGATCCCAGGACACCCCAGGGCAGGGCAGC
GGGGGCAGGCCCGTGGGCCCCACCCACACCCACACCCACACCCACACCCACACCCACACC
AGGAGGCAGGCTCCTCCACGTGCTGGGGCTCCAGGGGGCCCGAGGCCGGGGTCCAGGTGC
CCTGCGGGGACCCAGCGCAGGAGCCCATGGGCCGCAGGTCCACGTCAGCCCTGCCGCCCC
CAGAACCGCCTCTCACCGCCCCGACCTCAGGACACGCGCCCCCACCCGGAGTCCACACAG
GCTCCCAGGTGGGGGACTCCAGCACCCGCCTCCCCACCCCGTGGGATTCTTGCCCTCCCC
$ head -10 reference_genomes/Canis_familiaris_rename.fa
>1_10dna_rm:chromosomechromosome:CanFam3.1:10:1:69331447:1REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Yesterday I wrote a small script to rename the reference genome headers and restarted medusa.
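#!/usr/bin/perl
# Prefix every FASTA header with a running number and strip all whitespace,
# so that the ID up to the first space is unique.
use strict;
use warnings;

my $i = 0;
# Use "or" (not "||") so that die fires when open fails.
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (<$fh>) {
    if (/^>/) {
        $i++;
        chomp;
        s/^>//;
        s/\s//g;
        print ">${i}_$_\n";
        warn ">${i}_$_\n";    # echo the new header to stderr
    }
    else {
        print;
    }
}
close $fh;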
Now the headers are:
$ cat reference_genomes/Canis_familiaris_rename.fa | grep '>'
>1_10dna_rm:chromosomechromosome:CanFam3.1:10:1:69331447:1REF
>2_11dna_rm:chromosomechromosome:CanFam3.1:11:1:74389097:1REF
>3_12dna_rm:chromosomechromosome:CanFam3.1:12:1:72498081:1REF
>4_13dna_rm:chromosomechromosome:CanFam3.1:13:1:63241923:1REF
>5_14dna_rm:chromosomechromosome:CanFam3.1:14:1:60966679:1REF
>6_15dna_rm:chromosomechromosome:CanFam3.1:15:1:64190966:1REF
>7_16dna_rm:chromosomechromosome:CanFam3.1:16:1:59632846:1REF
>8_17dna_rm:chromosomechromosome:CanFam3.1:17:1:64289059:1REF
>9_18dna_rm:chromosomechromosome:CanFam3.1:18:1:55844845:1REF
>10_19dna_rm:chromosomechromosome:CanFam3.1:19:1:53741614:1REF
>11_1dna_rm:chromosomechromosome:CanFam3.1:1:1:122678785:1REF
>12_20dna_rm:chromosomechromosome:CanFam3.1:20:1:58134056:1REF
>13_21dna_rm:chromosomechromosome:CanFam3.1:21:1:50858623:1REF
>14_22dna_rm:chromosomechromosome:CanFam3.1:22:1:61439934:1REF
>15_23dna_rm:chromosomechromosome:CanFam3.1:23:1:52294480:1REF
>16_24dna_rm:chromosomechromosome:CanFam3.1:24:1:47698779:1REF
>17_25dna_rm:chromosomechromosome:CanFam3.1:25:1:51628933:1REF
>18_26dna_rm:chromosomechromosome:CanFam3.1:26:1:38964690:1REF
>19_27dna_rm:chromosomechromosome:CanFam3.1:27:1:45876710:1REF
>20_28dna_rm:chromosomechromosome:CanFam3.1:28:1:41182112:1REF
>21_29dna_rm:chromosomechromosome:CanFam3.1:29:1:41845238:1REF
>22_2dna_rm:chromosomechromosome:CanFam3.1:2:1:85426708:1REF
>23_30dna_rm:chromosomechromosome:CanFam3.1:30:1:40214260:1REF
>24_31dna_rm:chromosomechromosome:CanFam3.1:31:1:39895921:1REF
>25_32dna_rm:chromosomechromosome:CanFam3.1:32:1:38810281:1REF
>26_33dna_rm:chromosomechromosome:CanFam3.1:33:1:31377067:1REF
>27_34dna_rm:chromosomechromosome:CanFam3.1:34:1:42124431:1REF
>28_35dna_rm:chromosomechromosome:CanFam3.1:35:1:26524999:1REF
>29_36dna_rm:chromosomechromosome:CanFam3.1:36:1:30810995:1REF
>30_37dna_rm:chromosomechromosome:CanFam3.1:37:1:30902991:1REF
>31_38dna_rm:chromosomechromosome:CanFam3.1:38:1:23914537:1REF
>32_3dna_rm:chromosomechromosome:CanFam3.1:3:1:91889043:1REF
>33_4dna_rm:chromosomechromosome:CanFam3.1:4:1:88276631:1REF
>34_5dna_rm:chromosomechromosome:CanFam3.1:5:1:88915250:1REF
>35_6dna_rm:chromosomechromosome:CanFam3.1:6:1:77573801:1REF
>36_7dna_rm:chromosomechromosome:CanFam3.1:7:1:80974532:1REF
>37_8dna_rm:chromosomechromosome:CanFam3.1:8:1:74330416:1REF
>38_9dna_rm:chromosomechromosome:CanFam3.1:9:1:61074082:1REF
>39_Xdna_rm:chromosomechromosome:CanFam3.1:X:1:123869142:1REF
Dear Misha, I think that when parsing a FASTA file, MUMmer truncates the header at the first space. Hence the non-unique header problem (it would see all the sequences as having the AlignGraph0 header). Try renaming your headers and running medusa on a smaller version of your files to see if it works. If you need help doing this, I can provide some scripts. Best regards, Emanuele
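A quick way to test this hypothesis before a multi-day run is to look for duplicates among the truncated IDs rather than among the full header lines. The following is only a sketch in Python, assuming (as suggested above) that the ID is the header text up to the first whitespace; the function name `duplicate_fasta_ids` and the example records are made up for illustration.

```python
import io
from collections import Counter


def duplicate_fasta_ids(handle):
    """Return the sorted IDs that occur more than once.

    The ID is taken to be the header text between '>' and the first
    whitespace character (an assumption about how the aligner parses it).
    """
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            counts[fields[0] if fields else ""] += 1
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# Hypothetical records: the full headers differ, but two share the first token.
example = io.StringIO(
    ">AlignGraph0 @ chr1 : Contig1 ;\nACGT\n"
    ">AlignGraph0 @ chr2 : Contig2 ;\nGGCC\n"
    ">AlignGraph1 @ chr3 : Contig3 ;\nTTAA\n"
)
print(duplicate_fasta_ids(example))
```

For a real file, pass an open file object instead of the `io.StringIO` example; an empty list means the truncated IDs are unique.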
OK, I was not fast enough. Renaming seems the best thing to do here; I am thinking of renaming the input as a preprocessing step. Let's see if this works for you, and let me know.
Dear Misha, please let me know if you could solve the issue so I can push a new medusa version. Thank you! Emanuele Bosi
I renamed my input FASTA files with a Perl script:
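#!/usr/bin/perl
# Put a running number, followed by a space, in front of every FASTA header,
# so that the ID up to the first space is unique.
use strict;
use warnings;

my $i = 0;
# Use "or" (not "||") so that die fires when open fails.
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (<$fh>) {
    if (/^>/) {
        $i++;
        chomp;
        s/^>//;
        s/\s//g;
        print ">${i} $_\n";
    }
    else {
        print;
    }
}
close $fh;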
After that, the alignment step completed without errors.
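If the original names are needed again after scaffolding, a variant of the renaming step can also record which number replaced which header. This is a minimal Python sketch, not the script used in this thread: it replaces each header with the running number only and writes a tab-separated mapping; the function name `rename_fasta` and the sample records are hypothetical.

```python
import io


def rename_fasta(src, dst, mapping):
    """Copy FASTA records from src to dst with numeric headers.

    Each header becomes '>1', '>2', ... and a line
    'new_id<TAB>original header' is written to mapping.
    Returns the number of records renamed.
    """
    count = 0
    for line in src:
        if line.startswith(">"):
            count += 1
            original = line[1:].rstrip("\n")
            dst.write(f">{count}\n")
            mapping.write(f"{count}\t{original}\n")
        else:
            dst.write(line)
    return count


# Hypothetical two-record input held in memory.
src = io.StringIO(">AlignGraph0 @ chr1\nACGT\n>AlignGraph0 @ chr2\nGGCC\n")
dst, mapping = io.StringIO(), io.StringIO()
rename_fasta(src, dst, mapping)
print(dst.getvalue(), end="")
print(mapping.getvalue(), end="")
```

The three arguments are ordinary text file objects, so real files opened with `open()` can be passed in the same way.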
Now in the progress log I see:
INPUT FILE:L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.fasta
------------------------------------------------------------------------------------------------------------------------
Running MUMmer...1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref" of length 2982393815
# construct suffix tree for sequence of length 2982393815
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 29823938 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref 4193.38
# reading input file "/datapool/mag/medusa/reference_genomes/Ailuropoda_melanoleuca.ailMel1.dna.toplevel_rename.fa" of length 2299590481
# matching query-file "/datapool/mag/medusa/reference_genomes/Ailuropoda_melanoleuca.ailMel1.dna.toplevel_rename.fa"
# against subject-file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref"
# COMPLETETIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref 13025.02
# SPACE /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref 5126.50
4: FINISHING DATA
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref" of length 2982393815
# construct suffix tree for sequence of length 2982393815
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 29823938 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref 4915.44
# reading input file "/datapool/mag/medusa/reference_genomes/Canis_familiaris_rename.fa" of length 2327634022
# matching query-file "/datapool/mag/medusa/reference_genomes/Canis_familiaris_rename.fa"
# against subject-file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref"
# COMPLETETIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref 10456.33
# SPACE /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref 5148.45
4: FINISHING DATA
done.
------------------------------------------------------------------------------------------------------------------------
Building the network...
Nothing has changed for several days.
In the directory where the program is running, I see the following files:
$ ls -l -h
total 6.4G
-rw-rw-r-- 1 mag mag 2.9G Jun 24 20:48 L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.fasta
-rw-rw-r-- 1 mag mag 122M Jun 25 07:55 L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.coords
-rw-rw-r-- 1 mag mag 171M Jun 25 07:54 L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.delta
-rw-rw-r-- 1 mag mag 259M Jun 29 04:40 L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.coords
-rw-rw-r-- 1 mag mag 135M Jun 29 04:40 L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.delta
-rw-rw-r-- 1 mag mag 2.9G Jun 23 12:11 L_RNA_scaffolder_20150415-aligngraph-lastz-blat.fasta
-rw-rw-r-- 1 mag mag 0 Jun 24 21:06 medusa.err
-rw-rw-r-- 1 mag mag 2.9K Jun 29 04:40 medusa.out
lrwxrwxrwx 1 mag mag 37 Jun 19 18:48 medusa_scripts -> /home/mag/prog/medusa/medusa_scripts/
drwxrwxr-x 2 mag mag 4 Jun 24 21:04 reference_genomes
-rwxrwxr-x 1 mag mag 177 Jul 1 12:13 rename.pl
drwxrwxr-x 2 mag mag 2 Jun 20 09:19 tmp
The program is using almost no resources (see the top command output):
top - 12:20:50 up 13 days, 20:42, 4 users, load average: 4.01, 4.17, 4.21
Tasks: 1432 total, 4 running, 1428 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.7%us, 3.0%sy, 0.0%ni, 93.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1045715376k total, 843862708k used, 201852668k free, 408892k buffers
Swap: 4194300k total, 0k used, 4194300k free, 41605564k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7165 mag 20 0 236g 221g 768 R 99.9 22.2 140:53.23 reads_clean
12150 mag 20 0 1365m 1.0g 708 R 99.9 0.1 28:48.38 bwa
33397 mag 20 0 996m 835m 1244 S 99.9 0.1 6187:23 AlignGraph
49505 mag 20 0 732m 726m 692 R 99.9 0.1 7202:57 blat
13384 mag 20 0 26992 2444 1032 R 1.3 0.0 0:00.85 top
42500 mag 20 0 36.9g 189m 11m S 0.3 0.0 10:01.15 java
7164 mag 20 0 103m 1172 1000 S 0.0 0.0 0:00.00 sh
7316 mag 20 0 119m 2100 1088 S 0.0 0.0 0:00.01 sshd
7317 mag 20 0 118m 2108 1600 S 0.0 0.0 0:00.04 bash
7354 mag 20 0 133m 4036 2364 S 0.0 0.0 0:00.12 mc
7356 mag 20 0 118m 2080 1584 S 0.0 0.0 0:00.02 bash
9687 mag 20 0 126m 1652 824 S 0.0 0.0 0:00.57 screen
9688 mag 20 0 118m 2112 1604 S 0.0 0.0 0:00.37 bash
12055 mag 20 0 119m 2088 1092 S 0.0 0.0 0:00.00 sshd
12056 mag 20 0 120m 4188 1632 S 0.0 0.0 0:00.17 bash
13223 mag 20 0 119m 2096 1088 S 0.0 0.0 0:00.05 sshd
13224 mag 20 0 120m 4204 1636 S 0.0 0.0 0:00.28 bash
25307 mag 20 0 126m 1552 824 S 0.0 0.0 0:00.03 screen
25308 mag 20 0 118m 2076 1600 S 0.0 0.0 0:00.05 bash
33396 mag 20 0 103m 1180 1008 S 0.0 0.0 0:00.00 run.sh
36845 mag 20 0 126m 1612 824 S 0.0 0.0 0:00.14 screen
36846 mag 20 0 118m 2080 1596 S 0.0 0.0 0:00.04 bash
41916 mag 20 0 103m 1168 1000 S 0.0 0.0 0:00.01 run_all.sh
41917 mag 20 0 139m 10m 2072 S 0.0 0.0 0:31.11 assembler.pl
49503 mag 20 0 103m 1168 1000 S 0.0 0.0 0:00.06 sh
60706 mag 20 0 6196m 6.0g 3736 S 0.0 0.6 1125:07 python
How long should the construction of the network take?
Dear Misha, the network construction phase should be the fastest part of the algorithm, so something must have gone wrong. Please interrupt the script with Ctrl-C and send me the traceback. I'm sure we will fix this; thanks for your help. Emanuele
If I do, will I be able to start directly from the construction of the network, skipping the alignment stage?
No, but I think we can work around that. The jar is basically a wrapper for the Python scripts, so you can call each of the scripts directly to run the pipeline step by step.
Meanwhile, feel free to kill it, since it is surely stuck. Let me know what you get. Ema
I killed medusa with Ctrl-C, but I did not see any traceback messages:
[mag@smp medusa]$ java -jar ~/prog/medusa/medusa.jar -f reference_genomes/ -i L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.fasta -o L_RNA_scaffolder_20150415-aligngraph-lastz-blat-medusa.fasta -v 1>medusa.out 2>medusa.err
^C[mag@smp medusa]$ cat medusa.err
[mag@smp medusa]$ cat medusa.out
INPUT FILE:L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.fasta
------------------------------------------------------------------------------------------------------------------------
Running MUMmer...1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref" of length 2982393815
# construct suffix tree for sequence of length 2982393815
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 29823938 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref 4193.38
# reading input file "/datapool/mag/medusa/reference_genomes/Ailuropoda_melanoleuca.ailMel1.dna.toplevel_rename.fa" of length 2299590481
# matching query-file "/datapool/mag/medusa/reference_genomes/Ailuropoda_melanoleuca.ailMel1.dna.toplevel_rename.fa"
# against subject-file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref"
# COMPLETETIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref 13025.02
# SPACE /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Ailuropoda_melanoleuca.ntref 5126.50
4: FINISHING DATA
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref" of length 2982393815
# construct suffix tree for sequence of length 2982393815
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 29823938 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref 4915.44
# reading input file "/datapool/mag/medusa/reference_genomes/Canis_familiaris_rename.fa" of length 2327634022
# matching query-file "/datapool/mag/medusa/reference_genomes/Canis_familiaris_rename.fa"
# against subject-file "L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref"
# COMPLETETIME /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref 10456.33
# SPACE /home/mag/prog/MUMmer3.23/mummer L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename_Canis_familiaris_rename.ntref 5148.45
4: FINISHING DATA
done.
------------------------------------------------------------------------------------------------------------------------
Building the network...Traceback (most recent call last):
OK, that's strange. To build the network you can use the netcon_mummer.py script. It takes as input the target genome and the directory with the .coords files. Try launching it and, if it takes a long time, I will figure out which steps are the time-limiting ones. I just realized you have very big FASTA files (one of them is 2.9 Gb! Are you sure it isn't a read file?). Let's see what happens. Thanks for your patience. Emanuele
OK. I tried running netcon_mummer.py without parameters and got an error:
[mag@smp medusa]$ pwd
/home/mag/data/medusa
[mag@smp medusa]$ python --version
Python 3.4.3
[mag@smp medusa]$ python ./medusa_scripts/netcon_mummer.py
Traceback (most recent call last):
File "./medusa_scripts/netcon_mummer.py", line 1, in <module>
from mummer_parser import *
File "/home/mag/prog/medusa/medusa_scripts/mummer_parser.py", line 73
if ((max(a) > max(b)) and (min(a) < min(b))): print '%s maps within %s !!!' %(hit2.name, hit1.name)
^
SyntaxError: invalid syntax
We are working with the genomes of mammals, so the fasta files are quite large.
OK, the error is due to Python 3. I'm pushing a fixed version ASAP.
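For context, the `SyntaxError` above comes from the Python 2 `print` statement, which Python 3 no longer accepts. A minimal sketch of a spelling that runs on both Python 2.7 and Python 3 is shown below. The function name `nesting_message` and the sample hit names are hypothetical, and this is not necessarily the fix that was pushed.

```python
def nesting_message(inner_name, outer_name):
    """Build the message that the failing line tried to print."""
    return '%s maps within %s !!!' % (inner_name, outer_name)


# Calling print with one parenthesised string is valid in Python 2 and Python 3.
# The hit names are hypothetical, used only for illustration.
print(nesting_message('contig_2', 'contig_1'))
```

Building the string separately from printing it keeps the formatting logic identical under both interpreters.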
For the mammalian genomes, how many contigs do you have? How many of these are short (<1000 bp or <500 bp)? This is just to get an idea of the computational effort. I suspect that medusa was running fine but scales badly with large networks (a huge number of nodes). If it's not a problem, could you send me your files so I can run some experiments to remove the bottleneck? I'm sorry that this is taking so long, but this is the very first time we have had an issue like this.
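One way to answer the contig-count question is sketched below, assuming the contigs are in a plain (uncompressed) FASTA file. The function name `count_contigs` and the default thresholds are hypothetical choices for illustration, not part of medusa.

```python
def count_contigs(fasta_path, thresholds=(0, 500, 1000)):
    """Return {threshold: number of contigs whose length is >= threshold}."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith('>'):
                # A new header closes the previous record, if there was one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return dict((t, sum(1 for n in lengths if n >= t)) for t in thresholds)
```

Calling `count_contigs('contigs.fasta')` (a hypothetical file name) returns one count per threshold. The file is read line by line, but one integer per contig is kept in memory.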
Hope to help you, Emanuele
contigs (>= 0 bp) - 1265765
contigs (>= 1000 bp) - 151511
Unfortunately I cannot provide you with the contigs, because these data have not yet been published, but I can give you the "*.coords" and "*.delta" files. Is that enough for you to experiment with?
Which version of Python should I use? I tried python2.6 and got an error too:
$ /usr/bin/python2.6 ./medusa_scripts/netcon_mummer.py
File "./medusa_scripts/netcon_mummer.py", line 165
G[n1][n2]['orientation_max']=list({tuple(i) for i in G[n1][n2]['orientation'] if G[n1][n2]['orientation'].count(i)==max_count})
^
SyntaxError: invalid syntax
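The construct rejected on line 165 is a set comprehension (`{... for ...}`), a syntax available only from Python 2.7 onwards. A minimal sketch of an equivalent that also parses on Python 2.6 passes a generator expression to `set()`. The helper name `most_frequent_orientations` and the sample data are hypothetical and are not taken from the medusa source.

```python
def most_frequent_orientations(orientations):
    """Return the distinct orientations that occur most often, as tuples."""
    max_count = max(orientations.count(i) for i in orientations)
    # set(<generator expression>) is the Python 2.6-compatible spelling of {... for ...}.
    return list(set(tuple(i) for i in orientations
                    if orientations.count(i) == max_count))


# Hypothetical orientation pairs for one edge of the network.
sample = [[1, -1], [1, 1], [1, -1]]
print(most_frequent_orientations(sample))
```

The order of the returned list is not guaranteed when several orientations tie, because it comes from a set.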
Python 2.7 works fine. Python >=3 should work after my fix. About the data, could you send me i) the .coords files and ii) a toy contig file with just the headers and one line of sequence (which can even be a single character)? Emanuele
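A toy contig file of the kind requested above could be produced with a sketch like the following, which keeps every FASTA header and replaces each sequence with a single placeholder character. The function name `write_toy_fasta`, the file names in the usage note, and the placeholder `N` are all assumptions made for illustration.

```python
def write_toy_fasta(src_path, dst_path, placeholder='N'):
    """Copy every FASTA header from src_path, each followed by a one-character sequence.

    Returns the number of records written.
    """
    n_records = 0
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            if line.startswith('>'):
                dst.write(line.rstrip('\n') + '\n')  # header, unchanged
                dst.write(placeholder + '\n')        # stand-in for the real sequence
                n_records += 1
    return n_records
```

For example, `write_toy_fasta('contigs.fasta', 'contigs_toy.fasta')` would write a file that carries the contig names but none of the unpublished sequence.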
Ok. I have prepared the data http://pixie.bionet.nsc.ru/trash/medusa_data.tar.bz2 ~80MB
Thank you, I will let you know ASAP. I pushed the Python 3 fix, but I think I can handle this by myself at this point. Sorry for the problems, but I'd say the tool wasn't conceived for such large genomes. The high fragmentation of the assembly doesn't help either. I hope to come out with an optimized version very soon. Emanuele
OK, I found some very inefficient bits of code. I'm working on them and will push a new version ASAP. I'm closing the issue.
For the last four days I have been running the netcon_mummer.py script, and today I noticed that it has completed successfully. What should I do next to get scaffolds?
$ /usr/local/bin/python2.7 ./medusa_scripts/netcon_mummer.py -i L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.fasta -f . -o L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.graph
$ ls -l L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.graph
-rw-rw-r-- 1 mag mag 291851658 Jul 6 00:41 L_RNA_scaffolder_20150415-aligngraph-lastz-blat-rename.graph
Dear Misha, sorry for the late reply. I've modified the Java part to skip the first steps and start directly from a scaffolding network. Could you try the following:
Try this. If you see that it starts doing the mapping, that means we have some errors; quit the program and tell me more about it. I hope everything will run fine. Let me know. I started a new branch with these changes, named skip-mapping. Emanuele
A few days ago I made similar changes in the code by commenting out some lines. But I did not understand: how do I get the new Scaffolder.java that you sent me?
Hi, you can download it from GitHub, on the skip-mapping branch. Have you tried that? Emanuele
Hi,
Following Emanuele's suggestion, you can find the code here: https://github.com/combogenomics/medusa/tree/skip-mapping The tarball for the code is here: https://github.com/combogenomics/medusa/archive/skip-mapping.tar.gz And I've compiled the jar for you, just in case: www.ebi.ac.uk/~marco/medusa.jar
Marco
I launched the program and, after several days of work, got the error:
I checked that the reference does not contain non-unique headers. How should I fix this?
Moreover, I see the file "nucmer.error" which contains