alekseyzimin / masurca

GNU General Public License v3.0

Unitig has no placement using v3.6.2 #26

Closed (jongepier closed this issue 6 years ago)

jongepier commented 6 years ago

Hi,

I am assembling several closely related draft genomes with MaSuRCA v3.6.2 (not beta), using Illumina paired-end libraries only. Most assemblies finish without problems, but one assembly fails because one unitig has no consensus (from runCA3 and CA/7-0-CGW/cgw.out):

ERROR:  Unitig 30217 has no placement; probably not run through consensus.
Segmentation fault (core dumped)

As I am using the official release v3.6.2, CA attempts to fix this problem but fails (from CA/fix_unitig_consensus/unitig_failures):

../5-consensus/genome_017.err:MultiAlignUnitig()-- Unitig 30217 FAILED. Could not align fragment 1034014.
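When several unitigs fail, it can help to collect the unitig and fragment IDs from the consensus logs in one pass. The following is a minimal sketch that assumes the log lines look like the one quoted above; the function name `list_failures` is made up for this example and is not part of CA or MaSuRCA.

```shell
#!/bin/sh
# list_failures (hypothetical helper): read consensus log lines on stdin and
# print one "unitig fragment" pair per failure line. Assumes the message format
# "Unitig <id> FAILED. Could not align fragment <id>" quoted in this thread.
list_failures() {
    sed -n 's/.*Unitig \([0-9][0-9]*\) FAILED\. *Could not align fragment \([0-9][0-9]*\).*/\1 \2/p'
}

# Example (path pattern taken from the report above):
#   cat ../5-consensus/*.err | list_failures
```

Lines that do not match the pattern are ignored, so the whole log can be piped in unfiltered.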

I suppose fragment 1034014 fails to align and, as a consequence, no consensus unitig is produced (i.e. no UTG in either version 2 or 3 of the tigStore). From tigStore v1:

...
FRG type R ident  18705479 container   1199472 parent   1199472 hang    217    -97 position    769    619
FRG type R ident   1011721 container         0 parent   1058008 hang    170    184 position    660   1174
FRG type R ident   1034014 container         0 parent   1472858 hang     90    189 position    660   1167
FRG type R ident  13210399 container   1011721 parent   1011721 hang     92   -272 position    752    902
FRG type R ident  13210403 container   1011721 parent   1011721 hang     92   -272 position    752    902
...

If I extract unitig 30217 from the tigStore, manually remove fragment 1034014, replace the tigStore version 1 entry, and try to generate a unitig consensus

tigStore -g genome.gkpStore -t genome.tigStore 1 -d layout -u 30217 > unitig30217.tmp
tigStore -g genome.gkpStore -t genome.tigStore 1 -R unitig30217.tmp
utgcns -g genome.gkpStore -t genome.tigStore 1 3 -u 30217

it claims to be successful

MultiAlignStore::dumpMASRfile()-- Writing 'genome.tigStore/seqDB.v002.p003.utg' partitioned.

NumColumnsInUnitigs             = 0
NumGapsInUnitigs                = 0
NumRunsOfGapsInUnitigReads      = 0
NumColumnsInContigs             = 0
NumGapsInContigs                = 0
NumRunsOfGapsInContigReads      = 0
NumAAMismatches                 = 0
NumVARRecords                   = 0
NumVARStringsWithFlankingGaps   = 0
NumUnitigRetrySuccess           = 0

Consensus finished successfully.  Bye.

but does not produce the required UTG entry in tigStore version 3 (or 2):

unitig 30217
len 0
cns 
qlt 
data.unitig_coverage_stat -4.867138
data.unitig_microhet_prob 1.000000
data.unitig_status        X
data.unitig_unique_rept   X
data.contig_status        U
data.num_frags            30
data.num_unitigs          0
FRG type R ident   1502113 container         0 parent   1214644 hang   -208   -266 position    402      0
FRG type R ident  16305263 container   1502113 parent   1502113 hang    167    -85 position    317    167
FRG type R ident  10313351 container   1502113 parent   1502113 hang    173    -79 position    323    173
FRG type R ident  14227777 container   1502113 parent   1502113 hang    192    -60 position    342    192
FRG type R ident  11548311 container   1502113 parent   1502113 hang    202    -50 position    352    202
FRG type R ident   1214644 container         0 parent   1502113 hang    208    266 position    208    668
FRG type R ident   1864621 container   1214644 parent   1214644 hang     34    -99 position    571    242
FRG type R ident   1880555 container   1214644 parent   1214644 hang     50    -87 position    258    583
FRG type R ident  18705478 container   1214644 parent   1214644 hang     62   -248 position    270    420
FRG type R ident   1304841 container         0 parent   1214644 hang     96     76 position    304    744
FRG type R ident  18891280 container   1214644 parent   1214644 hang     99   -211 position    307    457
FRG type R ident  18895288 container   1214644 parent   1214644 hang     99   -211 position    307    457
FRG type R ident   9462916 container   1214644 parent   1214644 hang    106   -204 position    464    314
FRG type R ident   1771722 container   1214644 parent   1214644 hang    110      0 position    318    668
FRG type R ident  14883894 container   1214644 parent   1214644 hang    152   -158 position    510    360
FRG type R ident  12415492 container   1214644 parent   1214644 hang    169   -141 position    527    377
FRG type R ident   1653488 container         0 parent   1304841 hang     80     14 position    758    384
FRG type R ident   1199472 container         0 parent   1653488 hang     18    108 position    402    866
FRG type R ident   1199473 container   1199472 parent   1199472 hang      0      0 position    402    866
FRG type R ident  11474300 container   1199473 parent   1199473 hang     81   -233 position    633    483
FRG type R ident  11474474 container   1199473 parent   1199473 hang     81   -233 position    633    483
FRG type R ident   1058008 container         0 parent   1199472 hang     88    124 position    990    490
FRG type R ident   3571944 container   1058008 parent   1058008 hang      9   -341 position    649    499
FRG type R ident   9092394 container   1199473 parent   1199473 hang    111   -203 position    663    513
FRG type R ident   9092814 container   1199473 parent   1199473 hang    111   -203 position    663    513
FRG type R ident  18891281 container   1058008 parent   1058008 hang     31   -319 position    671    521
FRG type R ident  18895289 container   1058008 parent   1058008 hang     31   -319 position    671    521
FRG type R ident   1472858 container         0 parent   1199472 hang    168    112 position    570    978
FRG type R ident  18705479 container   1199472 parent   1199472 hang    217    -97 position    769    619
FRG type R ident   1011721 container         0 parent   1058008 hang    170    184 position    660   1174
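The manual edit described above (dropping one fragment from the layout dump before loading it back with `tigStore -R`) can be scripted. This is only a sketch: the helper names `drop_fragment` and `fragment_dependents` and the output file name are invented here, and it assumes that `data.num_frags` has to match the number of remaining FRG lines, which the thread does not confirm.

```shell
#!/bin/sh
# drop_fragment LAYOUT_FILE FRAGMENT_ID   (hypothetical helper)
# Print the layout without the FRG line for FRAGMENT_ID and lower
# data.num_frags by the number of lines removed (assumption: the count
# must agree with the FRG lines that follow).
drop_fragment() {
    awk -v id="$2" '
        # First pass over the file: count the FRG lines to be removed.
        NR == FNR { if ($1 == "FRG" && $5 == id) removed++; next }
        # Second pass: skip those lines and patch the fragment count in place.
        $1 == "FRG" && $5 == id { next }
        $1 == "data.num_frags" { n = $2 - removed; sub(/[0-9]+$/, n); print; next }
        { print }
    ' "$1" "$1"
}

# fragment_dependents LAYOUT_FILE FRAGMENT_ID   (hypothetical helper)
# Print FRG lines that name FRAGMENT_ID as their container or parent; these
# would point at a missing fragment after removal and may need manual care.
fragment_dependents() {
    awk -v id="$2" '$1 == "FRG" && $5 != id && ($7 == id || $9 == id)' "$1"
}

# Possible use with the commands from this thread:
#   tigStore -g genome.gkpStore -t genome.tigStore 1 -d layout -u 30217 > unitig30217.tmp
#   fragment_dependents unitig30217.tmp 1034014
#   drop_fragment unitig30217.tmp 1034014 > unitig30217.fixed
#   tigStore -g genome.gkpStore -t genome.tigStore 1 -R unitig30217.fixed
```

Checking for dependents first is a precaution: if another fragment lists the removed one as its parent or container, the edited layout may still fail in consensus.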

I am totally happy to completely delete the problematic unitig because the assembly merely serves to error-correct long reads, for which I use alternative approaches in parallel. However, I don't seem to be using the correct syntax, and I also don't know how to remove it from all versions of the tigStore (the following just prints the help text):

tigStore -g genome.gkpStore -t genome.tigStore 1 -D -u 30217

Any suggestion on how to fix the problematic unitig or kick it out completely would be much appreciated!

Thanks and best, Evelien

alekseyzimin commented 6 years ago

Hi, thank you for reporting this. The 3.2.6 version should fix most of the unitig consensus failures, but I guess there are cases that slipped through. Your fix is correct, but you have to replace the unitig in both version 1 and version 2.

MCH74 commented 6 years ago

Hi, I have the same problem, but replacing the unitig in versions 1 and 2 still did not work. Is there any way the unitig can be removed? Thanks, Mark

alekseyzimin commented 6 years ago

Yes, you can delete the unitig. Your syntax is correct, but you have to use the version of the tigStore command that is in CA6:

/CA/Linux-amd64/bin/tigStore

Make sure you delete it from both v1 and v2.

jongepier commented 6 years ago

The fix worked once I also added the final unitig to the unpartitioned version 2 tigStore. See also: https://sourceforge.net/p/wgs-assembler/mailman/message/30881670/

Thanks for the help! Closing issue.

alexcorm commented 6 years ago

Hi, I also have the same problem, but I'm stuck when I try to remove the problematic unitigs with this command:

tigStore -g genome.gkpStore -t genome.tigStore 1 -D -u 145

tigStore simply prints the help, preceded by this error:

tigStore: Unknown dump option '-D -u'

I've tried the tigStore binaries from both /CA/Linux-amd64/bin/tigStore and /CA8/Linux-amd64/bin/tigStore.

Best, Alex

MCH74 commented 6 years ago

Hi Alex,

See Evelien's last post. Deleting didn't work, so in the end we repaired the problematic unitig rather than deleting it. It is important to place the repaired unitig not only in the relevant partition of version 2 but also in the unpartitioned format:

See here: https://sourceforge.net/p/wgs-assembler/mailman/message/30881670/

Hope this helps, Mark

alexcorm commented 6 years ago

Hi Mark,

I've tried these solutions without success :/ That's why I want to remove the problematic unitigs.

I'm wondering whether there is a way to identify the reads that were used to build these unitigs, remove them from the FASTQ files, and rerun MaSuRCA.

Alex
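The idea in the last comment (find the reads behind a failing unitig and drop them from the input) could be sketched roughly as below. Both helper names are invented for this example. The `ident` values in a layout dump appear to be internal store identifiers rather than FASTQ read names, so translating them into read names is a separate step that is not shown here. The filter also assumes plain 4-line FASTQ records, and for paired-end data both mates would have to be removed to keep the files in sync.

```shell
#!/bin/sh
# layout_fragment_ids LAYOUT_FILE   (hypothetical helper)
# Print the ident of every FRG line in a tigStore layout dump, one per line.
layout_fragment_ids() {
    awk '$1 == "FRG" && $4 == "ident" { print $5 }' "$1" | sort -n -u
}

# filter_fastq NAMES_FILE < in.fastq > out.fastq   (hypothetical helper)
# Drop every 4-line FASTQ record whose read name (first word of the header,
# without the leading "@") is listed in NAMES_FILE, one name per line.
filter_fastq() {
    awk '
        # Lines from the names file: remember each name to drop.
        FILENAME == ARGV[1] { drop[$1] = 1; next }
        # Header line of each FASTQ record: decide whether to skip the record.
        FNR % 4 == 1 { name = substr($1, 2); skip = (name in drop) }
        !skip { print }
    ' "$1" -
}

# Possible use (file names are placeholders):
#   layout_fragment_ids unitig145.layout > unitig145.ids
#   ... map the ids to read names, then:
#   filter_fastq read_names.txt < reads_1.fastq > reads_1.filtered.fastq
```

Whether rerunning the assembly without these reads avoids the consensus failure is not established in this thread; the sketch only covers the bookkeeping.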