Closed evcurran closed 1 year ago
Something like this might occur if you somehow switched Cactus and/or vg versions between running different commands of the pipeline. (The notion of metadata that it's complaining about has been evolving in both Cactus and vg in recent versions.)
If that's not the case, it's a bug. Cactus uses some naming conventions to keep track of metadata in the contigs, and it could be that one of your input contigs is named in such a way that it's getting confused.
Would you be able to share any of your data so I could reproduce? Ideally it'd be the smallest .vg in chrom_alignments/ (provided that's enough to trigger the error).
I can't see how different versions of cactus/vg were used during the different stages, so perhaps it's a bug. Here is the smallest vg, let me know if you need anything else!
https://drive.google.com/file/d/1pvPWzBOuECm_Duci2ealEzL_kf9-Xu-k/view?usp=sharing
I can't open this file.
ls -l scaffold_5.vg
-rw-r--r-- 1 hickey cgl 502530048 Feb 3 07:31 scaffold_5.vg
md5sum scaffold_5.vg
f812752bb4bd9f3c8e8df19bf75cb241 scaffold_5.vg
vg stats -F scaffold_5.vg
terminate called after throwing an instance of 'std::bad_alloc'
Apologies, it seems the file didn't upload properly to Google Drive. The md5sum indicates the upload should have worked this time:
https://drive.google.com/file/d/1K3SVFvkuz29JpJanrymHJYIcshGO47Ar/view?usp=sharing
Could you please share scaffold_5.hal?
OK, the issue is that in your input fasta files, the contigs have names like ING_3178#1#contig_8405_1_1_1, where cactus would only expect contig_8405_1_1_1.
It may have worked with an older version of vg, but with the newer vg in recent cactus releases, the # character is causing problems.
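For anyone who wants to check their inputs before starting a run, here is a minimal sketch (not part of cactus or vg) that lists FASTA headers whose contig name contains a "#". The function name `find_hash_contigs` is hypothetical, and it assumes the contig name is the first whitespace-delimited token after the ">".

```python
from typing import Iterable, List, Tuple


def find_hash_contigs(fasta_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, contig_name) for each FASTA header whose name contains '#'.

    Hypothetical helper for illustration only; it is not a cactus or vg API.
    """
    offenders = []
    for line_number, line in enumerate(fasta_lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        # Assumption: the contig name is the first token after '>'.
        name = fields[0] if fields else ""
        if "#" in name:
            offenders.append((line_number, name))
    return offenders
```

To scan a file, pass an open handle, for example `find_hash_contigs(open("genome.fa"))`, where `genome.fa` stands in for one of your input fasta files. An empty list means no header names contain "#".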
Unfortunately, the only work-around is to clean up the names, then start again. You can follow the procedure described here.
cactus-prepare ./seqfile --outDir pp --seqFileOnly
cactus-preprocess ./seqfile pp/seqfile --pangenome
Then work with pp/seqfile for the rest of the pipeline.
I'll try to make this more prominent in the documentation, and to add an error earlier on when "#" characters are found, as soon as I can. Thanks for pointing this out.
OK, that's great, thank you for your help with this, greatly appreciated!
Hi,
I see a similar issue was just posted (#915), but this one is giving different error messages.
I ran the following command:
And it failed with a series of errors, which look like the following (this is the first occurrence of it):
Thank you for any help with this!