Closed cifuj closed 2 years ago
Could you share the input bins with me here in a zip file?
I'm sorry, but I can't share them. Is there any other information I can provide? I tried running it with another set of bins (5 bins medium- to high-quality >95% ANI) and I have the same error.
Hm, I wonder what the problem may be. Does test-SuperPang.py
work for you?
Are these "standard" bacterial/archaeal genomes? something "strange" about them?
I have an idea of a couple of things that might be throwing that error, but it's hard to know what's really happening without being able to reproduce it myself (of course it is understandable if you can't share your data).
Let's test a thing
Edit the
/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py
In line 292 change
assert len(nvs_) == 2
for
if len(nvs_) != 2: print(nvs_); assert False
This will still die but will print some info about what's going wrong. You can share the output (a horrible bunch of numbers) with me.
EDIT: Fixed a typo in the code
I only obtained this output after the "Extracting forward component"
[0, 1, 2, 3]
I changed the order of the bins and SuperPang run until comp 246/718 and then it fails also after:
Extracting forward component
[0, 1, 2, 3]
Traceback (most recent call last):
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
This is how it look lines 291-293
for nvs_ in psets.values():
if len(nvs_) != 2: print(nvs_); assert False
return psets
Thanks for trying this. It seems like a different case of a bug I'd fixed before, but I would need more info Can you now try changing the whole get_psets function to this one?
### Extract forward component
def get_psets(nvs):
"""Get a dict with a frozenset of vertices as keys, and the two complementary non-branching paths sharing those vertices as values"""
psets = defaultdict(list)
for nv in nvs:
pset = frozenset(vertex2NBP[nv])
psets[pset].append(nv)
for fs, nvs_ in psets.items():
if len(nvs_) != 2:
print('\nERROR!', len(nvs_), fs)
print()
for nv_ in nvs_:
print(nv_, vertex2NBP[nv_])
print('\nDONE\n')
assert len(nvs_) == 2
return psets
This is the output after modifiying Assembler.py SuperPang_error_cifuj_20220407.txt
Thanks! This gives me something to work with... I'll come back to you if I need some extra stuff
Ok, try replacing the function with this
### Extract forward component
def get_psets(nvs):
"""Get a dict with a frozenset of vertices as keys, and the two complementary non-branching paths sharing those vertices as values"""
psets = defaultdict(list)
for nv in nvs:
pset = frozenset(vertex2NBP[nv])
psets[pset].append(nv)
for ps, nvs_ in psets.items():
if len(nvs_) > 2: # If more than two NBPs share the same pset, select the first two that are RC of each other
for nv1, nv2 in combinations(nvs_): # this is a temporary patch, I should look into why this happens when I get a proper example
NBP1, NBP2 = vertex2NBP[nv1], vertex2NBP[nv2]
if NBP1 == NBP2[::-1]:
psets[ps] = [nv1, nv2]
break
else:
assert False # if there is no RC pair, die anyways
for nvs_ in psets.values():
assert len(nvs_) == 2
return psets
Hopefully this'll work
I obtained the following error.
Traceback (most recent call last):
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
I tried with 2 different set of bins from 2 different species and I obtained the same error after a different number of "Working on comp XX/YY"
My fault, was missing an argument
### Extract forward component
def get_psets(nvs):
"""Get a dict with a frozenset of vertices as keys, and the two complementary non-branching paths sharing those vertices as values"""
psets = defaultdict(list)
for nv in nvs:
pset = frozenset(vertex2NBP[nv])
psets[pset].append(nv)
for ps, nvs_ in psets.items():
if len(nvs_) > 2: # If more than two NBPs share the same pset, select the first two that are RC of each other
for nv1, nv2 in combinations(nvs_, 2): # this is a temporary patch, I should look into why this happens when I get a proper example
NBP1, NBP2 = vertex2NBP[nv1], vertex2NBP[nv2]
if NBP1 == NBP2[::-1]:
psets[ps] = [nv1, nv2]
break
else:
assert False # if there is no RC pair, die anyways
for nvs_ in psets.values():
assert len(nvs_) == 2
return psets
Now I have a different error in the function #Find reverse complement Non-Branching Paths
Traceback (most recent call last):
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
Hi again! I managed to reproduce the issue with data from another user. Version 0.9.4beta1 (available in conda) has a temptative fix for this problem. Let me know if it works for you!
Thank you very much! It works with the 2 datasets I had problems before.
Glad to hear. Thanks for testing it!
Hi! I have an error after the step "identifying connected components". I installed SuperPang using conda and I am working with 4 bins (>90% compl <5% contamination) from the same species (ANI > 97%). test-SuperPang.py runs with out problems. The error occurs after running
$SuperPang.py -f genomes/*fa -t 20 -o SuperPang --force-overwrite --checkm checkm_sox.txt
Traceback (most recent call last): File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
main(parse_args())
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 144, in main
contigs = Assembler(input_minimap2, args.ksize, args.threads).run(args.minlen, args.mincov, args.bubble_identity_threshold, args.genome_assignment_threshold, args.threads)
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 322, in run
psets = get_psets(comp2nvs[nc1] | comp2nvs[nc2])
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 292, in getpsets
assert len(nvs) == 2
AssertionError
The checkM file has the format Bin_Id Completeness Contamination, but I have the same error after running SuperPang with the --assume-complete flag.