gamcil / clinker

Gene cluster comparison figure generator
MIT License
507 stars, 66 forks

Clinker just stalls without producing output? #72

Open Arzamasov opened 3 years ago

Arzamasov commented 3 years ago

Hello,

I am running Clinker on a server (installed via conda) on five genomes from the same genus; the gbk files were downloaded from GenBank. I use the following command: clinker ./genomes/*.gbk -p results.html

The problem is that after ~2.5 hours the run seems to stall without producing any output. During those 2.5 hours, Python processes (one per CPU) were active, but then they stopped and nothing happened afterwards.

When I do not align the clusters (the -na option), everything works and the desired output file is produced (although without cluster alignments it is not useful).

Any ideas what may cause this?

Thank you!

gamcil commented 3 years ago

How big are these genomes? The tool was designed for smaller gene clusters, and will struggle if you try to do full genomes - it aligns every protein from every genome, so I would expect a long running time/slow visualisation when it finishes. Maybe something like Mauve (http://darlinglab.org/mauve/mauve.html) may be more appropriate here?
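
To put rough numbers on why whole genomes are so much heavier than gene clusters, here is a minimal sketch. It assumes, based only on the description above, that every protein in one input is compared against every protein in every other input; the function name and the gene counts are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def cross_cluster_pairs(protein_counts):
    """Count protein pairs when every protein is compared across every pair of inputs."""
    return sum(a * b for a, b in combinations(protein_counts, 2))


# Hypothetical protein counts, not taken from the thread
small_clusters = [20] * 5    # five gene clusters of ~20 genes each
whole_genomes = [2000] * 5   # five genomes of ~2000 genes each

print(cross_cluster_pairs(small_clusters))  # 4000
print(cross_cluster_pairs(whole_genomes))   # 40000000
```

Under these assumptions, going from 20 to 2000 proteins per input multiplies the number of comparisons by 10,000, which fits the long running time reported above.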

Arzamasov commented 3 years ago

Got it, thank you! I was using Clinker incorrectly, then. I did use full gbk files as input (each genome ~2.5 Mb), but I am more interested in aligning and visualizing a specific cluster present in those genomes. Do you have any tips for creating smaller gbk files for clusters?

gamcil commented 3 years ago

A while back I added the --ranges argument where you can specify genomic coordinates to be extracted (e.g. --ranges scaffold_1:10000-50000). Though, some people were having issues with it (https://github.com/gamcil/clinker/issues/62) and I haven't had time to go back and look, so just check if you can get it working. Otherwise, there's bound to be scripts around to extract regions from GenBank files, or you can do it through graphical software like Geneious.
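
Before passing a range to the command line, it can help to check that it really has the NAME:START-END shape shown in the example above. The sketch below is a hypothetical helper written for illustration; it is not clinker's own parsing code, and `parse_range` and `RANGE_PATTERN` are made-up names.

```python
import re

# Hypothetical helper for illustration; not part of clinker
RANGE_PATTERN = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_range(spec):
    """Split 'name:start-end' into (name, start, end) with integer coordinates."""
    match = RANGE_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"expected NAME:START-END, got {spec!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start {start} is greater than end {end} in {spec!r}")
    return match["name"], start, end


print(parse_range("scaffold_1:10000-50000"))  # ('scaffold_1', 10000, 50000)
```

A malformed string such as "scaffold_1-10000-50000" raises a ValueError instead of being passed along silently.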

xonq commented 1 year ago

To note this for others: my assumption is that you are running out of RAM.

hyphaltip commented 1 year ago

Extracting ranges from genomes is very easy with Biopython; e.g. here's part of one script that does this: https://github.com/stajichlab/GAG_cluster_1kfg/blob/7c14b29bd3a9f1401e0896010930a3f7a5ede5f4/scripts/build_cluster_genbank.py#L155

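from Bio import SeqIO

# SeqIO.read expects exactly one record; use SeqIO.parse for multi-record files
record = SeqIO.read("GENBANKFILE.gb", "genbank")
left = 100
right = 10000
# 0-based, end-exclusive slice; only features fully inside the range are kept
region = record[left:right]
# GenBank output needs molecule_type, which slicing may not carry over
region.annotations["molecule_type"] = record.annotations.get("molecule_type", "DNA")
SeqIO.write(region, "GENBANK_SLICE.gbk", "genbank")
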
zhihannnn commented 11 months ago

Hello! When I specify the ranges with --ranges: /clinker examples/'A. zlliaceus CBS 536.65.gbk' examples/'A. burnettii MST-FP2249.gbk' --ranges NW_022474703:1-27000 scaffole_377:1-27000 -p it still shows features outside the ranges, without any warning or error. I can't figure out why clinker cannot read the ranges; please help me. Thank you!

Wanjofu commented 2 weeks ago

Hello! When I specify the ranges with --ranges: /clinker examples/'A. zlliaceus CBS 536.65.gbk' examples/'A. burnettii MST-FP2249.gbk' --ranges NW_022474703:1-27000 scaffole_377:1-27000 -p it still shows features outside the ranges, without any warning or error. I can't figure out why clinker cannot read the ranges; please help me. Thank you!

I am also experiencing the same. I don't know if there is something I am missing.
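
One thing worth ruling out when a range seems to be ignored is a mismatch between the name given to --ranges and the record names in the GenBank file. This is an assumption, not a confirmed diagnosis: the thread does not say how clinker matches names. The sketch below uses a hypothetical helper, `check_range_names`, and made-up record IDs to show how such a mismatch could be caught.

```python
import difflib


def check_range_names(range_specs, record_ids):
    """Map each range name that matches no record ID to a list of similar IDs."""
    problems = {}
    for spec in range_specs:
        # Everything before the last colon is treated as the record name
        name = spec.rsplit(":", 1)[0]
        if name not in record_ids:
            problems[name] = difflib.get_close_matches(name, record_ids, n=3, cutoff=0.6)
    return problems


# Hypothetical record IDs for illustration; read the real ones from your own files
record_ids = ["NW_022474703.1", "scaffold_377"]
ranges = ["NW_022474703:1-27000", "scaffole_377:1-27000"]

for name, suggestions in check_range_names(ranges, record_ids).items():
    print(f"{name!r} matches no record; similar IDs: {suggestions}")
```

The real record IDs can be listed with Biopython, for example `[record.id for record in SeqIO.parse(path, "genbank")]`, and compared against the names passed to --ranges.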