bioinfologics / satsuma2

FFT cross-correlation based synteny aligner, (re)designed to make full use of parallel computing

speed up #14

Open mictadlo opened 5 years ago

mictadlo commented 5 years ago

Hi, I am comparing two assemblies (2.7 G) with each other, but Satsuma has been running for more than 140 hours. Does anyone know whether there is a way to split the data and run it on multiple nodes?

Thank you in advance,

Michal

jonwright99 commented 5 years ago

Hi Michal,

For larger genomes (>1 Gb), we recommend using one chromosome of one genome as the query sequence and the entire other genome as the target sequence, and processing the alignments one query chromosome at a time.

Best, Jon
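The per-chromosome approach described above needs one FASTA file per chromosome. A minimal sketch of one way to produce them, assuming a standard multi-FASTA input; `split_fasta` is a hypothetical helper written for illustration and is not part of Satsuma2:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Satsuma2): split a multi-FASTA file
# into one file per sequence, named after the sequence ID.
# Usage: split_fasta genome.fasta outdir
split_fasta() {
    local fasta="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out) close(out)                  # keep only one output file open at a time
            name = substr($1, 2)                 # sequence ID = first word after ">"
            gsub(/[^A-Za-z0-9._-]/, "_", name)   # make the ID safe to use as a file name
            out = dir "/" name ".fasta"
        }
        out { print > out }                      # header and sequence lines go to the current file
    ' "$fasta"
}
```

For example, `split_fasta query_genome.fasta query_chrs` (both names are placeholders) would write `query_chrs/<ID>.fasta` for each sequence, and each of those files could then be used as the single-chromosome input of one run.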

mictadlo commented 5 years ago

Hi Jon, how can I merge Satsuma's output files so that I can create a MizBee input file?

Thank you in advance,

Michal
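One possible way to merge per-chromosome results is plain concatenation. This is only a sketch under two assumptions that the thread does not confirm: that each run wrote its `satsuma_summary.chained.out` into its own subdirectory (the `runs/<chr>/` layout is hypothetical), and that the file holds one alignment record per line with no header, so that concatenated files remain valid. `merge_chained` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate per-chromosome Satsuma2 results.
# Assumed layout: <rundir>/<chr>/satsuma_summary.chained.out
# Usage: merge_chained rundir merged_file
merge_chained() {
    local rundir="$1" merged="$2" f
    : > "$merged"                      # start from an empty output file
    for f in "$rundir"/*/satsuma_summary.chained.out; do
        [ -s "$f" ] || continue        # skip missing or empty result files
        cat "$f" >> "$merged"
    done
}
```

Usage would look like `merge_chained runs all_chrs.chained.out`. Whether the merged file can be converted directly into a MizBee input depends on the conversion step used, which is not covered here.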

kushalsuryamohan commented 2 years ago

Hi @mictadlo, were you able to resolve this issue? If not, dear @jonwright99 / @bjclavijo or other developers of Satsuma2, I'm running into a similar issue: I'm comparing 10 de novo genomes (chromosomal genome assemblies) against a reference genome. All genomes are ~1.5-1.8 Gb in size and the jobs have been running for > 24 hrs.

If I were to run Satsuma2 for each chromosome, I would appreciate guidance on how to combine the Satsuma output and generate a figure using the ChromosomePaint command. Here's my guess at how to proceed:

1) Generate chained outputs per reference genome chromosome
2) Generate block display output per chromosome
3) Run ChromosomePaint per chromosome
4) Repeat steps 1-3 for all reference genome chromosomes
5) Use an image editing tool such as Illustrator to combine the ChromosomePaint outputs

Here's my command per chromosome of the target reference genome:

#Chromosome 1 of target ref genome
SatsumaSynteny2 -t ref_chr1.fasta -q query_all_scaffs_1mb_longer.fasta -o . -slaves 6
BlockDisplaySatsuma -i satsuma_summary.chained.out -q query_all_scaffs_1mb_longer.fasta -t ref_chr1.fasta -s 1000000 > query_chr1_synteny_blockdisplay.txt
ChromosomePaint -i query_chr1_synteny_blockdisplay.txt -o Chrom_paint_query_vs_ref_chr1.ps

I have 10 genomes of interest, so if there is an easier way to do this, I would much appreciate hearing about it.
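The three commands above could be wrapped in a loop over query genomes and reference chromosomes. The following is a rough sketch, not a tested recipe: the directory layout (`ref_chrs/<chr>.fasta`, `queries/<genome>.fasta`), the output file names `blockdisplay.txt` and `chrom_paint.ps`, and the function name `paint_all` are all assumptions made for illustration. The flags are exactly those used in the command above, and the sketch assumes that `-o` determines where `satsuma_summary.chained.out` is written.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the three-step pipeline once per
# (query genome, reference chromosome) pair.
# Usage: paint_all ref_chr_dir query_dir out_root
paint_all() {
    local refdir="$1" querydir="$2" outroot="$3"
    local q t qname tname out
    for q in "$querydir"/*.fasta; do
        [ -e "$q" ] || continue                 # no query genomes found
        qname=$(basename "$q" .fasta)
        for t in "$refdir"/*.fasta; do
            [ -e "$t" ] || continue             # no reference chromosomes found
            tname=$(basename "$t" .fasta)
            out="$outroot/$qname/$tname"        # one directory per run, so results are not overwritten
            mkdir -p "$out"
            SatsumaSynteny2 -t "$t" -q "$q" -o "$out" -slaves 6 || return 1
            BlockDisplaySatsuma -i "$out/satsuma_summary.chained.out" \
                -q "$q" -t "$t" -s 1000000 > "$out/blockdisplay.txt" || return 1
            ChromosomePaint -i "$out/blockdisplay.txt" -o "$out/chrom_paint.ps" || return 1
        done
    done
}
```

Calling `paint_all ref_chrs queries results` would leave one `chrom_paint.ps` per genome and chromosome under `results/`. Because each pair is independent, the inner loop body could also be submitted as separate cluster jobs instead of running serially.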