Tang-pro closed this 5 months ago
The amount of memory needed cannot be predicted in advance. It depends on the sequence lengths, their similarity, and the alignment parameters. I am curious about your specific task: why do you need to align 180K transcripts?
Hi @TimoLassmann, I built full-length isoforms for two species from PacBio data. I want to compare the isoforms between the two species and use PhastCons to evaluate their conservation. This is just a first attempt, and I am not sure how to do it. Is this approach reasonable? When I run the software, I get an error message caused by insufficient memory.
Hi, from your description it sounds like you are attempting to align all transcripts from different genes at the same time. You should consider aligning the transcripts to the reference genomes, collecting the transcripts that map to homologous regions, and performing your analysis on a gene-by-gene basis. If no reference genome is available, you could look into high-throughput unsupervised clustering approaches to group similar sequences first, then perform your analysis on the clusters one by one.
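To make the gene-by-gene idea concrete, here is a minimal sketch of the splitting step. It assumes you already have a mapping from transcript ID to a gene or cluster label (for example, derived from your genome alignments); the function names `read_fasta` and `split_by_group`, the `transcript_to_group` dictionary, and the output file naming are all hypothetical and only for illustration.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_group(fasta_path, transcript_to_group, out_dir):
    """Write one FASTA file per group and return {group: sequence count}.

    transcript_to_group is an assumed, user-supplied dict mapping each
    transcript ID to a gene or cluster label.
    """
    groups = defaultdict(list)
    for header, seq in read_fasta(fasta_path):
        transcript_id = header.split()[0]  # ID is the first word of the header
        group = transcript_to_group.get(transcript_id)
        if group is not None:  # skip transcripts with no assigned group
            groups[group].append((transcript_id, seq))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for group, records in groups.items():
        if len(records) < 2:
            continue  # a single sequence cannot be aligned to anything
        with open(out_dir / f"{group}.fa", "w") as out:
            for transcript_id, seq in records:
                out.write(f">{transcript_id}\n{seq}\n")
        counts[group] = len(records)
    return counts
```

Each resulting file can then be aligned on its own, so memory use is bounded by the largest group rather than by the whole set of transcripts.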
Hi @TimoLassmann, sorry, maybe my previous message was unclear. I have a reference genome, and I have already aligned the transcripts to it. My current goal is to evaluate the conservation of these isoforms. My 180,000 isoforms can be divided into seven categories, the largest of which contains 80,000 to 90,000 isoforms. Can Kalign support this?
Hi, @TimoLassmann
Great software. I am using version 2.04 and have 180,000 transcript sequences. How much memory do I need?
Best!