AliTVTeam / AliTV

Visualize whole genome alignments as linear maps
https://alitvteam.github.io/AliTV/d3/AliTV.html
MIT License

AliTV on Cactus alignment #166

Open SimonaSecomandi opened 3 years ago

SimonaSecomandi commented 3 years ago

Hello,

I generated a CACTUS alignment between 9 bird species and I'm trying to find a way to visualize them.

Cactus is a reference-free aligner that outputs a HAL file, which can be converted to MAF.

Is it possible to visualize an alignment like that? The MAF file can be referenced to the root or to any of the aligned species. Given that, how can I visualize the alignment in AliTV?

Many thanks

iimog commented 3 years ago

Hi! The capabilities of AliTV for importing pre-calculated alignments are somewhat limited but in general MAF import is possible: https://github.com/AliTVTeam/AliTV-perl-interface/blob/master/doc/alitv.md#section-alignment You can try to run alitv.pl as described there with your MAF alignment. Let me know if it works. We are looking to improve alignment import capabilities anyway (e.g. AliTVTeam/AliTV-perl-interface#167). So your feedback whether this works would be welcome. There might be a possibility to add HAL support as well, but I'll need to have a closer look into this.

SimonaSecomandi commented 3 years ago

Hi, I was trying to visualize the alignment between two of my species starting from the HAL file. I extracted a MAF referenced to the chicken genome, with the barn swallow genome as the target. I ran into the sequence name problem described in the issue you linked. I resolved it by renaming the sequences in both FASTA files and in the MAF (e.g. GgC1 for Gallus gallus chromosome 1), and AliTV now runs without errors.

Here's part of the output while running:
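For anyone hitting the same naming problem, the renaming step can be sketched roughly as follows in Python. This is only an illustration: the function names and the mapping are hypothetical, and it assumes the ID is the first whitespace-delimited token of a FASTA header and the second field of a MAF "s" line.

```python
def rename_fasta_line(line, mapping):
    """Rename the ID on a FASTA header line; other lines pass through unchanged."""
    if not line.startswith(">"):
        return line
    parts = line[1:].rstrip("\n").split(None, 1)
    if not parts:
        return line
    new_id = mapping.get(parts[0], parts[0])
    # Keep any free-text description that follows the ID.
    rest = " " + parts[1] if len(parts) > 1 else ""
    return ">" + new_id + rest + "\n"


def rename_maf_line(line, mapping):
    """Rename the source field of a MAF 's' line; other lines pass through unchanged."""
    fields = line.split()
    # A MAF sequence line has 7 fields: s src start size strand srcSize text
    if len(fields) != 7 or fields[0] != "s":
        return line
    fields[1] = mapping.get(fields[1], fields[1])
    return " ".join(fields) + "\n"


# Hypothetical mapping from original names to short, unique IDs.
example_mapping = {"chr1": "GgC1"}
```

Applying both functions with the same mapping to every line of the FASTA files and the MAF keeps the names consistent across all inputs.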

Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 2155751.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 2155755.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 2155759.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 2155763.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 2155767.

However, it has been running for almost an entire day. Is that normal?

My plan was to run a trial like this and then give AliTV either the MAFs for each pair of genomes that are neighbors in the phylogenetic tree (I will provide the tree as well) or the entire MAF file. Which do you think will work better? Can AliTV take a single MAF covering multiple species?

iimog commented 3 years ago

Hi, great progress! The "Use of uninitialized value" warnings you see indicate that the score for the alignment could not be found. This should not cause any problems as long as identity and length are there (missing values for those would cause warnings for line 261 or 263, respectively). So I'm cautiously optimistic that the result will be a proper AliTV JSON. A runtime of more than one day is not what I would have expected. It probably means that the output will be very large, but it can be filtered later on to make it more manageable. If it is not blocking your resources (and has still not finished), I would suggest letting it run.
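To check how many alignment blocks lack a score, a quick count along these lines may help. It is a rough sketch that assumes the score in question is the optional score= attribute on each MAF "a" line; the function name is made up for illustration.

```python
def count_blocks_by_score(lines):
    """Return (blocks with score=, blocks without score=) for MAF lines."""
    with_score = 0
    without_score = 0
    for line in lines:
        fields = line.split()
        # Each alignment block starts with a line whose first token is "a".
        if not fields or fields[0] != "a":
            continue
        if any(field.startswith("score=") for field in fields[1:]):
            with_score += 1
        else:
            without_score += 1
    return with_score, without_score
```

If the second number equals the total number of blocks, one warning per block would be consistent with the output shown above.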

Regarding pairwise MAF vs entire MAF I'm not sure. As far as I know AliTV normally uses pairwise MAF files. Maybe @greatfireball can answer whether it will also handle MAFs for multiple species?

SimonaSecomandi commented 3 years ago

Hello, AliTV took more than 2 days and the job was killed (there is a 2-day runtime limit). Do you know any way to speed it up?

iimog commented 3 years ago

I'm not sure what the bottleneck is in this case. Could you try subsetting the alignment file to see whether it properly finishes at all? Can you share an example file so I can run it locally and profile what causes this extreme runtime?
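One simple way to subset a MAF is to keep only the first few alignment blocks. The Python sketch below does that; the function name is hypothetical and it assumes a plain-text MAF in which every block starts with an "a" line.

```python
def first_maf_blocks(lines, max_blocks):
    """Yield header lines plus the first max_blocks alignment blocks of a MAF."""
    seen = 0
    for line in lines:
        # A line whose first token is "a" opens a new alignment block.
        if line.split()[:1] == ["a"]:
            seen += 1
            if seen > max_blocks:
                break
        yield line
```

For example, writing the lines yielded by first_maf_blocks(open_file, 1000) to a new file gives a small test input, where open_file is the opened MAF.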

SimonaSecomandi commented 3 years ago

Hi, I'm sorry for the late reply, but I had to pause the analysis for a while. Now I'm back on it! I subset the MAF alignment to a single-chromosome alignment between two species (I referenced the MAF to chromosome 6 of my reference species and targeted it on the query species). The program ran for more than 24 h and then failed (see below). I passed AliTV the entire FASTA files for the two aligned genomes (not only the chromosomes of interest). Moreover, in the yml file (AliTV_Hirundo_Gallus_SUPER_6.yml.txt), I listed the query species first (Gallus gallus) and then the reference (Hirundo rustica).

The command: perl alitv.pl --project AliTV_Hirundo_Gallus_Chr6 AliTV_Hirundo_Gallus_SUPER_6.yml

The error:

You are using version v1.0.6.
INFO - MAF input file and buggy BioPerl detected... Therefore, workaround for revcom issue activated
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 36.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 68.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 101.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 134.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 167.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 200.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 232.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 264.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 297.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 330.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 363.
......
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182494.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182499.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182504.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182509.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182514.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182519.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182524.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182530.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182539.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182546.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182550.
Use of uninitialized value in subroutine entry at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 262, line 3182553.
FATAL - unable to create features at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 305.
unable to create features at /gpfs/home/projects/Hirundo/bin/AliTV-perl-interface/bin/../lib/AliTV/Alignment.pm line 305.

Many thanks!

iimog commented 3 years ago

I'm afraid this might be due to the renaming of sequence IDs that AliTV needs to do internally, because it calls external programs that have restrictions on what they consider valid IDs. As a result, when it reads the MAF file with the original IDs, it tries to match them against the renamed ones and cannot find them.
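A quick consistency check between the input files can at least rule out plain naming mismatches. The Python sketch below (hypothetical helper names) lists sequence names that appear in a MAF but not in the FASTA headers. It only compares the names as written in the files and knows nothing about the IDs AliTV assigns internally.

```python
def fasta_ids(lines):
    """Collect the first token of every FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def maf_ids(lines):
    """Collect the source names from all MAF 's' lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        # A MAF sequence line has 7 fields: s src start size strand srcSize text
        if len(fields) == 7 and fields[0] == "s":
            ids.add(fields[1])
    return ids


def unmatched_maf_ids(maf_lines, fasta_lines):
    """Return MAF sequence names that have no matching FASTA header."""
    return maf_ids(maf_lines) - fasta_ids(fasta_lines)
```

An empty result means every name in the MAF also exists in the FASTA input, so any remaining mismatch would have to come from the internal renaming.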

This seems to be the exact same problem as in https://github.com/AliTVTeam/AliTV-perl-interface/issues/167#issuecomment-806462166

As this problem affects multiple users and there is no simple workaround from a user perspective (besides manually renaming IDs and later mapping them back, which is a pain), we really need to fix this in AliTV. @greatfireball what do you think: could a --keep-original-ids parameter be a solution? That would tell AliTV not to touch the IDs of the sequences. Or is it better to apply the ID mapping to the MAF import (should be possible as well, right?)?

iimog commented 3 years ago

I just pushed a branch https://github.com/AliTVTeam/AliTV-perl-interface/tree/improve-maf-import where you can call alitv.pl --keepids [...] which will prevent the mapping of fasta IDs to unique names. When you have time you can re-run AliTV with the version from that branch and report whether that fixes your problem. Note: it might be problematic if you have non-unique IDs (across files) or IDs contain special characters.

greatfireball commented 3 years ago

Sorry, I was busy for the last week. I will check your branch later today. If it looks okay, we can merge it into our master branch as well, especially if @SimonaSecomandi can verify that the issue is solved with your new parameter (and its implementation).