DessimozLab / f1000_PhylogeneticTree

A tutorial on using OMA Browser and OMA Standalone for phylogenetic species tree reconstruction. Scripts referred to in the tutorial paper can be found here.
MIT License

ValueError: Sequences must all be the same length #3

Open sgarciah12 opened 9 months ago

sgarciah12 commented 9 months ago

Hello,

I have been using this pipeline for a while (thanks to the developers!) and I am very used to its scripts but this time I cannot get around this error. When running concat_alignments.py for a set of 259 phylogenetic markers I get the following error:

python3 concat_alignments.py COGs_filtered/Alignments/Trimmed/*.fasta --format-output fasta -v > COGs_259_concatenated.fasta

2024-01-08 17:25:19,245 - concat - INFO - Successfully loaded 259 alignments from 259 input files
Traceback (most recent call last):
  File "/pasteur/zeus/projets/p02/Spiro/samuel/Project_spirochetia/Project_spirochetia_6.0/Per_clade/Treponemataceae/concat_alignments.py", line 98, in <module>
    msa = concatenate(alignments)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/zeus/projets/p02/Spiro/samuel/Project_spirochetia/Project_spirochetia_6.0/Per_clade/Treponemataceae/concat_alignments.py", line 62, in concatenate
    msa = MultipleSeqAlignment(SeqRecord(Seq(''.join(seq_arr)), id=label)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/appa/homes/sgarciah/.local/lib/python3.11/site-packages/Bio/Align/__init__.py", line 173, in __init__
    self.extend(records)
  File "/pasteur/appa/homes/sgarciah/.local/lib/python3.11/site-packages/Bio/Align/__init__.py", line 481, in extend
    self._append(rec, expected_length)
  File "/pasteur/appa/homes/sgarciah/.local/lib/python3.11/site-packages/Bio/Align/__init__.py", line 543, in _append
    raise ValueError("Sequences must all be the same length")
ValueError: Sequences must all be the same length

This is the first time this has happened to me. To my understanding, the error comes from the Biopython AlignIO module, but it should not occur, since I checked and all sequences within each alignment are actually the same length. Are there any further checks I should perform?

I have also tried running with the -d option, but its output makes it difficult for me to tell which file is actually causing the problem, if any.
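One way to narrow down which file is involved is to inspect every alignment independently of Biopython. The following is a minimal sketch, not part of the pipeline: `read_fasta` and `check_alignment` are hypothetical helper names, and the script only assumes plain FASTA input. It reports files whose sequences differ in length and files in which the same id occurs more than once. Whether duplicated ids would actually make `concatenate` fail is an assumption about how the script groups sequences by label, so treat that check as a hint only.

```python
import sys
from collections import Counter


def read_fasta(path):
    """Return a list of (id, sequence) tuples from a plain FASTA file."""
    records, label, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if label is not None:
                    records.append((label, "".join(chunks)))
                # Use the first whitespace-delimited token of the header as id.
                parts = line[1:].split()
                label = parts[0] if parts else ""
                chunks = []
            elif label is not None:
                chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def check_alignment(path):
    """Return a list of human-readable problems found in one alignment file."""
    records = read_fasta(path)
    problems = []
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        problems.append(f"unequal sequence lengths: {sorted(lengths)}")
    # Assumption: repeated ids might lengthen one label's concatenated sequence.
    counts = Counter(label for label, _ in records)
    dups = sorted(label for label, n in counts.items() if n > 1)
    if dups:
        problems.append(f"duplicated ids: {dups}")
    if not records:
        problems.append("no sequences found")
    return problems


if __name__ == "__main__":
    # Print one line per problem, prefixed with the offending file name.
    for path in sys.argv[1:]:
        for problem in check_alignment(path):
            print(f"{path}: {problem}")
```

Saved under a name of your choice (for example `check_alignments.py`, a hypothetical file name), it can be run on the same glob as the concatenation step: `python3 check_alignments.py COGs_filtered/Alignments/Trimmed/*.fasta`. No output means neither condition was found in any file.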

Thank you for your time and help,

Samuel

alpae commented 9 months ago

Hi Samuel,

Indeed, this error seems to be coming from inside Biopython's AlignIO module. Which version of Biopython are you using? A few weeks ago Biopython 1.82 was released; I haven't checked whether there are compatibility issues with it.
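A quick way to report the installed version, sketched here with the standard library only (`installed_version` is a hypothetical helper name; `biopython` is the distribution name used on PyPI):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="biopython"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"biopython: {found if found is not None else 'not installed'}")
```

It should be run with the same interpreter that runs concat_alignments.py (here `python3`), since several Python installations on one machine can carry different Biopython versions.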

Could you maybe test with an older dataset that previously worked? And if you cannot get it to work, could you send us the alignment files as a tarball?

Cheers, Adrian

sgarciah12 commented 8 months ago

Hello Adrian,

Sorry for taking so long to get back to you.

I have tried with older datasets that previously worked and indeed in those there seems to be no problem. So I think this may be an issue on my side. Nonetheless, I double-checked with other scripts that the length of my alignments is okay and I cannot figure out why this is happening.

Find here my alignment files: aligments.tar.gz

Thanks for your help and time,

Best regards