PAiewsakun / GRAViTy


ValueError: Sequences must all be the same length #11

Open · swmuyu opened this issue 2 years ago

swmuyu commented 2 years ago

Hi, I ran GRAViTy_Pipeline_I (the example) with the following options:

GRAViTy_Pipeline_I \
  --GenomeDescTableFile "./Test/Data/Ref/VMR_Test_Ref.txt" \
  --ShelveDir "./Test/Analysis/Ref/VI" \
  --Database "VI" \
  --Database_Header "Baltimore Group" \
  --TaxoGrouping_Header "Taxonomic grouping" \
  --N_Bootstrap 10 \
  --GenomeSeqFile "./Test/Data/Ref/GenomeSeqs.VI.gb"

The program appears to terminate at the "- Make protein alignments" step.

Here is the complete output:

$ GRAViTy_Pipeline_I \
> --GenomeDescTableFile "./Test/Data/Ref/VMR_Test_Ref.txt" \
> --ShelveDir "./Test/Analysis/Ref/VI" \
> --Database "VI" \
> --Database_Header "Baltimore Group" \
> --TaxoGrouping_Header "Taxonomic grouping" \
> --N_Bootstrap 10 --GenomeSeqFile "./Test/Data/Ref/GenomeSeqs.VI.gb"
Input for ReadGenomeDescTable:
====================================================================================================

Main input
--------------------------------------------------
GenomeDescTableFile: ./Test/Data/Ref/VMR_Test_Ref.txt
ShelveDir: ./Test/Analysis/Ref/VI
Database: VI
Database_Header: Baltimore Group
TaxoGrouping_Header: Taxonomic grouping
TaxoGroupingFile: None
====================================================================================================
################################################################################
#Read the GenomeDesc table                                                     #
################################################################################
- Define dir/file paths
        to program output shelve
- Read the GenomeDesc table
- Save variables to ReadGenomeDescTable.AllGenomes.shelve
        BaltimoreList
        OrderList
        FamilyList
        SubFamList
        GenusList
        VirusNameList
        SeqIDLists
        SeqStatusList
        TaxoGroupingList
        TranslTableList
        DatabaseList
- Save variables to ReadGenomeDescTable.CompleteGenomes.shelve
        BaltimoreList
        OrderList
        FamilyList
        SubFamList
        GenusList
        VirusNameList
        SeqIDLists
        SeqStatusList
        TaxoGroupingList
        TranslTableList
        DatabaseList
Input for PPHMMDBConstruction:
====================================================================================================
Main input
--------------------------------------------------
GenomeSeqFile: ./Test/Data/Ref/GenomeSeqs.VI.gb
ShelveDir: ./Test/Analysis/Ref/VI

Protein extraction options
--------------------------------------------------
ProteinLength_Cutoff: 100
IncludeProteinsFromIncompleteGenomes: True

Protein clustering options
--------------------------------------------------
BLASTp_evalue_Cutoff: 0.001
BLASTp_PercentageIden_Cutoff: 50
BLASTp_QueryCoverage_Cutoff: 75
BLASTp_SubjectCoverage_Cutoff: 75
BLASTp_num_alignments: 1000000
BLASTp_N_CPUs: 88
MUSCLE_GapOpenCost: -3.0
MUSCLE_GapExtendCost: -0.0
ProtClustering_MCLInflation: 2

Protein alignment merging options
--------------------------------------------------
N_AlignmentMerging: 0
HHsuite_evalue_Cutoff: 1e-06
HHsuite_pvalue_Cutoff: 0.05
HHsuite_N_CPUs: 88
HHsuite_QueryCoverage_Cutoff: 85
HHsuite_SubjectCoverage_Cutoff: 85
PPHMMClustering_MCLInflation_ForAlnMerging: 5
HMMER_PPHMMDB_ForEachRoundOfPPHMMMerging: True
====================================================================================================
################################################################################
#Build a database of virus protein profile hidden Markov models (PPHMMs)       #
################################################################################
- Define dir/file paths
        to BLASTp shelve directory
                to BLASTp query file
                to BLASTp subject file
                to BLASTp output file
                to BLASTp bit score matrix file
                to protein cluster file
                to protein cluster directory
        to HMMER shelve directory
                to HMMER PPHMM directory
                to HMMER PPHMM database directory
                        to HMMER PPHMM database
        to program output shelve
- Retrieve variables
        from ReadGenomeDescTable.AllGenomes.shelve
                BaltimoreList
                OrderList
                FamilyList
                SubFamList
                GenusList
                VirusNameList
                TaxoGroupingList
                SeqIDLists
                TranslTableList
- Download GenBank file
GenomeSeqFile doesn't exist. GRAViTy is downloading the GenBank file(s)
Here are the accession numbers to be downloaded: 
M14008
AF033809
AF033808
M80216
AF033807
AF074966
AF033822
DQ237904
X03711
AF151794
AF356697
M32690
M25381
U21603
L06906
Y08851
JQ867463
M74895
MF280817
X54482
GU356395
M37980
Y00302
M10455
AY282754
M23385
M10060
AF014792
AF052723
M26927
J02207
AF033813
AY842951
M33677
AF033819
U03982
U94514
KM233624
LC094267
JQ867466
EU010385
U04327
KP143760
To download GenBank file(s), please provide your email: 610262417@qq.com
- Read GenBank file
- Extract/predict protein sequences from virus genomes, excluding proteins with lengthes <100 aa
- ALL-VERSUS-ALL BLASTp
        Make BLASTp database
        Performe ALL-VERSUS-ALL BLASTp analysis
        Save protein-protein similarity scores (BLASTp bit scores)
- Cluster protein sequences based on BLASTp bit scores, using the MCL algorithm
- Make protein alignments
Traceback (most recent call last):
  File "/ifs1/User/yuzh/miniconda3/envs/grav/bin/GRAViTy_Pipeline_I", line 956, in <module>
    main()
  File "/ifs1/User/yuzh/miniconda3/envs/grav/bin/GRAViTy_Pipeline_I", line 798, in main
    HMMER_PPHMMDB_ForEachRoundOfPPHMMMerging = str2bool(options.HMMER_PPHMMDB_ForEachRoundOfPPHMMMerging),
  File "/ifs1/User/yuzh/miniconda3/envs/grav/lib/python2.7/site-packages/GRAViTy/PPHMMDBConstruction.py", line 649, in PPHMMDBConstruction
    "AlignmentLength":AlignIO.read(AlnClusterFile, "fasta").get_alignment_length()
  File "/ifs1/User/yuzh/miniconda3/envs/grav/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 429, in read
    first = next(iterator)
  File "/ifs1/User/yuzh/miniconda3/envs/grav/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 376, in parse
    for a in i:
  File "/ifs1/User/yuzh/miniconda3/envs/grav/lib/python2.7/site-packages/Bio/AlignIO/__init__.py", line 279, in _SeqIO_to_alignment_iterator
    yield MultipleSeqAlignment(records, alphabet)
  File "/ifs1/User/yuzh/miniconda3/envs/grav/lib/python2.7/site-packages/Bio/Align/__init__.py", line 169, in __init__
    self.extend(records)
  File "/ifs1/User/yuzh/miniconda3/envs/grav/lib/python2.7/site-packages/Bio/Align/__init__.py", line 487, in extend
    self._append(rec, expected_length)
  File "/ifs1/User/yuzh/miniconda3/envs/grav/lib/python2.7/site-packages/Bio/Align/__init__.py", line 550, in _append
    raise ValueError("Sequences must all be the same length")
ValueError: Sequences must all be the same length
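
The traceback shows Biopython's AlignIO.read rejecting the protein-cluster alignment file that GRAViTy's PPHMMDBConstruction step was reading (the variable AlnClusterFile), because its records do not all have the same length. As a minimal diagnostic sketch, assuming the file is plain FASTA (as the "fasta" format argument in the traceback indicates), the code below lists which records deviate from the most common length. The helper names read_fasta_lengths and find_unequal_lengths are hypothetical, not part of GRAViTy or Biopython, and the actual path of the failing file is not shown in the log. The sketch only locates the mismatched records; it does not explain why the alignment came out that way.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Return (record_id, sequence_length) pairs from FASTA-formatted lines.

    Hypothetical helper. Accepts any iterable of text lines (e.g. an open
    file handle). Wrapped sequence lines are concatenated and all whitespace
    is ignored, so gap characters such as '-' count towards the length.
    """
    lengths = []
    current_id = None
    current_len = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                lengths.append((current_id, current_len))
            # Record ID = first whitespace-delimited token after '>'
            current_id = line[1:].split()[0] if len(line) > 1 else ""
            current_len = 0
        elif current_id is not None:
            # Drop any internal whitespace before counting residues/gaps
            current_len += len("".join(line.split()))
    if current_id is not None:
        lengths.append((current_id, current_len))
    return lengths


def find_unequal_lengths(lines):
    """Return records whose length differs from the most common length."""
    lengths = read_fasta_lengths(lines)
    if not lengths:
        return []
    modal_length, _ = Counter(n for _, n in lengths).most_common(1)[0]
    return [(rid, n) for rid, n in lengths if n != modal_length]


if __name__ == "__main__":
    import sys

    # Usage: python check_aln.py <path-to-cluster-alignment.fasta>
    with open(sys.argv[1]) as handle:
        for record_id, length in find_unequal_lengths(list(handle)):
            print(record_id, length)
```

An empty result means every record has the same length, in which case the ValueError is unlikely to come from this particular file.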

mujiezhang commented 1 year ago

Hello, I am also using this software and have run into the same error. Have you solved it?