eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

Error in input genbank files #111

Closed DAguileraN closed 1 year ago

DAguileraN commented 1 year ago

I downloaded some data from NCBI for my analysis, and when I try to use those files I get the following error:

ERROR: could not extract sequences from file Alkalicoccobacillus_gibsonii_WGS.gbk

I have checked all the files and all of them have complete sequences. I don't know if I missed something. Could you please help me?

Thanks,
Diego

I attach the file here: Alkalicoccobacillus_gibsonii_WGS.zip

brunocontrerasmoreira commented 1 year ago

Hi @DAguileraN, if you check the file contents you will see that they look like this:

LOCUS       JAQQWR010000001      1381484 bp    DNA     linear   BCT 13-FEB-2023 
DEFINITION  Alkalicoccobacillus gibsonii strain DSM 8722 scaffold1, whole genome shotgun sequence.
ACCESSION   JAQQWR010000001 JAQQWR010000000
KEYWORDS    WGS.
SOURCE      Alkalicoccobacillus gibsonii
ORGANISM  Alkalicoccobacillus gibsonii 
COMMENT     ##Genome-Assembly-Data-START##
        Assembly Method        :: SOAPdenovo v. 2.04
        Genome Representation  :: Full
        Expected Final Version :: No
        Genome Coverage        :: 100.0x
        Sequencing Technology  :: Illumina HiSeq
        ##Genome-Assembly-Data-END##
ORIGIN
    1 ttccacagta gctcagtggt ...

As you can see, no features are annotated in the sequence. In particular, there are no gene or CDS features, which are what get_homologues.pl expects. A file like this cannot be analyzed with this software; it needs to be annotated first.
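
To check a batch of downloaded files before running the analysis, a small script along these lines may help. It is only a sketch and not part of get_homologues: the function name `count_feature_keys` is made up for illustration, and it assumes the standard GenBank flat-file layout, in which feature keys start in column 6 under a `FEATURES` header and qualifiers start in column 22.

```python
import sys
from collections import Counter


def count_feature_keys(lines):
    """Count feature keys (e.g. 'gene', 'CDS') in GenBank flat-file lines.

    Hypothetical helper: assumes feature keys are indented by exactly
    five spaces inside the FEATURES table, as in standard GenBank files.
    """
    counts = Counter()
    in_features = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and line[:1] not in (" ", ""):
            # Any other top-level keyword (ORIGIN, CONTIG, //) ends the table.
            in_features = False
        elif in_features and line[:5] == "     " and line[5:6].strip():
            # A non-blank character in column 6 marks a feature key;
            # qualifier lines are indented further and are skipped.
            counts[line.split()[0]] += 1
    return counts


if __name__ == "__main__":
    # Usage: python check_gbk.py file1.gbk file2.gbk ...
    for path in sys.argv[1:]:
        with open(path) as handle:
            counts = count_feature_keys(handle)
        status = "ok" if counts["CDS"] else "no CDS features, annotate first"
        print(f"{path}: {counts['CDS']} CDS, {counts['gene']} gene ({status})")
```

A file such as the one shown above has no `FEATURES` table at all, so it would be reported with zero CDS features. Files that do contain CDS features may still need other fields that this check does not look at.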