bvalot / pyMLST

whole genome MLST analysis

fail building database for S. pneumoniae #35

Closed: juanjo255 closed this issue 1 week ago

juanjo255 commented 2 weeks ago

Hello developers,

Thanks for this great work.

I am trying to do cgMLST typing for S. pneumoniae with your tool. Since no cgMLST scheme for this organism is available to import from cgmlst.org, as described in the documentation, I tried to create a database with the command `wgMLST create spneumoniae alleles_spneumoniae.fasta`, where `alleles_spneumoniae.fasta` contains all the alleles for the 1222 loci of a cgMLST scheme available on PubMLST. However, the command fails with the message shown below.

I hope you can help me.

Thanks,

Juan


```
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlite3.DatabaseError: database disk image is malformed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jjpc/miniforge3/envs/pymlst/bin/wgmlst", line 10, in <module>
    sys.exit(wg())
             ^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/pymlst/wg/commands/create.py", line 34, in cli
    mlst.create(**utils.clean_kwargs(kwargs))
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/pymlst/wg/core.py", line 507, in create
    added = self.__database.add_core_genome(geneid, str(gene.seq), mode)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/pymlst/wg/core.py", line 126, in add_core_genome
    added, seq_id = self.__add_sequence(sequence)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/pymlst/wg/core.py", line 172, in __add_sequence
    res = self.connection.execute(
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1385, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/Users/jjpc/miniforge3/envs/pymlst/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed
[SQL: INSERT INTO sequences (sequence) VALUES (?)]
[parameters: ('ATGAAAAATACAGGTAAACGAATTGATCTGATAGCCAATAGAAAACCGCAGAGTCAAAGGGTTTTGTATGAATTGCGAGATCGTTTGAAGAGAAATCAGTTTATACTCAATGATACCAATCCGGATATTGTCATTTCCATTGGCGGGGA ... (521 characters truncated) ... CGGTTGACAATAGCGTTTATTCTTTCCGTAATATTGAGCGTATTGAGTATCAAATCGACCATCATAAGATTCACTTTGTCGCGACTCCTAGCCATACCAGTTTCTGGAACCGTGTTAAGGATGCCTTTATCGGTGAGGTGGATGAATGA',)]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
```

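
Since the error is raised by SQLite itself, one way to see whether the database file left behind by the failed run is damaged is SQLite's built-in `PRAGMA integrity_check`. The following is only a minimal sketch: it assumes the pyMLST database is a plain SQLite file at the path given to `wgMLST create`, and the function name `sqlite_integrity` is hypothetical, not part of pyMLST.

```python
import sqlite3


def sqlite_integrity(db_path: str) -> list:
    """Return the messages reported by PRAGMA integrity_check.

    A healthy SQLite file yields ['ok']; a damaged one yields one or more
    problem descriptions, or cannot be read at all.
    ASSUMPTION: db_path points at an existing SQLite file (note that
    sqlite3.connect creates an empty database if the path does not exist).
    """
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Raised when the file is too damaged (or not a database) to query.
        return [f"unreadable: {exc}"]
    finally:
        conn.close()


# Example call with the database name from this thread (assumed to be a file path):
# print(sqlite_integrity("spneumoniae"))
```

If the check reports anything other than `ok`, deleting the partial file and rebuilding the database is probably simpler than trying to repair it.
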
bvalot commented 1 week ago

Hello,

I think the problem comes from your FASTA file, which contains the sequences of the different genes. Unlike with MLST, you don't need all the alleles of each gene, only one copy.
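
To illustrate the "one copy per gene" point, here is a minimal sketch that reduces a multi-allele FASTA to a single sequence per locus by keeping the first allele seen. It rests on assumptions that do not come from this thread: headers are taken to follow a `<locus>_<allele number>` pattern, the function names are hypothetical, and whether pyMLST accepts the resulting file has not been checked here.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            # Keep only the identifier, dropping any free-text description.
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def one_allele_per_locus(lines: Iterable[str]) -> Dict[str, str]:
    """Keep the first allele encountered for each locus.

    ASSUMPTION: identifiers look like '<locus>_<allele number>', so the
    locus name is everything before the last underscore.
    """
    kept: Dict[str, str] = {}
    for header, seq in read_fasta(lines):
        locus = header.rsplit("_", 1)[0]
        kept.setdefault(locus, seq)
    return kept


def to_fasta(records: Dict[str, str]) -> str:
    """Serialise {locus: sequence} as FASTA text, one record per locus."""
    return "".join(f">{locus}\n{seq}\n" for locus, seq in records.items())


# Possible usage (file names are placeholders):
# with open("alleles_spneumoniae.fasta") as fin, open("one_per_locus.fasta", "w") as fout:
#     fout.write(to_fasta(one_allele_per_locus(fin)))
```
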

You can try using the genes of the reference genome for S. pneumoniae, available here. Download this file containing the gene sequences and uncompress it: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/457/635/GCF_001457635.1_NCTC7465/GCF_001457635.1_NCTC7465_cds_from_genomic.fna.gz

Create your database from this file with the command: `wgMLST create -r spneumoniae GCF_001457635.1_NCTC7465_cds_from_genomic.fna`
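
The download and decompression step could also be scripted. The sketch below uses only the Python standard library and the URL given above. The function names `decompress_gzip` and `fetch_reference_cds` are hypothetical helpers, and the script only prepares the `.fna` file; the `wgMLST create` command still has to be run afterwards.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# URL quoted from this thread (NCBI reference CDS for S. pneumoniae NCTC7465).
REFERENCE_CDS_URL = (
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/457/635/"
    "GCF_001457635.1_NCTC7465/GCF_001457635.1_NCTC7465_cds_from_genomic.fna.gz"
)


def decompress_gzip(src: Path, dest: Path) -> Path:
    """Decompress the gzip file `src` into `dest` and return `dest`."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def fetch_reference_cds(url: str = REFERENCE_CDS_URL,
                        workdir: Path = Path(".")) -> Path:
    """Download the .fna.gz archive into `workdir` and return the .fna path."""
    archive = workdir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive))
    # with_suffix("") strips only the trailing ".gz", leaving "...fna".
    return decompress_gzip(archive, archive.with_suffix(""))


# Possible usage (performs a network download):
# fna_path = fetch_reference_cds()
# then run: wgMLST create -r spneumoniae <fna_path>
```
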

After adding the genomes you want to compare, you can restrict the analysis to the core genome with the `-m` option.

juanjo255 commented 1 week ago

It worked. Thank you very much!