FOI-Bioinformatics / flextaxd

FlexTaxD (Flexible Taxonomy Databases) - Create, add, merge different taxonomy sources (QIIME, GTDB, NCBI and more) and create metagenomic databases (kraken2, ganon and more )
GNU General Public License v3.0

TypeError: 'int' object is not iterable #46

Closed xvtyzn closed 2 years ago

xvtyzn commented 2 years ago

Hi.

Thank you for creating this excellent tool.

I ran the workflow described in the wiki to build a GTDB database for ganon. However, I get an error in the database clean step.

(flextaxd) [ide@tn2 data]$ flextaxd -db databases/NCBI_GTDB_merge.db --clean_database --verbose --log NCBI_GTDB_merge_log
2021-09-25 22:47:28,013 custom_taxonomy_databases [INFO ]  FlexTaxD logging initiated!
2021-09-25 22:47:28,019 ModifyTree [INFO ]  Modify Tree
2021-09-25 22:47:28,019 DatabaseConnection [INFO ]  databases/NCBI_GTDB_merge.db opened successfully.
2021-09-25 22:47:28,056 ModifyTree [INFO ]  Fetch annotated nodes
2021-09-25 22:47:28,097 ModifyTree [INFO ]  Annotated nodes: 10357
2021-09-25 22:47:28,097 ModifyTree [INFO ]  Get all links in database
2021-09-25 22:47:28,129 ModifyTree [INFO ]  Get all nodes in database
2021-09-25 22:47:28,167 ModifyTree [INFO ]  Retrieve all parents of annotated nodes
2021-09-25 22:47:28,644 ModifyTree [INFO ]  Parents added: 4321
2021-09-25 22:47:28,692 ModifyTree [INFO ]  Links to remove 3380
2021-09-25 22:47:28,692 ModifyTree [INFO ]  Nodes to remove 3380
2021-09-25 22:47:28,692 ModifyTree [INFO ]  Clean annotations related to removed nodes
2021-09-25 22:47:28,692 ModifyTree [INFO ]  Cleaning 3380 links
2021-09-25 22:47:28,692 DatabaseConnection [INFO ]  Fast clean
2021-09-25 22:47:29,763 DatabaseConnection [INFO ]  Deleting 3380 annotations!
2021-09-25 22:47:29,806 DatabaseConnection [INFO ]  Get all database nodes
2021-09-25 22:47:29,828 DatabaseConnection [INFO ]  Get all database edges
2021-09-25 22:47:29,848 DatabaseConnection [INFO ]  Get all children from root node
2021-09-25 22:47:31,573 DatabaseConnection [INFO ]  Get tree edges from children
2021-09-25 22:47:31,621 DatabaseConnection [INFO ]  Get nodes from tree edges
2021-09-25 22:47:31,631 DatabaseConnection [INFO ]  Validate parents
2021-09-25 22:47:31,639 DatabaseConnection [INFO ]  Tree statistics
                    Nodes: 14677
                    Links: 14677
                    Tree: n(14375), l(14375)
                    LinkNodes: 14696
                    Parent_ok: True

Traceback (most recent call last):
  File "/home/ide/miniconda3/envs/flextaxd/bin/flextaxd", line 10, in <module>
    sys.exit(main())
  File "/home/ide/miniconda3/envs/flextaxd/lib/python3.6/site-packages/flextaxd/custom_taxonomy_databases.py", line 262, in main
    modify_obj.clean_database(ncbi=ncbi)
  File "/home/ide/miniconda3/envs/flextaxd/lib/python3.6/site-packages/flextaxd/modules/ModifyTree.py", line 430, in clean_database
    if self.taxonomydb.validate_tree():
  File "/home/ide/miniconda3/envs/flextaxd/lib/python3.6/site-packages/flextaxd/modules/database/DatabaseConnection.py", line 275, in validate_tree
    logger.debug([self.taxonomy[x] for x in lset])
TypeError: 'int' object is not iterable

The Python version is 3.6.0, and FlexTaxD was installed with miniconda. Even if I skip this step, I get the same error in the merge database step.
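
For readers unfamiliar with this message: the traceback ends in a list comprehension that loops over `lset`, so the error suggests `lset` held a single integer rather than a collection of node ids at that point. The sketch below reproduces the same message outside FlexTaxD. The `taxonomy` dictionary and both functions are hypothetical stand-ins, not FlexTaxD's actual code.

```python
# Hypothetical stand-in for a node-id -> name lookup; not FlexTaxD's real data.
taxonomy = {1: "root", 2: "Bacteria"}


def describe(lset):
    """Look up the name of every node id in lset (expects an iterable)."""
    return [taxonomy[x] for x in lset]


def describe_safe(lset):
    """Same lookup, but wrap a bare int so a single id is also accepted."""
    if isinstance(lset, int):
        lset = [lset]
    return [taxonomy[x] for x in lset]


names = describe({1, 2})      # a set of ids can be iterated

try:
    describe(2)               # a bare int cannot be iterated
except TypeError as err:
    message = str(err)        # same wording as in the traceback

single = describe_safe(2)     # the guarded version handles the int
```

A guard like `describe_safe` only hides the symptom; an integer in that position usually means the value came from somewhere unexpected, which here turned out to be the order of the workflow steps.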

Thanks in advance for any advice.

Keigo

davve2 commented 2 years ago

Dear Keigo,

thank you and I hope you will find the tool useful.

I need a few more details about the process you went through up to this point. Did you follow the tutorial?

I notice that the total node count (14677) does not match the number of nodes in the tree (14375). This is probably what is causing the error. Could it be that you are running the clean after the merge? In that case the database can contain nodes with several parents, or multiple child lineages that do not match (due to the slightly different structure between NCBI and GTDB).
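
As a rough illustration of that kind of mismatch, the sketch below takes a plain list of (parent, child) links and reports how many nodes exist in total, how many are reachable from the root, and which nodes have more than one parent. It is only a generic check under assumed inputs: the function name `tree_report` and the edge-list format are made up for this example and do not reflect how FlexTaxD stores or validates its tree.

```python
from collections import defaultdict, deque


def tree_report(edges, root):
    """Summarise a parent/child edge list (hypothetical format).

    edges: iterable of (parent, child) pairs; a (root, root) self-link is allowed.
    """
    children = defaultdict(set)
    parents = defaultdict(set)
    nodes = set()
    for parent, child in edges:
        nodes.update((parent, child))
        if parent != child:           # ignore the root's self-link
            children[parent].add(child)
            parents[child].add(parent)

    # Breadth-first walk from the root to find the nodes that form the tree.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)

    return {
        "nodes": len(nodes),
        "reachable": len(seen),
        "unreachable": nodes - seen,
        "multi_parent": {n for n, p in parents.items() if len(p) > 1},
    }


# Toy data: node 3 has two parents, and nodes 9 and 10 hang off nothing.
report = tree_report([(1, 1), (1, 2), (2, 3), (1, 3), (9, 10)], root=1)
```

In the toy data the total node count (5) is larger than the reachable count (3), the same pattern as "Nodes" being larger than "Tree" in the statistics above.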

I would start by cleaning the NCBI database before merging it with GTDB.

Best, David

davve2 commented 2 years ago

Did you find a solution for this issue?

Best regards, David

xvtyzn commented 2 years ago

Dear David,

Sorry for the late reply. It seems I ran the steps out of order while executing the commands one by one. I rebuilt the database from the NCBI data, cleaned it, and then merged it with GTDB, and it worked.

Thanks for your help!

Best regards, Keigo