Micromeda / pygenprop

A python library for programmatic usage of EBI InterPro Genome Properties.
http://pygenprop.rtfd.io/
Apache License 2.0

error when using pygenprop ##StopIteration #77

Closed durubing-jn closed 2 years ago

durubing-jn commented 3 years ago

Hi, LeeBergstrand:

There was an error when using pygenprop.

Here is the command I ran: pygenprop build -d genomeProperties.txt -i 9.genome_properties.tsv -o temp

Here is the log:

INFO:__main__:Opening 9.genome_properties.tsv
INFO:__main__:Only adding pathway annotations
Traceback (most recent call last):
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 244, in <module>
    main(cli_args)
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 43, in main
    build_micromeda_file(genome_properties_tree, sanitized_input_file_paths, output_file_path, add_proteins)
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 107, in build_micromeda_file
    results = GenomePropertiesResults(*assignments_caches, properties_tree=genome_properties_tree)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/results.py", line 42, in __init__
    property_table, step_table = assignment.create_results_tables(properties_tree)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/assign.py", line 342, in create_results_tables
    self.bootstrap_assignments(properties_tree)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/assign.py", line 142, in bootstrap_assignments
    self.bootstrap_assignments_from_genome_property(properties_tree.root)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/tree.py", line 43, in root
    genome_property = next(iter(self.genome_properties_dictionary.values()))
StopIteration

Can you help me solve it?

Thanks

LeeBergstrand commented 3 years ago

@JNdurubing

  1. Try the version of Pygenprop on the development branch (https://github.com/Micromeda/pygenprop/tree/develop). This may be an issue that has already been addressed upstream. To install the dev branch, download its source code and run pip install /dev_directory (see https://stackoverflow.com/questions/41535915/python-pip-install-from-local-dir). We ran into some parsing issues with the latest version of the InterProScan TSV output, and these have been addressed in the dev branch.
  2. Can you send me the 9.genome_properties.tsv file? Either post it to GitHub GISTs or a Google Drive/Dropbox link.
durubing-jn commented 3 years ago

Thanks for your reply! I have installed the Pygenprop on the development branch. If I understand correctly, this package needs some changes or updates, because some commands are abandoned by python (v 3.6):

/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/skbio/util/_testing.py:15: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as pdt
/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/results.py:26: FutureWarning: 'pyarrow.default_serialization_context' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  serialization_context = pa.default_serialization_context()
INFO:__main__:Opening 9.genome_properties.tsv
INFO:__main__:Only adding pathway annotations
Traceback (most recent call last):
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 244, in <module>
    main(cli_args)
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 43, in main
    build_micromeda_file(genome_properties_tree, sanitized_input_file_paths, output_file_path, add_proteins)
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 107, in build_micromeda_file
    results = GenomePropertiesResults(*assignments_caches, properties_tree=genome_properties_tree)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/results.py", line 47, in __init__
    property_table, step_table = assignment.create_results_tables(properties_tree)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/assign.py", line 342, in create_results_tables
    self.bootstrap_assignments(properties_tree)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/assign.py", line 142, in bootstrap_assignments
    self.bootstrap_assignments_from_genome_property(properties_tree.root)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/tree.py", line 43, in root
    genome_property = next(iter(self.genome_properties_dictionary.values()))
StopIteration

Here are part of 9.genome_properties.tsv (the full file is too large to share, ~200 MB) and genomeProperties.txt: https://gist.github.com/c7064dcdfc24158f830fd1d22d467535.git

LeeBergstrand commented 3 years ago

@JNdurubing What version of the genome properties database file are you importing and how are you importing it?

https://github.com/Micromeda/pygenprop#acquiring-genome-properties-data

LeeBergstrand commented 3 years ago
FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as pdt

/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/results.py:26: FutureWarning: 'pyarrow.default_serialization_context' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.

If I understand correctly, this package needs some changes or updates, because some commands are abandoned by python (v 3.6):

These are warnings about future changes in packages that pygenprop relies on, not errors in pygenprop or Python itself. There is nothing to worry about as long as the dependency versions are locked in.
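If the warning noise gets in the way when calling the library from your own script, the standard `warnings` module can hide it. This is only a minimal sketch: `run_quietly` and `noisy` are hypothetical names made up for illustration and are not part of pygenprop.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func while hiding FutureWarning messages only (hypothetical helper)."""
    with warnings.catch_warnings():
        # The filter is local to this block; other warnings and all errors still surface.
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


def noisy():
    """Stand-in for a dependency that emits a deprecation notice."""
    warnings.warn("pandas.util.testing is deprecated", FutureWarning)
    return "done"


if __name__ == "__main__":
    print(run_quietly(noisy))
```

Hiding the warnings does not change behaviour; it only keeps the log readable so that the real traceback is easier to spot.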

LeeBergstrand commented 3 years ago

"/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/tree.py", line 43, in root genome_property = next(iter(self.genome_properties_dictionary.values())) StopIteration

@JNdurubing This looks like there's something wrong with your genome properties database file. For example, the file is empty or broken. See: https://github.com/Micromeda/pygenprop#acquiring-genome-properties-data

What file did you use? How did you import it?
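For context, the failing line in the traceback takes the first value of a dictionary with `next(iter(...))`. On an empty dictionary, that call raises StopIteration, which is why an empty or unparsed database file is the first suspect. A minimal sketch of the behaviour, using plain dictionaries rather than pygenprop objects (the helper names and the `GenProp0001` key are illustrative assumptions):

```python
def first_value(mapping):
    """Mimic the failing line: take the first value of a dictionary."""
    return next(iter(mapping.values()))


def first_value_or_none(mapping):
    """Same lookup, but with a default so an empty dictionary does not raise."""
    return next(iter(mapping.values()), None)


if __name__ == "__main__":
    try:
        first_value({})  # an empty dictionary, as if no properties were parsed
    except StopIteration:
        print("StopIteration: the dictionary is empty")

    print(first_value_or_none({}))  # falls back to the default instead of raising
    print(first_value({"GenProp0001": "example property"}))
```

So if the tree built from genomeProperties.txt contains no properties at all, the traceback above is the expected result.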

durubing-jn commented 3 years ago

I downloaded the properties database file using wget. I then redownloaded it (wget https://raw.githubusercontent.com/ebi-pf-team/genome-properties/master/flatfiles/genomeProperties.txt, ~1.7 MB) and ran the command again: pygenprop build -d genomeProperties.txt -i 9.genome_properties.tsv -o temp. The current error is as follows:

`/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/skbio/util/_testing.py:15: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as pdt
/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/results.py:26: FutureWarning: 'pyarrow.default_serialization_context' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  serialization_context = pa.default_serialization_context()
INFO:__main__:Opening 9.genome_properties.tsv
INFO:__main__:Only adding pathway annotations
INFO:__main__:Writing output Micromeda file to /home/rstudio/1_fermentated_food/7.genome_analysis/temp/temp
Traceback (most recent call last):
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3212, in _wrap_pool_connect
    return fn()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 307, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 767, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 425, in checkout
    rec = pool._do_get()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 256, in _do_get
    return self._create_connection()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 253, in _create_connection
    return _ConnectionRecord(self)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 368, in __init__
    self.__connect()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 611, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
    connection = pool._invoke_creator(self)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 578, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 584, in connect
    return self.dbapi.connect(*cargs, **cparams)
sqlite3.OperationalError: unable to open database file

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 244, in <module>
    main(cli_args)
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 43, in main
    build_micromeda_file(genome_properties_tree, sanitized_input_file_paths, output_file_path, add_proteins)
  File "/home/rstudio/miniconda2/envs/genomeproperties/bin/pygenprop", line 111, in build_micromeda_file
    results.to_assignment_database(engine)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/pygenprop/results.py", line 319, in to_assignment_database
    Base.metadata.drop_all(engine)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/sql/schema.py", line 4770, in drop_all
    ddl.SchemaDropper, self, checkfirst=checkfirst, tables=tables
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3078, in _run_ddl_visitor
    with self.begin() as conn:
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2994, in begin
    conn = self.connect(close_with_result=close_with_result)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3166, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3245, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3216, in _wrap_pool_connect
    e, dialect, self
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2070, in _handle_dbapi_exception_noconnection
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3212, in _wrap_pool_connect
    return fn()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 307, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 767, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 425, in checkout
    rec = pool._do_get()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 256, in _do_get
    return self._create_connection()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 253, in _create_connection
    return _ConnectionRecord(self)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 368, in __init__
    self.__connect()
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 611, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
    connection = pool._invoke_creator(self)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 578, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/rstudio/miniconda2/envs/genomeproperties/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 584, in connect
    return self.dbapi.connect(*cargs, **cparams)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file (Background on this error at: https://sqlalche.me/e/14/e3q8) `

durubing-jn commented 3 years ago

Maybe I need to put this file in a particular path?

LeeBergstrand commented 3 years ago

@JNdurubing, I took some time to run your test files against the version of the genome properties database you specified. It worked fine for me. So the problems were caused either by the installation or setup on your end or by the full-size IPR5 file. Based on the output above, the current problem is likely caused by either a permission issue or a locked SQLite file. It looks like swapping to the development branch fixed the first issue, but you may have encountered another error that is unrelated to the first.

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file (Background on this error at: https://sqlalche.me/e/14/e3q8)

So Micromeda files are actually SQLite3 database files, and pygenprop uses the SQLAlchemy library to write to these files. The above error indicates that Pygenprop was having trouble writing to the Micromeda file. Two reasons come to mind.

  1. The USER running Jupyter Notebooks / RStudio does not have permission to write to the file.

or

  2. Your Micromeda file is locked. Because Micromeda files are SQLite files, they have an internal lock, so only one program can write to them at a time. If your Python code failed or if you quit it early, you may have locked the file and then never unlocked it. Micromeda doesn't write over your existing Micromeda file. Instead, it appends to it. The first failure above could have caused this issue because it might have locked the file, and then your code failed, and Pygenprop never had a chance to unlock the file. To get around this issue, you can delete the Micromeda file manually and start again.
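
To narrow down which case applies, the output path can be checked before rerunning the build. This is a rough sketch using only the standard library; `diagnose_output_path` and `can_open` are hypothetical helpers, not part of pygenprop. It also checks whether the path is an existing directory, since the log above reports writing to a path ending in temp/temp and SQLite cannot open a directory as a database file.

```python
import os
import sqlite3


def diagnose_output_path(path):
    """Return a list of reasons SQLite might fail to open `path` for writing."""
    problems = []
    path = os.path.abspath(path)
    parent = os.path.dirname(path)
    if os.path.isdir(path):
        problems.append("path is a directory, not a file")
    if not os.path.isdir(parent):
        problems.append("parent directory does not exist")
    elif not os.access(parent, os.W_OK):
        problems.append("parent directory is not writable")
    if os.path.isfile(path) and not os.access(path, os.W_OK):
        problems.append("existing file is not writable")
    return problems


def can_open(path):
    """Try to open `path` as an SQLite database; True if SQLite accepts it."""
    try:
        connection = sqlite3.connect(path)
    except sqlite3.OperationalError:
        return False
    try:
        connection.execute("PRAGMA user_version")  # forces SQLite to touch the file
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        connection.close()


if __name__ == "__main__":
    output_path = "temp"  # the value passed to -o in the command above
    print(diagnose_output_path(output_path))
    print(can_open(output_path))
```

Note that `can_open` creates an empty file if the path does not exist yet, so delete that file afterwards if you do not want it. If both checks pass and the error persists, deleting the Micromeda file as described above is the next thing to try.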
LeeBergstrand commented 2 years ago

Closing because no response in months.