EI-CoreBioinformatics / mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

Error: "could not convert string to float" #216

Closed Xiaofei-git closed 5 years ago

Xiaofei-git commented 5 years ago

I installed Mikado on our HPC cluster with "pip3 install mikado" and tested it from the command line using the "sample_data". I got the same error as in "https://github.com/EI-CoreBioinformatics/mikado/issues/164". Could you tell me what the issue is?

There is a warning message at the beginning, "calling yaml.load() without Loader=... is deprecated". Could it be related to the error? Also, I had to copy "plants.yaml" manually to my working directory. Could that be a problem? Could you help me figure it out?

Thanks a lot!

Xiaofei

$ mikado configure --list list.txt --reference chr5.fas --mode permissive --scoring plants.yaml -t 8 \
    --copy-scoring plants.yaml --junctions junctions.bed -bt uniprot_sprot_plants.fasta configuration.yaml

..../.local/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  scoring = yaml.load(scoring_file)
2019-09-23 14:30:27,523 - main - init.py:124 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-09-23 14:30:27,523 - main - init.py:125 - ERROR - main - MainProcess - Malformed inputs file. Error: could not convert string to float:
Traceback (most recent call last):
  File ".../.local/lib/python3.6/site-packages/Mikado/subprograms/configure.py", line 217, in create_config
    score = float(_fields[3])
ValueError: could not convert string to float:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../.local/lib/python3.6/site-packages/Mikado/init.py", line 110, in main
    args.func(args)
  File ".../.local/lib/python3.6/site-packages/Mikado/subprograms/configure.py", line 222, in create_config
    raise ValueError("Malformed inputs file. Error:\n{}".format(exc))
ValueError: Malformed inputs file. Error: could not convert string to float:

lucventurini commented 5 years ago

Dear @xiaofei, Thank you for reporting the bug. Regarding the yaml error message, it is due to updates in the module. I fixed it in the most recent versions of the tool, but older versions are unfortunately bound to have it. It should not affect the results, though.

As for the main error, I think I changed the format of the list. May I ask which version you have installed? It should be 1.2, correct?
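The traceback above fails at `score = float(_fields[3])`, so at least one line of the inputs list had a fourth field that could not be parsed as a number. A minimal pre-flight check, assuming a tab-separated list file in which the fourth column, when present, is meant to be a numeric score. The column layout is inferred from the traceback only, and `find_bad_scores` and the file names are hypothetical, not part of Mikado:

```python
def find_bad_scores(lines, score_column=3):
    """Return (line_number, raw_value) for each line whose score field is not a float.

    Assumption: fields are tab-separated and the score, when present,
    sits at index `score_column` (index 3 in the traceback).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) <= score_column:
            continue  # no score field on this line
        try:
            float(fields[score_column])
        except ValueError:
            problems.append((number, fields[score_column]))
    return problems


# Hypothetical list-file content, for illustration only.
example = [
    "first.gtf\tone\tTrue\t1.5\n",   # parsable score
    "second.gtf\ttwo\tTrue\t\n",     # empty fourth field, as in the error above
    "third.gtf\tthree\tTrue\n",      # no fourth field at all
]
bad = find_bad_scores(example)
print(bad)
```

An empty fourth field, for instance from a trailing tab, reproduces the empty value shown after "could not convert string to float:".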

Xiaofei-git commented 5 years ago

Yes, it is v1.2.4. I installed it with "pip3 install mikado" and assumed that would give me the latest version. How can I update the tool? Thanks a lot!
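For readers who want to confirm which version is installed before and after an upgrade, here is a small sketch using the standard library. It assumes Python 3.8 or later (where `importlib.metadata` is available; the Python 3.6 used in this thread would need `pip3 show mikado` instead), and `installed_version` is a hypothetical helper name:

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the Mikado version if it is installed in this environment, else None.
print(installed_version("Mikado"))

# A name that is not installed gives None instead of raising.
missing = installed_version("no-such-distribution-for-this-example")
```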

lucventurini commented 5 years ago

Dear @Xiaofei-git , I am about to release a new release candidate this afternoon. That version has the bug fixed (together with many other enhancements).

Best

Xiaofei-git commented 5 years ago

Let me know when you finish the release. Thank you so much!

BEST,

Xiaofei

lucventurini commented 5 years ago

Dear @Xiaofei-git , the new release can be found here:

https://github.com/EI-CoreBioinformatics/mikado/releases/tag/2.0rc5

Best

Xiaofei-git commented 5 years ago

If I'd like to install it with "pip3 install mikado", how can I get the new release? Thanks a lot! BEST

lucventurini commented 5 years ago

Dear @Xiaofei-git, I added two binary packages to the release page (the files ending in .whl), one for Python 3.6 and another for Python 3.7. Either should install with

pip install mikado.whl

(replace mikado.whl with the name of the downloaded file).

If you have a different version of Python, or you are not on a Linux system, please download the source package (either the tar.gz or the .zip file), unpack it, then run

python setup.py bdist_wheel

If it complains, you might need to install wheel with pip first. This will create the proper binary package in the dist/ folder, which can be installed with pip:

pip install dist/*whl
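Which of the two wheels fits a given interpreter is encoded in the file name. A small sketch, assuming the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag, separated by hyphens) and a CPython interpreter; `wheel_tags` and `matches_running_python` are hypothetical helpers, and the file name is only an example:

```python
import sys


def wheel_tags(filename):
    """Split a wheel file name into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when the optional build tag is present
        raise ValueError("unexpected wheel file name: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_running_python(filename):
    """True if the wheel's Python tag names the running CPython major.minor."""
    expected = "cp{}{}".format(*sys.version_info[:2])
    return expected in wheel_tags(filename)["python"].split(".")


tags = wheel_tags("Mikado-2.0rc5-cp36-cp36m-linux_x86_64.whl")
print(tags["python"], tags["platform"])
print(matches_running_python("Mikado-2.0rc5-cp36-cp36m-linux_x86_64.whl"))
```

pip performs this compatibility check itself and refuses an incompatible wheel; the sketch only makes the naming convention visible.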

I hope this helps.

Best

Xiaofei-git commented 5 years ago

I installed it with "pip install --user Mikado-2.0rc5-cp36-cp36m-linux_x86_64.whl". It reported errors but ended with "successfully installed", as shown below. Can I ignore the errors and go ahead and use it?

Thanks a lot!

ERROR: pygenometracks 2.1 requires hicexplorer>=2.1.1, which is not installed.
ERROR: hicmatrix 7 has requirement intervaltree==2.1.*, but you'll have intervaltree 3.0.2 which is incompatible.
Installing collected packages: numpy, python-rapidjson, msgpack, jsonref, fastnumbers, rapidjson, Mikado
  Found existing installation: Mikado 1.2.4
    Uninstalling Mikado-1.2.4:
      Successfully uninstalled Mikado-1.2.4
Successfully installed Mikado-2.0rc5 fastnumbers-2.2.1 jsonref-0.2 msgpack-0.6.2 numpy-1.17.2 python-rapidjson-0.8.0 rapidjson-1.0.0

lucventurini commented 5 years ago

Dear @Xiaofei-git , the issues that pip is complaining about regard two packages that have nothing to do with Mikado, ie pygenometracks and hicmatrix. The installation of Mikado itself should be unaffected. Please let me know how it goes.
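To see at a glance which packages pip is complaining about, the ERROR lines can be scanned for the package name that opens each message. This is a rough sketch that assumes the message layout shown in the output above (pip's wording may differ between versions); `packages_in_pip_errors` is a hypothetical helper:

```python
import re


def packages_in_pip_errors(output):
    """Return the lower-cased package names that open pip 'ERROR:' lines."""
    names = set()
    for line in output.splitlines():
        match = re.match(r"ERROR:\s+([A-Za-z0-9_.\-]+)\s", line)
        if match:
            names.add(match.group(1).lower())
    return names


# The relevant lines of the pip output quoted in this thread.
pip_output = """\
ERROR: pygenometracks 2.1 requires hicexplorer>=2.1.1, which is not installed.
ERROR: hicmatrix 7 has requirement intervaltree==2.1.*, but you'll have intervaltree 3.0.2 which is incompatible.
Successfully installed Mikado-2.0rc5
"""
affected = packages_in_pip_errors(pip_output)
print(sorted(affected))
print("mikado" in affected)
```

Here the affected set contains only pygenometracks and hicmatrix, which matches the reading that the Mikado installation itself is not implicated.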

Xiaofei-git commented 5 years ago

I am moving forward with the sample_data. When I ran "makeblastdb", I got the errors below. "mikado.blast.xml.gz" was still generated, but I don't know whether the errors matter. From what I found by googling, they might be related to the BLAST version. Do you have any ideas?

I then went ahead and used the generated "mikado.blast.xml.gz" for "mikado serialise". Unfortunately, that gave another error, shown below, and googling did not help. Do you have any ideas? Also, is my value for the "--blast_targets" option correct?

So I have asked three questions above; I hope I didn't mess them up. Thanks a lot!

$ makeblastdb -in uniprot_sprot_plants.fasta -dbtype prot -parse_seqids > blast_prepare.log
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Title is very long: 1001 characters (max is 1000)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.

$ mikado serialise --json-conf configuration.yaml --xml mikado.blast.xml.gz --orfs mikado.bed --blast_targets uniprot_sprot_plants.fasta ./serialise.log
2019-10-01 12:25:58,113 - main - init.py:120 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-10-01 12:25:58,113 - main - init.py:121 - ERROR - main - MainProcess - (sqlite3.OperationalError) disk I/O error [SQL: CREATE TABLE DB(id int);] (Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: disk I/O error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".local/lib/python3.6/site-packages/Mikado/init.py", line 106, in main
    args.func(args)
  File ".local/lib/python3.6/site-packages/Mikado/subprograms/serialise.py", line 379, in serialise
    load_orfs(args, logger)
  File ".local/lib/python3.6/site-packages/Mikado/subprograms/serialise.py", line 147, in load_orfs
    logger=logger)
  File ".local/lib/python3.6/site-packages/Mikado/serializers/orf.py", line 222, in init
    self.engine = connect(json_conf, logger)
  File ".local/lib/python3.6/site-packages/Mikado/utilities/dbutils.py", line 128, in connect
    DBBASE.metadata.create_all(engine, checkfirst=True)
  File ".local/lib/python3.6/site-packages/sqlalchemy/sql/schema.py", line 4294, in create_all
    ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2035, in _run_visitor
    with self._optional_conn_ctx_manager(connection) as conn:
  File "/sonas-hs/it/hpc/home/easybuild/install_prod/software/MPI/GCC/7.3.0-2.30/OpenMPI/3.1.1/Python/3.6.6/lib/python3.6/contextlib.py", line 81, in enter
    return next(self.gen)
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2027, in _optional_conn_ctx_manager
    with self._contextual_connect() as conn:
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2229, in _contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2265, in _wrap_pool_connect
    return fn()
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 361, in connect
    return _ConnectionFairy._checkout(self, self._fairy)
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 344, in _do_get
    c = self._create_connection()
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 437, in init
    self.connect(first_connect_check=True)
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 639, in connect
    connection = pool._invoke_creator(self)
  File ".local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 249, in
    return lambda crec: creator()
  File ".local/lib/python3.6/site-packages/Mikado/utilities/dbutils.py", line 55, in create_connector
    create_database("sqlite:///{}".format(db_settings["db"]))
  File ".local/lib/python3.6/site-packages/sqlalchemy_utils/functions/database.py", line 584, in create_database
    engine.execute("CREATE TABLE DB(id int);")
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2169, in execute
    return connection.execute(statement, *multiparams, **params)
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 982, in execute
    return self._executetext(object, multiparams, params)
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1155, in _execute_text
    parameters,
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File ".local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File ".local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File ".local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) disk I/O error [SQL: CREATE TABLE DB(id int);] (Background on this error at: http://sqlalche.me/e/e3q8)

lucventurini commented 5 years ago

Dear @Xiaofei-git ,

regarding the first error (in makeblastdb) - this is due to some sequence titles being too long. It should not pose an issue, in my experience.
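To find which entries trigger the "Title is very long" warning, the FASTA headers can be scanned for their length. A minimal sketch, assuming the whole header line after ">" is what gets counted (how makeblastdb measures the title exactly is an assumption here); `long_titles` is a hypothetical helper:

```python
def long_titles(fasta_lines, max_length=1000):
    """Yield (sequence_id, title_length) for headers longer than max_length.

    Assumption: the title is the full header line after the leading '>'.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            title = line[1:].rstrip("\n")
            if len(title) > max_length:
                # The sequence ID is the first whitespace-delimited token.
                sequence_id = (title.split() or [""])[0]
                yield sequence_id, len(title)


# Made-up records: one short header and one with an overlong description.
records = [
    ">short a brief description\n",
    "MKV\n",
    ">long " + "x" * 1000 + "\n",
    "MKV\n",
]
flagged = list(long_titles(records))
print(flagged)
```

With a real file, the same function can be called on an open file handle, for example `long_titles(open("uniprot_sprot_plants.fasta"))`.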

Regarding the second issue: unfortunately I think this has to do with the specifics of your system rather than with Mikado. The problem is that the SQLite database used by Mikado crashes. My suspicion is that you are working on an NFS filesystem, which can cause SQLite to fail (I am having a similar issue on some of my machines as well). I would check with your sysadmin.

I hope this helps.

Xiaofei-git commented 5 years ago

Dear @lucventurini, the SQLite error occurred because our home directories were out of space. It is fixed now.
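Both suspected causes (a filesystem that SQLite cannot use, or a full disk) can be probed before launching a long run. A sketch using only the standard library; `free_bytes` and `can_create_sqlite` are hypothetical helpers, and the probe only shows whether a trivial database can be written in a directory, not that a full Mikado database will fit:

```python
import os
import shutil
import sqlite3
import tempfile


def free_bytes(directory):
    """Free space, in bytes, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free


def can_create_sqlite(directory):
    """Try to create and write a throwaway SQLite file in `directory`."""
    try:
        handle, path = tempfile.mkstemp(suffix=".db", dir=directory)
    except OSError:
        return False  # cannot even create a file here
    os.close(handle)
    try:
        connection = sqlite3.connect(path)
        try:
            connection.execute("CREATE TABLE probe(id int)")
            connection.execute("INSERT INTO probe VALUES (1)")
            connection.commit()
        finally:
            connection.close()
        return True
    except sqlite3.Error:
        return False
    finally:
        if os.path.exists(path):
            os.remove(path)  # leave nothing behind


# Demonstration on a temporary directory; use the real working directory in practice.
with tempfile.TemporaryDirectory() as workdir:
    usable = can_create_sqlite(workdir)
    space = free_bytes(workdir)
    leftover = os.listdir(workdir)
print(usable, space)
```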

lucventurini commented 5 years ago

Dear @Xiaofei-git, thank you for updating us. Glad it worked out in the end.

Xiaofei-git commented 5 years ago

However, when I run "mikado serialise --json-conf configuration.yaml --xml mikado.blast.xml.gz --orfs mikado.bed --blast_targets uniprot_sprot_plants.fasta", I get the error below, which complains that the provided ORFs do not match the transcripts provided and already present in the database. The "mikado.bed" file is in the "sample_data" directory. How did you generate it? In other words, how can I get the BED file for my own data? By running TransDecoder on "mikado_prepared.fasta", which was generated by "mikado prepare"?

Actually, if I don't provide "--orfs mikado.bed", the run completes successfully and simply skips loading the ORFs: "2019-10-10 14:27:41,768 - serialiser - serialise.py:155 - INFO - load_orfs - MainProcess - No ORF data provided, skipping". Why does it complain that the provided ORFs do not match the transcripts when I use "--orfs mikado.bed"?

Another question: how should I set the value of "--blast_targets"? If I work on Sorghum, it should be fine to use "--blast_targets uniprot_sprot_plants.fasta", right? Thanks a lot!

$ more serialise.log
2019-10-10 13:51:43,460 - serialiser - serialise.py:290 - INFO - setup - MainProcess - Command line: /sonas-hs/ware/hpc/home/xwang/.local/bin/mikado serialise --json-conf configuration.yaml --xml mikado.blast.xml.gz --orfs mikado.bed --blast_targets uniprot_sprot_plants.fasta
2019-10-10 13:51:43,460 - serialiser - serialise.py:296 - INFO - setup - MainProcess - Random seed: 4170354093
2019-10-10 13:51:43,462 - serialiser - serialise.py:334 - INFO - setup - MainProcess - Using a sqlite database (location: /mnt/grid/ware/hpc/home/data/xwang/software/mikado_2.0/mikado-2.0rc5/sample_data/mikado.db)
2019-10-10 13:51:43,462 - serialiser - serialise.py:338 - INFO - setup - MainProcess - Requested 4 threads, forcing single thread: False
2019-10-10 13:51:43,463 - serialiser - serialise.py:141 - INFO - load_orfs - MainProcess - Starting to load ORF data
2019-10-10 13:51:43,626 - serialiser - orf.py:402 - CRITICAL - __serialize_multiple_threads - MainProcess - The provided ORFs do not match the transcripts provided and already present in the database.Please check your input files.
2019-10-10 13:51:43,627 - serialiser - serialise.py:150 - CRITICAL - load_orfs - MainProcess - Mikado serialise failed due to problems with the input data. Please check the logs.

lucventurini commented 5 years ago

The "mikado.bed" file is in the path of "sample_data". How did you get it? In another words, how can I get the bed file for my own data? Run TransDecoder on "mikado_prepared.fasta", which was generated by "mikado prepare"?

Yes. TransDecoder or Prodigal (or any other ORF caller) must be run on the output of mikado prepare, not on the input transcript models. I thought I had clarified that in the error message, but I will make it more explicit. Thank you for pointing it out.
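A mismatch of this kind can be caught before running serialise by comparing sequence names. A rough pre-flight check, assuming the ORF BED file carries the transcript ID in its first tab-separated column and that the FASTA ID is the first token after ">"; the helper names and the example IDs are hypothetical, and this is not the check Mikado itself performs:

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs (first token after '>') from FASTA header lines."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def bed_ids(bed_lines):
    """Collect the first column of every data line in a BED file."""
    ids = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def orfs_missing_from_transcripts(bed_lines, fasta_lines):
    """Return BED sequence names that have no matching FASTA record."""
    return sorted(bed_ids(bed_lines) - fasta_ids(fasta_lines))


# Made-up example: one ORF refers to a transcript absent from the FASTA.
fasta = [">tr_1 some description\n", "ACGT\n", ">tr_2\n", "ACGT\n"]
bed = ["tr_1\t0\t4\tORF1\n", "other_7\t0\t4\tORF2\n"]
missing_ids = orfs_missing_from_transcripts(bed, fasta)
print(missing_ids)
```

An empty result means every ORF refers to a transcript in the FASTA; any names listed suggest the ORF caller was run on a different set of sequences.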

Xiaofei-git commented 5 years ago

I thought I had clarified that on the error message, but I will make it more explicit. Thank you for pointing it out.

I am sorry, I am confused. I don't think I asked about the error "The provided ORFs do not match the transcripts provided" before, did I? I also tried googling it, but I didn't find anything related to it.

lucventurini commented 5 years ago

I thought I had clarified that on the error message, but I will make it more explicit. Thank you for pointing it out.

I am sorry, I am confused. I don't think I asked about the error "The provided ORFs do not match the transcripts provided" before, did I? I also tried googling it, but I didn't find anything related to it.

That's our fault for not making the documentation clear enough. You are not the first person to make this mistake. I will clarify this in the documentation (#136), make the error in serialise more informative, and have prepare print the instruction at the end of its run.

As I mentioned, you are not the first one to make this mistake, so it is clearly on us to clarify the procedure.

Xiaofei-git commented 5 years ago

Thank you so much! It is clear to me now, and I ran Mikado through successfully. The command I used for the "serialise" step is below. FYI, https://github.com/EI-CoreBioinformatics/mikado/issues/114 helped me too: I had to delete the mikado.db file before re-running mikado serialise, otherwise it reports:

2019-10-11 11:08:24,220 - main - init.py:120 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-10-11 11:08:24,220 - main - init.py:121 - ERROR - main - MainProcess - I should have serialised 124 ORFs, but 248 are present!

$ mikado serialise --json-conf configuration.yaml --xml mikado.blast.xml.gz --orfs mikado_prepared.fasta.transdecoder.bed --blast_targets uniprot_sprot_plants.fasta --transcripts mikado_prepared.fasta

BTW, since I worked through the manual to get familiar with Mikado, I think it would be good to update the serialise documentation here too: https://mikado.readthedocs.io/en/latest/Tutorial/index.html#creating-the-configuration-file-for-mikado

Thank you so much for your help to walk through the errors I met!

lucventurini commented 5 years ago

No problem @Xiaofei-git , very glad I was able to assist you through this! Please let us know if you encounter any other issue.

FYI, #114 helped me too, because I had to delete the mikado.db file before re-running Mikado serialise, otherwise it will report that [..]

Yes, I can understand that. I probably should delete the database when mikado serialise errors as badly as it does when trying to use the wrong ORFs.
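Until such a clean-up is built in, the stale database can be removed by hand before re-running serialise. A minimal sketch, assuming the database is a single SQLite file (mikado.db in this thread); `remove_stale_database` is a hypothetical helper, not a Mikado function:

```python
import tempfile
from pathlib import Path


def remove_stale_database(path):
    """Delete the file at `path` if it exists; return True when something was removed."""
    database = Path(path)
    if database.is_file():
        database.unlink()
        return True
    return False


# Demonstration in a temporary directory with an empty placeholder file.
with tempfile.TemporaryDirectory() as workdir:
    db = Path(workdir) / "mikado.db"
    db.write_bytes(b"")
    first = remove_stale_database(db)   # file was present and is removed
    second = remove_stale_database(db)  # nothing left to remove
print(first, second)
```

Removing the file discards everything serialised so far, so it only makes sense when the whole serialise step is going to be repeated.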

BTW, since I worked through the manual to get familiar with Mikado, I think it would be good to update the serialise documentation here too: https://mikado.readthedocs.io/en/latest/Tutorial/index.html#creating-the-configuration-file-for-mikado

Working on that, thank you!