MetaSys-LISBP / IsoCor

IsoCor: Isotope Correction for mass spectrometry labeling experiments
https://isocor.readthedocs.io
GNU General Public License v3.0

Error when running IsoCor #9

XiaoyangSu closed this issue 5 years ago.

XiaoyangSu commented 5 years ago

I'm trying to run IsoCor using Data_example.tsv. However, I got an error message saying "'gbk' codec can't decode byte 0xb5 in position 23077: illegal multibyte sequence".

I'm using Windows 10 + Anaconda (Python 3.7).

pierremillard commented 5 years ago

Dear @XiaoyangSu,

thanks for your feedback! We have tried to fix this issue in the 'dev' branch by forcing the file encoding to UTF-8. Could you please download this branch locally and install it with:

pip install -e /path/to/IsoCor_dev

and tell us whether this fix solves the problem? (Let us know if you need additional help to install the dev branch on your machine.)
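Some background on the fix: in Python 3.7, `open()` called without an `encoding` argument decodes with `locale.getpreferredencoding(False)`. On a Simplified Chinese Windows setup this is typically cp936 (GBK), so a UTF-8 file can fail to decode or be decoded into the wrong characters. Below is a minimal sketch of the idea of forcing UTF-8. It assumes a hypothetical helper `read_text_utf8` and invented file content, and it does not reproduce IsoCor's actual code.

```python
import locale
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Read a text file as UTF-8, independently of the locale's default encoding."""
    # Without encoding=..., open() would fall back to the locale default
    # (e.g. cp936/GBK on a Simplified Chinese Windows setup).
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Illustration with a hypothetical tab-separated file containing a non-ASCII character (micro sign).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.tsv"
    sample.write_text("metabolite\tunit\nATP\t\u00b5M\n", encoding="utf-8")
    content = read_text_utf8(sample)

print("Locale default encoding:", locale.getpreferredencoding(False))
print("Lines read:", len(content.splitlines()))
```

Passing the encoding explicitly makes the behaviour identical on every machine, provided the input files really are saved as UTF-8.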

We need more information to be sure of the cause of this issue. Could you provide us with the path to your data file, your IsoCor version and the traceback of the error?

Could you also run the following Python commands in Anaconda and send us the outputs?

import locale
locale.getpreferredencoding()
import sys
sys.getfilesystemencoding()
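The same checks can be bundled into a small script that also reports the Python version and the installed IsoCor version. Both are useful details when reporting this kind of issue. This is a sketch only. `environment_report` is a hypothetical helper, and the distribution name `IsoCor` is taken from the pip output later in this thread. `importlib.metadata` requires Python 3.8 or later, so the script falls back to a placeholder on Python 3.7.

```python
import locale
import platform
import sys


def environment_report(package="IsoCor"):
    """Collect encoding and version details useful for a bug report (hypothetical helper)."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "preferred_encoding": locale.getpreferredencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        report[package] = "unknown (importlib.metadata needs Python 3.8+)"
    else:
        try:
            report[package] = version(package)
        except PackageNotFoundError:
            report[package] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```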

Thanks for raising this issue, we will do our best to solve it quickly.

XiaoyangSu commented 5 years ago

Dear Pierre,

Thank you for your quick response. Here are my environment parameters.

locale.getpreferredencoding(): 'cp936'
sys.getfilesystemencoding(): 'utf-8'
Path to data file: E:/Work/IsoCor2/Data_example.tsv
IsoCor version: 2.1.2

Sorry, I'm not proficient in Python and I don't know how to get the traceback.

After installing the dev version, I think the gbk-utf8 issue is resolved. However, I got another error message "Error tokenizing data. C error: Expected 1 fields in line 636, saw 2."

pierremillard commented 5 years ago

Thanks @XiaoyangSu,

happy to hear the initial issue has been fixed by @gmat in the dev version!

To display the traceback, run IsoCor by typing the following in the Anaconda prompt:

python.exe -m isocor

The traceback should then appear in the Anaconda console. (We will update the documentation to describe the procedure to follow in case of errors.)
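Running from a console helps because an application that catches exceptions to show a short message in a dialog can easily lose the full traceback. Below is a minimal sketch of how a GUI callback could keep the complete traceback using the standard `traceback` module. `run_and_report` is a hypothetical name, and this is not how IsoCor's GUI is implemented.

```python
import sys
import traceback


def run_and_report(func, *args, **kwargs):
    """Call func; on failure, print the full traceback to stderr and return it.

    Returns a (result, traceback_text) pair; traceback_text is None on success.
    """
    try:
        return func(*args, **kwargs), None
    except Exception:
        tb_text = traceback.format_exc()  # full traceback of the exception being handled
        print(tb_text, file=sys.stderr)
        return None, tb_text


# Usage: a failing call still yields a short message for the user
# plus the complete traceback for the bug report.
result, tb_text = run_and_report(int, "not a number")
if tb_text is not None:
    short_message = tb_text.strip().splitlines()[-1]  # last line: "ValueError: ..."
```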

If nothing appears, could you run IsoCor using the command line interface (CLI) rather than the GUI? The procedure to run the CLI is described in the IsoCor documentation (https://isocor.readthedocs.io).

Regarding the new error, when is it raised: when loading your data file, or when clicking on 'process'?

Could you please attach the input data files and the traceback to this issue (or send them by email to millard[at]insa-toulouse.fr if you prefer not to make them public)? It looks like the format of the data file is not correct. If the issue can be solved internally, we will improve the parsing so that it is transparent to users; otherwise, IsoCor should raise an explicit error message.
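The pandas message "Expected N fields in line X, saw Y" means that line X contains more delimited fields than the header (or first line) led the parser to expect. Below is a minimal standard-library sketch of a pre-check that reports such lines with an explicit message. It assumes a tab-separated file with a header line. `find_ragged_lines` and `validate_table` are hypothetical helpers, not part of IsoCor.

```python
import csv
import io


def find_ragged_lines(lines, delimiter="\t"):
    """Return (line_number, n_fields) for each row whose field count differs from the header.

    Line numbers are 1-based and count the header as line 1; blank lines are skipped.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:  # empty input
        return []
    expected = len(header)
    problems = []
    for row in reader:
        if row and len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems


def validate_table(lines, name, delimiter="\t"):
    """Raise a ValueError naming the file and the offending lines, if any."""
    problems = find_ragged_lines(lines, delimiter)
    if problems:
        details = ", ".join(f"line {n} has {k} fields" for n, k in problems)
        raise ValueError(f"Malformed file '{name}': {details}.")


# Usage with hypothetical content: line 3 has one field too many.
sample = "sample\tarea\nS1\t100\nS2\t250\t7\n"
print(find_ragged_lines(io.StringIO(sample)))  # [(3, 3)]
```

For real files, open them with `newline=""` as the `csv` documentation recommends. Note that this check is stricter than pandas, which pads rows that have too few fields with missing values instead of raising.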

XiaoyangSu commented 5 years ago

I'm using the Data_example.tsv found on your website. This error popped up when clicking "process".

I've also tried to run IsoCor2 from the CLI in verbose mode. See the following command line and output:

(base) C:\Users\xs137>isocorcli -M E:\XSu\IsoCor2\ -D E:\XSu\IsoCor2 -I E:\XSu\IsoCor2 -t 13C -r 60000 -m 200 -f orbitrap -v E:\XSu\IsoCor2\Data_example.tsv
Traceback (most recent call last):
  File "c:\programdata\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\Scripts\isocorcli.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda3\lib\site-packages\isocor\ui\isocorcli.py", line 243, in start_cli
    process(args)
  File "c:\programdata\anaconda3\lib\site-packages\isocor\ui\isocorcli.py", line 29, in process
    baseenv.registerIsopotes(Path(args.I))
  File "c:\programdata\anaconda3\lib\site-packages\isocor\ui\isocordb.py", line 62, in registerIsopotes
    "Isotopes database not found in:\n'{}'.".format(isotopesfile))
ValueError: Isotopes database not found in:
'E:\XSu\IsoCor2'.

Is this a problem in my Python and/or Anaconda3 installation?

pierremillard commented 5 years ago

No, this is not a problem in your Python/Anaconda installation.

The command line interface behaves as expected: the error comes from the command itself, in which the names of the database files are missing. The command line should be:


isocorcli -M E:\XSu\IsoCor2\Metabolites.dat -D E:\XSu\IsoCor2\Derivatives.dat -I E:\XSu\IsoCor2\Isotopes.dat -t 13C -r 60000 -m 200 -f orbitrap -v E:\XSu\IsoCor2\Data_example.tsv
If you want the results to be saved into an output file, add `> E:\XSu\IsoCor2\results.txt` to the end of the command line. Could you rerun this command and post the results? (I expect an error to be raised, hopefully the same one as in the GUI but with additional details, which would help us solve the issue.)
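The first command failed because `-M`, `-D` and `-I` were given folders instead of database files. Below is a minimal sketch, using `pathlib`, of an up-front check that would report this explicitly. `require_file` is a hypothetical helper, not IsoCor's actual argument handling.

```python
import tempfile
from pathlib import Path


def require_file(path_str, option):
    """Return path_str as a Path, or raise a ValueError explaining what is wrong with it."""
    path = Path(path_str)
    if path.is_dir():
        raise ValueError(
            f"Option {option} expects a database file, but '{path}' is a folder. "
            "Please include the file name."
        )
    if not path.is_file():
        raise ValueError(f"Option {option}: file not found: '{path}'.")
    return path


# Usage: passing a folder instead of a file fails with an explicit message.
with tempfile.TemporaryDirectory() as folder:
    try:
        require_file(folder, "-I")
    except ValueError as err:
        message = str(err)
```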
XiaoyangSu commented 5 years ago

OK, thanks. Now I get the same error message as in the GUI:

Traceback (most recent call last):
  File "c:\programdata\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\Scripts\isocorcli.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda3\lib\site-packages\isocor\ui\isocorcli.py", line 243, in start_cli
    process(args)
  File "c:\programdata\anaconda3\lib\site-packages\isocor\ui\isocorcli.py", line 29, in process
    baseenv.registerIsopotes(Path(args.I))
  File "c:\programdata\anaconda3\lib\site-packages\isocor\ui\isocordb.py", line 64, in registerIsopotes
    self.dfIsotopes = pd.read_csv(fp)
  File "c:\programdata\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "c:\programdata\anaconda3\lib\site-packages\pandas\io\parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "c:\programdata\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "c:\programdata\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 75, saw 6

pierremillard commented 5 years ago

Great, thanks!

Apparently, it is not exactly the same error. For the GUI, you reported "Expected 1 fields in line 636, saw 2", and for the CLI, "Expected 1 fields in line 75, saw 6".

Could you confirm you got the same message from the GUI and the CLI when using the same input data files (databases and data to process)?

The error you report might be related to devices where the character encoding is set to 'Simplified Chinese', a configuration that, unfortunately, we have not tested. I have implemented a new fix in the dev branch. Could you download and reinstall the updated dev branch? An error might still be raised (when writing the output files, if the fix works); in that case, could you post the new message?

XiaoyangSu commented 5 years ago

I was using different computers, so the error messages do not match exactly. I have installed the dev version again, but the problem persists. See below for the installation and run logs. Thanks!! Forgive me for the untidy output.

(base) C:\Users\xs137>pip install -e E:\MSSoftware\IsoCor-dev\
Obtaining file:///E:/MSSoftware/IsoCor-dev
Requirement already satisfied: pandas>=0.17.1 in c:\programdata\anaconda3\lib\site-packages (from IsoCor==2.1.2) (0.23.4)
Requirement already satisfied: scipy>=0.12.1 in c:\programdata\anaconda3\lib\site-packages (from IsoCor==2.1.2) (1.1.0)
Requirement already satisfied: python-dateutil>=2.5.0 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.17.1->IsoCor==2.1.2) (2.7.5)
Requirement already satisfied: pytz>=2011k in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.17.1->IsoCor==2.1.2) (2018.7)
Requirement already satisfied: numpy>=1.9.0 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.17.1->IsoCor==2.1.2) (1.15.4)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\lib\site-packages (from python-dateutil>=2.5.0->pandas>=0.17.1->IsoCor==2.1.2) (1.12.0)
Installing collected packages: IsoCor
  Found existing installation: IsoCor 2.1.2
    Uninstalling IsoCor-2.1.2:
      Successfully uninstalled IsoCor-2.1.2
  Running setup.py develop for IsoCor
Successfully installed IsoCor

(base) C:\Users\xs137>isocorcli -M E:\XSu\IsoCor2\Metabolites.dat -D E:\XSu\IsoCor2\Derivatives.dat -I E:\XSu\IsoCor2\Isotopes.dat -t 13C -r 60000 -m 200 -f orbitrap -v E:\XSu\IsoCor2\Data_example.tsv
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\isocorcli-script.py", line 11, in <module>
    load_entry_point('IsoCor', 'console_scripts', 'isocorcli')()
  File "e:\mssoftware\isocor-dev\isocor\ui\isocorcli.py", line 243, in start_cli
    process(args)
  File "e:\mssoftware\isocor-dev\isocor\ui\isocorcli.py", line 29, in process
    baseenv.registerIsopotes(Path(args.I))
  File "e:\mssoftware\isocor-dev\isocor\ui\isocordb.py", line 64, in registerIsopotes
    self.dfIsotopes = pd.read_csv(fp, encoding='utf-8')
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 75, saw 6

(base) C:\Users\xs137>

gmat commented 5 years ago

Hi @XiaoyangSu, we can't manage to reproduce your environment. On Windows 10, I get a cp1252 environment when choosing Chinese. What should I choose to get the same environment as yours, i.e. a cp936 one? What is the decimal separator in your environment? Is there a thousands separator?
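To answer the separator questions from Python itself, the `locale` module can report the decimal point and thousands separator of the user's default locale. Below is a minimal sketch. `numeric_separators` is a hypothetical helper. It temporarily switches `LC_NUMERIC` to the user default and then restores the previous setting, so it does not change the process's locale permanently.

```python
import locale


def numeric_separators():
    """Return (decimal_point, thousands_sep) for the user's default numeric locale."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting only
    try:
        locale.setlocale(locale.LC_NUMERIC, "")  # switch to the user's default locale
    except locale.Error:
        pass  # default locale unavailable: fall back to the current one
    try:
        conv = locale.localeconv()
        return conv["decimal_point"], conv["thousands_sep"]
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore the previous setting


decimal_point, thousands_sep = numeric_separators()
print(f"decimal point: {decimal_point!r}, thousands separator: {thousands_sep!r}")
```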

Can you send us all your input files (.dat, .tsv and .csv)?

We may also try switching the pandas parser from the C engine to the Python engine.

Thanks for your help.
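pandas' `read_csv` accepts `engine="c"` (the default, which produced the "C error" messages above) or `engine="python"`. Below is a minimal sketch comparing both engines on hypothetical tab-separated content, assuming pandas is installed. On well-formed input both engines return the same table. Both also reject a line that has more fields than the header, so switching engines alone does not make a genuinely malformed file readable.

```python
import io

import pandas as pd

# Hypothetical content, for illustration only.
GOOD_TSV = "sample\tmetabolite\tarea\nS1\tPEP\t100\nS1\tATP\t250\n"
BAD_TSV = GOOD_TSV + "S2\tATP\t300\textra\n"  # line 4 has 4 fields instead of 3


def read_tsv(text, engine):
    """Parse tab-separated text with the chosen pandas parser engine ("c" or "python")."""
    return pd.read_csv(io.StringIO(text), sep="\t", engine=engine)


def parser_error(text, engine):
    """Return the ParserError message raised for text, or None if it parses."""
    try:
        read_tsv(text, engine)
    except pd.errors.ParserError as err:
        return str(err)
    return None


# Both engines produce the same DataFrame for well-formed input.
pd.testing.assert_frame_equal(read_tsv(GOOD_TSV, "c"), read_tsv(GOOD_TSV, "python"))
print(parser_error(BAD_TSV, "c"))
```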

XiaoyangSu commented 5 years ago

On my office computer, the encoding is cp1252 and it shows the "Error tokenizing data" error, so I think this setup is fine for troubleshooting purposes. I don't have a decimal separator or a thousands separator set in my environment.

Attached is my Data_example.tsv: Data_example.zip

Thanks!

gmat commented 5 years ago

Thanks for your reply. I still haven't managed to reproduce the error. Can you send us the output of the following command?

pip freeze

I have published a new branch, named pyparser, to test the Python parser in pandas. Could you test it instead of the dev branch?

XiaoyangSu commented 5 years ago

It turned out that my Metabolites.dat was corrupted. The issue is resolved after replacing the database file. Thanks!

pierremillard commented 5 years ago

Hi @XiaoyangSu,

OK, happy to hear the issue is solved. Could you send us the corrupted file? It would be very helpful for finding a way to catch this error and return an explicit error message to users.
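One way to turn a low-level parser failure into an explicit message is to wrap database loading and re-raise the error with the database name. Below is a minimal sketch, assuming pandas. `load_database` is a hypothetical helper, and the file content is invented. IsoCor's own loading code lives in `isocordb.py`, as the tracebacks in this thread show, and may work differently.

```python
import io

import pandas as pd


def load_database(source, name, **read_csv_kwargs):
    """Read a database table with pandas, naming the database if it cannot be parsed."""
    try:
        return pd.read_csv(source, **read_csv_kwargs)
    except pd.errors.ParserError as err:
        raise ValueError(
            f"The {name} database could not be read and may be corrupted: {err}"
        ) from err


# Usage with hypothetical corrupted content (the last line has an extra field).
corrupted = io.StringIO("name,formula\nPEP,C3H5O6P\nATP,C10H16N5O13P3,extra\n")
try:
    load_database(corrupted, "Metabolites")
except ValueError as err:
    message = str(err)
```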

pierremillard commented 5 years ago

@XiaoyangSu, which branch did you install? The dev branch? The pyparser branch? Could you confirm that the master branch also works?

gmat commented 5 years ago

Hi @XiaoyangSu

I would like to publish the bug fix, but I don't understand what your last actions were.

Your feedback will be very useful. Thank you.

XiaoyangSu commented 5 years ago

Yes! The pyparser version works on my computer now. Thank you for updating it! The following are the corrupted database files. I'm not entirely sure how it happened, but I hope this can be helpful. Thanks for helping me out again!

Old Database.zip

pierremillard commented 5 years ago

Fixed in release 2.1.3.