MrOlm / drep

Rapid comparison and dereplication of genomes
pandas.io.common.EmptyDataError: No columns to parse from file #14

Closed hzafeng closed 6 years ago

hzafeng commented 7 years ago

Hello, I can now run the first step of dRep with CheckM, but when I get to the second module, "Cluster", I get this error:


..:: dRep Step 2. Cluster ::..

Step 1. Parse Arguments
Step 2. Perform MASH (primary) clustering
2a. Run pair-wise MASH clustering
[====================] 100.00%
Traceback (most recent call last):
  File "/home/zjs/tools/drep/bin/dRep", line 26, in <module>
    controller.parseArguments(args)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/controller.py", line 144, in parseArguments
    self.dereplicate_wf_operation(vars(args))
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/controller.py", line 86, in dereplicate_wf_operation
    drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],kwargs)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/d_workflows.py", line 36, in dereplicate_wrapper
    drep.d_cluster.d_cluster_wrapper(wd, kwargs)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/d_cluster.py", line 288, in d_cluster_wrapper
    Cdb, Mdb, Ndb = cluster_genomes(Bdb, data_folder, kwargs)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/d_cluster.py", line 104, in cluster_genomes
    Mdb = all_vs_all_MASH(Bdb, data_folder, kwargs)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/d_cluster.py", line 632, in all_vs_all_MASH
    table = pd.read_csv(file,sep='\t',header = None)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, kwds)
  File "pandas/parser.pyx", line 538, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5896)
pandas.io.common.EmptyDataError: No columns to parse from file


If I use the "Skipmash" parameter, I can get past this step, but when I reach the final step, it happens again:

[zjs@www drep]$ /home/zjs/tools/drep/bin/dRep evaluate ./drep_out/ -e all
will compare winners
[====================] 100.00%
Traceback (most recent call last):
  File "/home/zjs/tools/drep/bin/dRep", line 26, in <module>
    controller.parseArguments(args)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/controller.py", line 161, in parseArguments
    self.evaluate_operation(vars(args))
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/controller.py", line 81, in evaluate_operation
    drep.d_evaluate.d_evaluate_wrapper(kwargs['work_directory'],kwargs)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/d_evaluate.py", line 29, in d_evaluate_wrapper
    Wmdb, Wndb = compare_winners(wd,kwargs)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/d_evaluate.py", line 67, in compare_winners
    Wmdb = dClust.all_vs_all_MASH(Bdb,data_folder)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/drep/d_cluster.py", line 632, in all_vs_all_MASH
    table = pd.read_csv(file,sep='\t',header = None)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/home/zjs/.pyenv/versions/3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, kwds)
  File "pandas/parser.pyx", line 538, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5896)
pandas.io.common.EmptyDataError: No columns to parse from file


Could you help me with this issue?

MrOlm commented 7 years ago

Hello,

Are both Mash and mumer properly installed?

To check, please let me know the output from the following command:

$ dRep bonus test --check_dependencies

-Matt

hzafeng commented 7 years ago

Hello, Matt

This is the output of the dependency check:

[zjs@www maxbin]$ /home/zjs/tools/drep/bin/dRep bonus test --check_dependencies
Loading work directory
Checking dependencies
mash.................................... all good (location = /home/zjs/hhf/soft/Mash/bin/mash)
nucmer.................................. all good (location = /home/zjs/hhf/soft/MUMmer3.23/nucmer)
checkm.................................. all good (location = /home/zjs/tools/pitchfork/deployment/bin/checkm)
ANIcalculator........................... all good (location = /home/zjs/hhf/soft/ANIcalculator_v1/ANIcalculator)
prodigal................................ all good (location = /usr/local/bin/prodigal)
centrifuge.............................. all good (location = /home/zjs/hhf/soft/centrifuge-1.0.3-beta/centrifuge)

And I still face the problem of "pandas.io.common.EmptyDataError: No columns to parse from file"

MrOlm commented 7 years ago

Hmmm... that pandas error is what happens when pandas tries to read an empty file. So this isn't really a problem with pandas, but a sign that something in dRep failed along the way.
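To illustrate the point, here is a minimal sketch of how reading a zero-byte file triggers this exception and how a caller could detect it. The helper name `read_table_or_none` is hypothetical and is not part of dRep. It assumes a recent pandas, where the exception is importable from `pandas.errors`; in the pandas version shown in the traceback above it lives in `pandas.io.common`.

```python
import pandas as pd
from pandas.errors import EmptyDataError  # assumption: recent pandas


def read_table_or_none(path):
    """Read a headerless tab-separated file; return None if it has no data.

    Hypothetical helper for illustration only, not dRep's own code.
    """
    try:
        return pd.read_csv(path, sep='\t', header=None)
    except EmptyDataError:
        # Raised when the file contains nothing to parse, for example a
        # 0-byte table left behind by an upstream step that failed.
        return None
```

A `None` result here would mean the upstream step wrote nothing, which is the situation to investigate rather than the pandas call itself.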

Would you mind sending me the log file? It's located in the log folder and is called logger.log

Thanks, -Matt

hzafeng commented 7 years ago

logger.log

MrOlm commented 7 years ago

Thank you for the log. This is strange; I'm not sure yet what the problem is.

Could you show me a list of what is in the folder:

...test_drep_out/data/MASH_files/

As well as what is in the folder:

...test_drep_out/data/MASH_files/sketches/

?

Also, please let me know if any of the files in either of those folders is empty. I'm trying to figure out where Mash is messing up... thanks!
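One way to check for empty files in both folders at once is sketched below. The function name `find_empty_files` is made up for this example; the folder passed in would be the MASH_files directory inside the work directory.

```python
import os


def find_empty_files(folder):
    """Return a sorted list of 0-byte files under `folder`, including subfolders.

    Hypothetical helper for illustration only.
    """
    empty = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Calling it on the MASH_files folder also covers the sketches subfolder, since `os.walk` descends into subdirectories.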

-Matt

hzafeng commented 7 years ago

Thank you for your prompt reply

The /MASH_files/ was empty:

[zjs@www MASH_files]$ ll
total 0
-rwxrwxrwx. 1 root root 0 Oct 6 10:45 MASHtable.tsv
drwxrwxrwx. 1 root root 0 Oct 6 10:45 sketches

The sketches folder is also empty.

Thank you

If you need any other information, you can contact me by email so that I can reply immediately.

I will copy this message and send it to your Gmail.

MrOlm commented 7 years ago

Hello,

OK, so it seems that mash isn't working properly. This could be an issue with dRep, or an issue with the program mash. Could you see if mash is working properly on its own? For example, make some sketches:

mash sketch maxbinout_W0P1.004.fasta MASH_files/maxbinout_W0P1.004.fasta.msh

Let me know if that command works and actually generates a file.

Thanks, -Matt

hzafeng commented 7 years ago

Hello, I can run Mash on its own like this:

mash sketch maxbinout_W0P1.004.fasta

and it generates a file named maxbinout_W0P1.004.fasta.msh.

But it failed when I typed this command:

mash sketch maxbinout_W0P1.004.fasta MASH_files/maxbinout_W0P1.004.fasta.msh

The Mash version is V1.1.1.

MrOlm commented 7 years ago

Hello,

That's very strange that it's able to make the file normally, but not when the output goes to that specific folder... What is the error that it gives? Unfortunately, I've never encountered this issue before, but it sounds like an issue with permissions. May I ask what operating system you are using?

Using the "Skip_mash" parameter may be a way to get around this issue. Using that parameter makes the program take longer to run, but it will be just as accurate as if Mash was run, and in some cases even more accurate.

I'm worried that the underlying "permissions" issue may prevent other programs (like mummer, the program used in secondary clustering) from working as well. But if for some reason it's a mash-specific issue, using the "Skip_mash" parameter is a great idea.
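A quick way to test the permissions theory is to try creating a file in the folder as the same user that runs dRep. The sketch below does that with a temporary file; `can_write_to` is a hypothetical name, and this only checks that file creation succeeds, not every possible cause of a failed write.

```python
import os
import tempfile


def can_write_to(folder):
    """Return True if the current user can create a file inside `folder`.

    Hypothetical helper for illustration only.
    """
    if not os.path.isdir(folder):
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=folder):
            pass
        return True
    except OSError:
        return False
```

If this returns False for the MASH_files folder or its sketches subfolder, other tools that write into the work directory are likely to fail in the same way.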

Best, -Matt

hzafeng commented 7 years ago

Hello, Matt

Oh, so for now I can only run dRep with skipmash.

I was using CentOS, and I plan to try it on Ubuntu.

Thanks for your great help!

HU