GiantSpaceRobot / FindFungi

A pipeline for the identification of fungi in public metagenomics datasets
sqlite3.OperationalError: disk I/O error #1

Closed by yangkl0909 4 years ago

yangkl0909 commented 6 years ago

Hi Paul, I am running FindFungi on the demo ERR675624 FASTQ data on a cluster (CentOS Linux release 7.4.1708). I removed the bsub calls and got the following results. There is an error I can't fix; could you help check it? Thank you.

The CSV is always empty, with the errors shown below.

Best, Keli

GiantSpaceRobot commented 6 years ago

Hi Keli,

I haven't seen this error before with FindFungi. For ete3-3.1.1, it looks like ete3 cannot write to file. Is your directory underneath Dropbox/Google Drive? If so, try pausing Dropbox/Google Drive and running the LowestCommonAncestor_V4.sh programme, or run the pipeline again. This problem can occur because syncing causes read/write inconsistencies.

If this is not the problem, please make sure you are using the newest release of ete3 (https://github.com/etetoolkit/ete/tree/master/ete3), as they have fixed taxdump database bugs recently.

Lastly, have you been using conda to manage Python versions/installations etc? I have seen this cause dependency clashes before.

Best, Paul

yangkl0909 commented 6 years ago

@GiantSpaceRobot
Thank you.

I fixed the sqlite3.OperationalError: disk I/O error by moving the output back to my PC. We don't run Google Drive or any other sync tools on our cluster. It seems the error might be caused by the Lustre file system on our cluster, as described in https://github.com/TApplencourt/EMSL_Basis_Set_Exchange_Local/issues/6.

However, after moving on, another error occurs in LowestCommonAncestor_V4.py.
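One way to check whether a given directory is the culprit is to try creating a small SQLite database in it. The following is a minimal sketch using only the Python standard library; the function name `sqlite_writable` is hypothetical and is not part of FindFungi or ete3. It only tests basic create/insert/commit behaviour, so a pass does not guarantee that every SQLite operation will work on that file system.

```python
import os
import sqlite3
import tempfile


def sqlite_writable(directory):
    """Return True if a throwaway SQLite database can be created and written in `directory`."""
    fd, path = tempfile.mkstemp(suffix=".sqlite", dir=directory)
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE probe (x INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
        return True
    except sqlite3.OperationalError:
        # Errors such as "disk I/O error" or "database is locked" end up here.
        return False
    finally:
        # Remove the probe file and any leftover rollback journal.
        for leftover in (path, path + "-journal"):
            if os.path.exists(leftover):
                os.remove(leftover)


if __name__ == "__main__":
    # Compare the working directory against the system temp directory.
    for d in (os.getcwd(), tempfile.gettempdir()):
        print("{}: {}".format(d, "OK" if sqlite_writable(d) else "FAILED"))
```

If the probe fails on the shared file system but passes on a local disk, keeping the SQLite-backed files on the local disk (as was done here by moving the output) is a reasonable workaround.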

Traceback (most recent call last):
  File "/u2host/findfungi_test/FindFungi-v0.23.3/LowestCommonAncestor_V4.py", line 56, in <module>
    for i in descendants:
NameError: name 'descendants' is not defined
Done

I got Final_Results_ERR675624-lca.csv with only the header line and no other information.

Taxon name,Taxid,Reads mapping to taxid,Reads mapping to children taxids,Pearson skewness score,Percent of pseudo-chromosomes with read hits

It seems there is an indentation error in this Python script?


C ERR675624.1001391_FC61KB9AAXX:4:14:2367:14327#GATCAG/1 76775 100 100 0.042346 100
C ERR675624.1001391_FC61KB9AAXX:4:14:2367:14327#GATCAG/2 76775 100 100 0.042346 100
C ERR675624.101282_FC61KB9AAXX:4:2:6436:3574#GATCAG/1 1759314 100 100 0.471951 100
C ERR675624.101282_FC61KB9AAXX:4:2:6436:3574#GATCAG/2 294747 100 100 -0.42256 100
C ERR675624.1029318_FC61KB9AAXX:4:14:9179:15769#GATCAG/1 1759314 100 100 0.471951 100

Is the Final_Results_ERR675624.tsv output right?
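A quick sanity check on a file like this is to tally reads per taxid. The sketch below assumes the Kraken convention that the first column is `C` for classified reads, the second is the read ID, and the third is the taxid; the meaning of the remaining columns is not stated in this thread, so they are ignored. The function name `reads_per_taxid` is hypothetical.

```python
from collections import Counter


def reads_per_taxid(lines):
    """Count classified reads per taxid from whitespace-separated lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank/short lines and anything not flagged as classified.
        if len(fields) < 3 or fields[0] != "C":
            continue
        counts[fields[2]] += 1
    return counts


# The five lines quoted above, used as sample input.
sample = [
    "C ERR675624.1001391_FC61KB9AAXX:4:14:2367:14327#GATCAG/1 76775 100 100 0.042346 100",
    "C ERR675624.1001391_FC61KB9AAXX:4:14:2367:14327#GATCAG/2 76775 100 100 0.042346 100",
    "C ERR675624.101282_FC61KB9AAXX:4:2:6436:3574#GATCAG/1 1759314 100 100 0.471951 100",
    "C ERR675624.101282_FC61KB9AAXX:4:2:6436:3574#GATCAG/2 294747 100 100 -0.42256 100",
    "C ERR675624.1029318_FC61KB9AAXX:4:14:9179:15769#GATCAG/1 1759314 100 100 0.471951 100",
]

if __name__ == "__main__":
    for taxid, n in reads_per_taxid(sample).most_common():
        print("{}\t{}".format(taxid, n))
```

For the five sample lines this gives two reads each for taxids 76775 and 1759314, and one read for 294747.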

GiantSpaceRobot commented 6 years ago

Hi @yangkl0909

That output looks correct. And you are correct that there was an indentation error in the LowestCommonAncestor_V4.py script. I have fixed this now (https://github.com/GiantSpaceRobot/FindFungi/blob/master/FindFungi-v0.23.3/LowestCommonAncestor_V4.py).
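For readers hitting the same message: the script's source is not shown in this thread, so the following is only a generic illustration of how a name such as `descendants` can end up unbound, not a reproduction of the actual bug. All function names here are hypothetical. If the only assignment sits inside a block that can be skipped (through indentation, or a swallowed exception), the later loop fails. Inside a function Python raises `UnboundLocalError`, a subclass of `NameError`; at module level, as in the traceback above, it is a plain `NameError`.

```python
def count_descendants_fragile(taxid, lookup):
    try:
        descendants = lookup(taxid)
    except ValueError:
        pass  # the failure is swallowed, so `descendants` is never bound
    return sum(1 for _ in descendants)  # fails here if the lookup raised


def count_descendants_safe(taxid, lookup):
    descendants = []  # bind the name before any code path that may be skipped
    try:
        descendants = lookup(taxid)
    except ValueError as err:
        # Report the real cause instead of failing later with a NameError.
        print("lookup failed for {}: {}".format(taxid, err))
    return sum(1 for _ in descendants)


def failing_lookup(taxid):
    """Stand-in for a taxonomy lookup that cannot find the taxid."""
    raise ValueError("taxid not found")


if __name__ == "__main__":
    print(count_descendants_safe(1, failing_lookup))
```

Binding the name up front and reporting the underlying exception usually makes the real cause (for example, a failed database lookup) visible instead of the secondary `NameError`.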

Let me know if you have any trouble. Thank you.

Paul

yangkl0909 commented 6 years ago

@GiantSpaceRobot Hi Paul.

Sorry, the same error occurs in the script LowestCommonAncestor_V4.py.

Thank you.

Best Keli

GiantSpaceRobot commented 6 years ago

Hi Keli,

I have re-formatted the LowestCommonAncestor_V4.py script. It is working for me on a Unix server. Let me know how it goes.

Thanks, Paul

yangkl0909 commented 6 years ago

Hi Paul,

Finally, I got the CSV tables with your new script.

It worked in a fresh Ubuntu 14.04 environment with Python 2.7.13 (without Conda) running in Docker. It still fails with the same error on Ubuntu 16.04 with Python 2.7.15 (installed via Conda). Maybe the script can't run on Python 2.7.15 or in a Conda environment?

Thanks a lot.

Best Keli

GiantSpaceRobot commented 6 years ago

Glad to hear it, Paul

mhyleung commented 5 years ago

Dear all

I am running this on python 2.7.13 (without Conda), but even with the new LowestCommonAncestor_V4.py script I am encountering the same error:

  File "/disk/rdisk08/mhyleung/tools/FindFungi/FindFungi-v0.23.3/LowestCommonAncestor_V4.py", line 56, in <module>
    for i in descendants:
NameError: name 'descendants' is not defined
Done

My output also has only the top line and no other information.

Thanks

Marc

GiantSpaceRobot commented 5 years ago

Hi Marc,

This script is working on my system, so I am unsure why this error is happening. Are you certain ete3 is up to date? If it is, can you try running the script with a small test set? (data attached)

Execute the script:

python LowestCommonAncestor_V4.py KrakenOutput.txt Taxids.txt MyOutput.tsv

Let me know if this works/fails.

Best, Paul

Taxids.txt KrakenOutput.txt

mhyleung commented 5 years ago

Hi Paul

The same descendants error persists. From my understanding of this thread, there is no way that the script would work on a Conda version of Python, right? I am having trouble installing ete3 (v3.1.1) under a non-Conda Python 2.7.13, and ete3 is currently only working under Conda for me. I will keep trying to fix my Python 2.7.13 issue, but in the meantime, I would like to know whether there is a way to make the script work when Python runs under Conda.

Thanks again

Marc

GiantSpaceRobot commented 5 years ago

Hi Marc,

Unfortunately I have not tested the scripts/pipeline in Conda at all so I can't offer any advice. FYI, I am using Python 2.7.15rc1 and it works fine with ete3 v3.1.1.

Paul
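Since the reports in this thread differ mainly in Python version, Conda usage, and ete3 installation, it can help to post those details alongside the traceback. The sketch below is one way to gather them; the function name `describe_environment` is hypothetical, and the Conda check relies on the assumption that Conda environments contain a `conda-meta` directory under `sys.prefix`.

```python
import os
import sys


def describe_environment():
    """Collect interpreter and ete3 details relevant to this issue."""
    info = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        # Heuristic: Conda environments keep a `conda-meta` directory here.
        "conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }
    try:
        import ete3
        info["ete3"] = getattr(ete3, "__version__", "unknown")
        info["ete3_path"] = os.path.dirname(ete3.__file__)
    except Exception as err:
        # ete3 is missing or fails to import in this interpreter.
        info["ete3"] = None
        info["ete3_error"] = str(err)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

Running this with the same interpreter that runs LowestCommonAncestor_V4.py shows whether that interpreter is the one that actually has ete3 installed.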