GGFHF / TOA

TOA (Taxonomy-oriented Annotation) establishes workflows geared towards plant species that automate the extraction of information from genomic databases and the annotation of sequences.
GNU General Public License v3.0

Error: The basic data load in TOA database is wrong. #6

Open RNiloP opened 2 years ago

RNiloP commented 2 years ago

Dear Fernando

I now have a different issue when trying to run a FASTA file of mine. I am attaching an image of the error message, which states that there was a problem when loading the basic data in the TOA database. I reloaded the basic data and the issue persisted.

Many thanks

Ricardo

TOA_error_RNP

P.S. I could not find the FASTA file you used in the manual as an example.

fernandomoramarquez commented 2 years ago

Dear Ricardo:

Can you please tell me which process you are running? Also, I am attaching a ZIP file containing "test-100.fasta".

Regards,

test-100.zip

RNiloP commented 2 years ago

Dear Fernando

this is the step ("Run a pipeline"): Main menu > Annotation pipelines > TOA amino acid pipeline > Run pipeline [Execute] (page 24 from the pdf manual).

I also noticed that the FASTA file contains nucleotide sequences... I was using AA sequences. However, even with test-100.fasta the same error was shown. I wonder whether, if I run this step from the command line, I could set the path to the file in a better way (if this is the issue).

Best

Ricardo

fernandomoramarquez commented 2 years ago

Dear Ricardo:

The aim of TOA is to perform the functional annotation in the bioinformatics study of NGS experiments using the files yielded in the assembly stage. Therefore, these files have nucleotide sequences corresponding to transcripts. In the "TOA amino acid pipeline" menu item, TOA extracts ORFs and predicts coding regions of the transcript sequences using TransDecoder. The alignment is performed with the predicted peptides.
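Since the pipeline expects nucleotide transcripts rather than peptides, a quick check of the input file can save a failed run. The following is only a rough sketch of such a check, not part of TOA: the function name `guess_sequence_type` and the 0.9 threshold are assumptions chosen for illustration, and the heuristic can be fooled by very short or unusual sequences.

```python
from collections import Counter

# Letters that dominate nucleotide sequences (N marks an unknown base).
NUCLEOTIDE_CODES = set("ACGTUN")


def guess_sequence_type(fasta_text, threshold=0.9):
    """Return 'nucleotide' or 'protein' for FASTA-formatted text (heuristic)."""
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        counts.update(line.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequence data found")
    nucleotide_share = sum(counts[code] for code in NUCLEOTIDE_CODES) / total
    return "nucleotide" if nucleotide_share >= threshold else "protein"
```

Reading a file with `open(path).read()` and passing the text to `guess_sequence_type` would indicate whether it looks like the transcript input described above.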

To find out why the problem occurs, can you please upload a screenshot of the menu-item "Main menu > Configuration > View TOA config file"?

Regards,

RNiloP commented 2 years ago

Dear Fernando

many thanks for the thorough explanation. I am attaching the requested file.

Best

Ricardo

TOA_conf_file_RANP.pdf

fernandomoramarquez commented 2 years ago

Dear Ricardo:

I attach a ZIP file with a version of "xtoa.py" which includes a new line (6526) that prints on the console the command used for checking that the basic data is correctly loaded in the TOA database. xtoa.zip

On the console you will see something like this screenshot: Screenshot from 2022-03-14 12-24-25

Can you please rename the old file to "xtoa.py.old" in TOA directory, copy the new file "xtoa.py", run the pipeline and upload a screenshot of the console?

Regards,

RNiloP commented 2 years ago

Dear Fernando

here is the information.

ranp_u@DESKTOP-OI2S8TK:~/TOA-master/Package$ ./TOA.py --mode=gui
Starting TOA (Taxonomy-oriented Annotation) v0.66 ...
Please, press [Ctrl] to continue ...
command: export PATH=/home/ranp_u/Miniconda3/bin:/home/ranp_u/TOA-master/Package:$PATH; /home/ranp_u/Miniconda3/bin/python3 /home/ranp_u/TOA-master/Package/check-data-load.py --db=/home/ranp_u/TOA-databases/TOA/toa.db --group=basic

Best

Ricardo

fernandomoramarquez commented 2 years ago

Dear Ricardo:

Now, open a WSL2 Ubuntu console and run the command (export PATH=/home/ranp_u/Miniconda3/bin:/home/ranp_u/TOA-master/Package:$PATH; /home/ranp_u/Miniconda3/bin/python3 /home/ranp_u/TOA-master/Package/check-data-load.py --db=/home/ranp_u/TOA-databases/TOA/toa.db --group=basic). Does it work?

Regards,

RNiloP commented 2 years ago

Dear Fernando, I believe we are almost there. But there is an issue when it searched for the "library paramiko". I am afraid this might also happen with other libraries...

ranp_u@DESKTOP-OI2S8TK:$ cd TOA-master/
ranp_u@DESKTOP-OI2S8TK:/TOA-master/Package$ ./TOA.py
ERROR: The library paramiko is not installed. Please, review how to install Paramiko in the manual.
ranp_u@DESKTOP-OI2S8TK:$ pip install paramiko
Collecting paramiko
Downloading paramiko-2.10.2-py2.py3-none-any.whl (211 kB)
|████████████████████████████████| 211 kB 11.4 MB/s
Requirement already satisfied: six in ./Miniconda3/lib/python3.9/site-packages (from paramiko) (1.16.0)
Collecting bcrypt>=3.1.3
Downloading bcrypt-3.2.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (61 kB)
|████████████████████████████████| 61 kB 860 kB/s
Requirement already satisfied: cryptography>=2.5 in ./Miniconda3/lib/python3.9/site-packages (from paramiko) (36.0.0)
Collecting pynacl>=1.0.1
Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)
|████████████████████████████████| 856 kB 20.5 MB/s
Requirement already satisfied: cffi>=1.1 in ./Miniconda3/lib/python3.9/site-packages (from bcrypt>=3.1.3->paramiko) (1.15.0)
Requirement already satisfied: pycparser in ./Miniconda3/lib/python3.9/site-packages (from cffi>=1.1->bcrypt>=3.1.3->paramiko) (2.21)
Installing collected packages: pynacl, bcrypt, paramiko
Successfully installed bcrypt-3.2.0 paramiko-2.10.2 pynacl-1.5.0
ranp_u@DESKTOP-OI2S8TK:/TOA-master$ cd Package/
ranp_u@DESKTOP-OI2S8TK:~/TOA-master/Package$ ./TOA.py --mode=gui
ERROR: The library paramiko is not installed. Please, review how to install Paramiko in the manual.

Best Ricardo

fernandomoramarquez commented 2 years ago

Dear Ricardo:

There may be a "mess" with the versions of Python installed. Can you please run the following commands:

pip --version
whereis pip
python3 --version
whereis python3
echo $PATH

Regards,

RNiloP commented 2 years ago

Dear Fernando

here are the answers:

ranp_u@DESKTOP-OI2S8TK:~$ pip --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)

ranp_u@DESKTOP-OI2S8TK:~$ whereis pip
pip: /usr/bin/pip /mnt/c/Users/ranp1/AppData/Local/Programs/Python/Python39/Scripts/pip.exe /mnt/c/Users/ranp1/AppData/Local/Programs/Python/Python39/Scripts/pip3.exe /usr/share/man/man1/pip.1.gz

ranp_u@DESKTOP-OI2S8TK:~$ python3 --version
Python 3.8.10

ranp_u@DESKTOP-OI2S8TK:~$ whereis python3
python3: /usr/bin/python3.8 /usr/bin/python3 /usr/bin/python3.8-config /usr/lib/python3.8 /usr/lib/python3.9 /usr/lib/python3 /etc/python3.8 /etc/python3 /usr/local/lib/python3.8 /usr/include/python3.8 /usr/share/python3 /mnt/c/Users/ranp1/AppData/Local/Programs/Python/Python39/python3.dll /mnt/c/Users/ranp1/AppData/Local/Programs/Python/Python39/python39.dll /mnt/c/Users/ranp1/AppData/Local/Microsoft/WindowsApps/python3.exe /usr/share/man/man1/python3.1.gz

ranp_u@DESKTOP-OI2S8TK:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/Program Files/Docker/Docker/resources/bin:/mnt/c/ProgramData/DockerDesktop/version-bin:/mnt/c/Users/ranp1/AppData/Local/Programs/Python/Python39/Scripts/:/mnt/c/Users/ranp1/AppData/Local/Programs/Python/Python39/:/mnt/c/Users/ranp1/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/ranp1/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/ranp1/AppData/Local/Programs/MiKTeX/miktex/bin/x64/:/mnt/c/Users/ranp1/AppData/Roaming/TinyTeX/bin/win32:/mnt/c/Users/ranp1/AppData/Local/Box/Box Edit/:/snap/bin
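
The output above lists several Python and pip locations, and which one wins depends on the order of the directories in PATH. As a rough illustration (not something TOA provides), the sketch below walks a PATH value in lookup order and returns every executable with a given name. The helper name `find_all_on_path` is hypothetical.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # ignore empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches
```

Calling `find_all_on_path("python3")` and `find_all_on_path("pip")` in the same shell that starts TOA shows which interpreter and which pip come first, which helps to spot a pip that installs into a different Python than the one TOA runs.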

Best

Ricardo

fernandomoramarquez commented 2 years ago

Dear Ricardo:

I did a "clean" installation of Ubuntu 20.04 on a machine using a Desktop image. I then installed TOA following the instructions in the manual. TOA worked OK and the amino acid pipeline was submitted without error.

The message "*** ERROR: The library paramiko is not installed." is printed when TOA.py checks if the Plotnine library is installed as shown below:

image

Obviously, the word "paramiko" in the text is wrong. The library that TOA cannot find is Plotnine. This check is always done when the application is started. I do not understand why you get this message now when it did not come up in previous runs.

Can you please install the Plotnine library and check if TOA starts?
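
To see which libraries a given interpreter can actually import, a small check like the one below may help. It is a sketch only and not the code in TOA.py: the function names are hypothetical, and the message text merely imitates the style of the error quoted above. Because the message is built from the name being tested, it cannot report one library while checking another.

```python
import importlib.util
import sys


def missing_libraries(names):
    """Return the names in `names` that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names):
    """Print one message per missing library and return the missing names."""
    missing = missing_libraries(names)
    for name in missing:
        print(f"*** ERROR: The library {name} is not installed for {sys.executable}.")
    return missing
```

Running `report(["paramiko", "plotnine"])` with /home/ranp_u/Miniconda3/bin/python3 (the interpreter shown in the TOA command earlier in this thread) would show whether both libraries are visible to that specific Python.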

Regards,

RNiloP commented 2 years ago

Dear Fernando

Thanks for the guidance. I installed plotnine using:

conda install -c conda-forge plotnine

and the previous error did not happen again. However, I could not run the amino acid pipeline. Please see the attached screenshot.

It seems that the issue might be related to the use of bash versus dash.

https://unix.stackexchange.com/questions/605298/what-is-the-default-shell-in-the-windows-subsystem-for-linux-wsl

Anyway, I am trying to set up an already partitioned 1 TB D disk on a machine running Ubuntu to run TOA... I wonder if you can help me with instructions for erasing the TOA information from the Windows machine.

Many thanks!

Ricardo

TOA_error2_RNP

fernandomoramarquez commented 2 years ago

Dear Ricardo:

As you told me, the command TOA runs to check the basic data in your environment is:

export PATH=/home/ranp_u/Miniconda3/bin:/home/ranp_u/TOA-master/Package:$PATH; /home/ranp_u/Miniconda3/bin/python3 /home/ranp_u/TOA-master/Package/check-data-load.py --db=/home/ranp_u/TOA-databases/TOA/toa.db --group=basic

But in the error message, some directories related to Docker are shown first. Let's do a test: edit the new xtoa.py file I attached 4 days ago, duplicate line 6483, comment out the original with "#", and remove ":$PATH" from the new copy:

image

Now, can the amino acid pipeline be submitted without an error message?

On the other hand, to erase the TOA information you have to delete the directory where the TOA programs are located and the PATHs of Miniconda3, databases and results that appear in the menu-item "Main menu > Configuration > Recreate TOA config file".

Regards,

RNiloP commented 2 years ago

Dear Fernando

many thanks for the perseverance. The pipeline, following the previous instructions, could be launched. However, the process finished without success. I am attaching the log.txt file. It seems that the error is due to a problem finding the nr file to perform the Blast analysis:

#########################################
ALIGNMENT OF PEPTIDES TO NR PROTEOME
Aligning peptides ...
BLAST Database error: Could not find volume or alias file (nr.01) referenced in alias file (/home/ranp_u/TOA-databases/NCBI/nr-blastplus-db/nr).
Command exited with non-zero status 2
0.02user 0.00system 0:00.02elapsed 89%CPU (0avgtext+0avgdata 34348maxresident)k
256inputs+0outputs (3major+1531minor)pagefaults 0swaps
#########################################
ERROR: blastp returned error 2
Script ended WRONG at 2022-03-18 14:00:48 with a run duration of 95 s (000:01:35).
#########################################

Maybe also a problem with the path?

Best

Ricardo

log.txt

fernandomoramarquez commented 2 years ago

Dear Ricardo:

First, can you please check if the process "Build BLAST database NR for BLAST+" ended OK reviewing its log? If it was OK, we will interactively relaunch the pipeline. From the directory "/home/ranp_u/TOA-results/pipeline/toapipelineaa-toapipelineaa-220318-135913" run the command:

./toapipeline-process-starter.sh

Did it end OK?

Regards,

RNiloP commented 2 years ago

Dear Fernando

I can confirm that the "Build BLAST database NR for BLAST+" process ended OK. Next, I ran the following commands (I could not find ./toapipeline-process-starter.sh, so I could not run it):

./toapipelineaa-process-starter.sh (no results)
./toapipelineaa-process.sh (results attached).

It ended similarly to the previous time:

#########################################
ERROR: blastp returned error 2
Script ended WRONG at 2022-03-19 11:51:31 with a run duration of 92 s (000:01:32).
#########################################

TOA_error3_RNP

Many thanks

Ricardo

fernandomoramarquez commented 2 years ago

Dear Ricardo:

According to the NCBI website (https://www.ncbi.nlm.nih.gov/books/NBK279684/) , the exit code 2 is due to an "Error in BLAST database". As the process "Build BLAST database NR for BLAST+" ended OK, I think this exit code could be due to the value of the BLASTDB environment variable. Let's check it. Insert this statement in the file "toapipelineaa-process.sh":

echo "BLASTDB: $BLASTDB"

as shown below:

image

Then, relaunch the pipeline using "toapipelineaa-process-starter.sh" and review the log to verify the value of BLASTDB. It should be similar to the marked line in this figure:

image

The directory must contain the NR database files (462 files "nr" and 2 files "taxdb" on my computer).
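
The BLAST error quoted earlier in this thread says that a volume (nr.01) referenced in the alias file could not be found, so it can be useful to compare the alias file with the files on disk. The sketch below is an assumption-laden illustration, not part of TOA or BLAST+: it assumes the alias file is a plain-text "nr.pal" containing a DBLIST line that names the volumes, and that each protein volume has a matching ".psq" sequence file. The function name `missing_volumes` is hypothetical.

```python
import os
import re


def missing_volumes(db_dir, alias_name="nr.pal", extension=".psq"):
    """Return volumes named on the alias file's DBLIST line that lack a sequence file."""
    alias_path = os.path.join(db_dir, alias_name)
    volumes = []
    with open(alias_path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("DBLIST"):
                # Volume names may or may not be wrapped in double quotes.
                pairs = re.findall(r'"([^"]+)"|(\S+)', line[len("DBLIST"):])
                volumes = [quoted or bare for quoted, bare in pairs]
    return [volume for volume in volumes
            if not os.path.isfile(os.path.join(db_dir, volume + extension))]
```

Pointing `missing_volumes` at the nr-blastplus-db directory would list any volumes the alias file expects but that were never decompressed. An empty list means every listed volume has its sequence file.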

Regards,

RNiloP commented 2 years ago

Dear Fernando

it seems, as you suspected, that the database is not being correctly assembled, since it has only 14 "nr" files and 2 "taxdb" files.

ranp_u@DESKTOP-OI2S8TK:~/TOA-databases/NCBI/nr-blastplus-db$ ls
nr.00.phd nr.00.phi nr.00.phr nr.00.pin nr.00.pog nr.00.ppd nr.00.ppi nr.00.psq nr.pal nr.pdb nr.pos nr.pot nr.ptf nr.pto taxdb.btd taxdb.bti

I will try to clean it up and run again the command for generating this database, what do you think?

Best

Ricardo

fernandomoramarquez commented 2 years ago

Dear Ricardo:

To build the NR database you need time (my last run of this process took about 3 hours 14 minutes) and a good internet connection. I usually run this process while people in the Americas are asleep, to avoid problems with the Linux wget command when downloading such large files from the NCBI servers. Simply run the process again from the corresponding menu item.

Regards,

RNiloP commented 2 years ago

Dear Fernando

I ran the process again and it has now been running for 48 hours. I will wait a couple more hours and check the files that have been downloaded. Is there another way to load them in case I do not end up with the 462 "nr" files and 2 "taxdb" files?

Best

Ricardo

fernandomoramarquez commented 2 years ago

Dear Ricardo:

In the directory ".../TOA-results/database" are the subdirectories "toabbnrbp-YYMMDD-HHMMSS" which contain the information of the runs of processes for the construction of the NR database (with BLAST+). Edit the script "toabbnrbp-process.sh" from the last run.

image

As you can see in this figure, the "nr.*.tar.gz" files are downloaded from the NCBI FTP server to the local directory ".../TOA-databases/NCBI/ftp.ncbi.nlm.nih.gov/blast/db". Then, they are decompressed into the directory ".../TOA-databases/NCBI/nr-blastplus-db" and deleted.

If you have problems with the download of the files "nr.*.tar.gz" (there are 57 files), you can manually download them from "https://ftp.ncbi.nlm.nih.gov/blast/db/", comment lines corresponding to the download (244-256) in the script "toabbnrbp-process.sh" and launch it using the script "toabbnrbp-process-starter.sh".
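
For reference, the decompression step described above can be sketched in a few lines. This is only an illustration and not the content of "toabbnrbp-process.sh": the function name `extract_archives` is hypothetical, and unlike the TOA script this sketch does not delete the archives after extracting them.

```python
import glob
import os
import tarfile


def extract_archives(download_dir, target_dir, pattern="nr.*.tar.gz"):
    """Extract every archive matching `pattern` into `target_dir`; return the archive names."""
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    for archive in sorted(glob.glob(os.path.join(download_dir, pattern))):
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Use the safer extraction filter where the Python version supports it.
                tar.extractall(target_dir, filter="data")
            else:
                tar.extractall(target_dir)
        extracted.append(os.path.basename(archive))
    return extracted
```

The returned list makes it easy to confirm that all downloaded archives were processed, for example by comparing its length with the number of files you downloaded.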

Regards,

RNiloP commented 2 years ago

Dear Fernando

I was able to download all 57 files to the local directory ".../TOA-databases/NCBI/ftp.ncbi.nlm.nih.gov/blast/db" (see attached file "folders_NCBI.txt"). However, after commenting out the download lines in the script "toabbnrbp-process.sh", as indicated before, and launching it using the script "toabbnrbp-process-starter.sh", no files were decompressed into the directory ".../TOA-databases/NCBI/nr-blastplus-db". I am attaching the modified "toabbnrbp-process.sh" file (as .txt so that I can attach it here) and the log.txt.

Best

Ricardo

folders_NCBI.txt log.txt toabbnrbp-process.sh.txt

fernandomoramarquez commented 2 years ago

The problem is that the OS commands are not found (mkdir, rm, data, ...). This is because we removed $PATH in line 6483 of the file "xtoa.py" ten days ago, since Windows 11 WSL2 does not properly resolve this environment variable. In the script "toabbnrbp-process.sh", add:

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

to the variable "PATH":

image

Now run the script "toabbnrbp-process-starter.sh". Did the script end OK?

Regards,

RNiloP commented 2 years ago

Dear Fernando

it worked! I am attaching the content of the results folder. I will now check these results more carefully and then run my data.

Many thanks for the patience and time spent in this endeavour.

Thank you very, very much!

Ricardo

results_test100_20220329.txt

fernandomoramarquez commented 2 years ago

Dear Ricardo:

Great! Then modify the line 6484 of the file "xtoa.py" to include OS paths:

image

That way, you will be able to re-build the NR database without any problems when necessary.
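
The idea of replacing the inherited $PATH with an explicit list of directories can be sketched as follows. This is not the actual line in "xtoa.py" (which is only shown in the screenshot): the function name `build_path` is hypothetical, and the system directories are the ones suggested earlier in this thread.

```python
# Standard OS directories, as listed earlier in this thread.
SYSTEM_DIRS = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]


def build_path(extra_dirs, system_dirs=SYSTEM_DIRS):
    """Join tool directories and system directories into a PATH value without duplicates."""
    ordered = []
    for directory in list(extra_dirs) + list(system_dirs):
        if directory and directory not in ordered:
            ordered.append(directory)
    return ":".join(ordered)
```

For example, `build_path(["/home/ranp_u/Miniconda3/bin", "/home/ranp_u/TOA-master/Package"])` yields a PATH that starts with the Miniconda3 and TOA directories and still contains the directories where mkdir, rm and similar commands live, without pulling in the Windows entries that WSL2 adds.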

Thank you very much for using TOA. We hope it will be useful in your bioinformatics analyses.

RNiloP commented 2 years ago

Dear Fernando

the program has been running for a while, mainly in the step where the nr database is assessed (> 3 days at this step). Most of the files produced in the example with test100.fasta were generated. However, the stats could not be generated. I am attaching the log file, but the error is the following:

CALCULATE ANNOTATION STATISTICS
Calculating stats ...
Traceback (most recent call last):
  File "/home/ranp_u/TOA-master/Package/calculate-annotation-stats.py", line 948, in <module>
    main(sys.argv[1:])
  File "/home/ranp_u/TOA-master/Package/calculate-annotation-stats.py", line 58, in main
    calculate_functional_stats(conn, args.annotation_file, args.type, args.stats_file)
  File "/home/ranp_u/TOA-master/Package/calculate-annotation-stats.py", line 389, in calculate_functional_stats
    interpro_desc_dict[interpro_id_list[i]] = interpro_desc_list[i]
IndexError: list index out of range
Command exited with non-zero status 1
38.96user 0.20system 0:40.74elapsed 96%CPU (0avgtext+0avgdata 50856maxresident)k
151560inputs+16outputs (0major+10683minor)pagefaults 0swaps
#########################################
ERROR: calculate-annotation-stats.py returned error 1
Script ended WRONG at 2022-04-04 19:13:02 with a run duration of 532855 s (148:00:55).
#########################################

I also would like to ask which would be the best file from where to extract the predicted EC numbers for the enzymes in my organism genome... is it the plant-annotation.csv file?

Best

Ricardo

P.S. I will install TOA on the Linux PC anyway; I believe the WSL2 configuration is not optimal for running TOA.

log.txt

fernandomoramarquez commented 2 years ago

Dear Ricardo:

In the attached ZIP file there is a version of the program "calculate-annotation-stats.py" that prints additional information when the exception occurs (lines 389-395). Can you please rename the old file to "calculate-annotation-stats.py.old" in the TOA directory and copy in the new file "calculate-annotation-stats.py"? Then restart the pipeline from the menu item "Main menu > Annotation pipelines > TOA amino acid pipeline > Restart pipeline", selecting the identifier of the pipeline run that ended wrong. What information is printed in the log before the exception?
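
The traceback earlier in this thread points at an index-based loop over two lists (`interpro_id_list` and `interpro_desc_list`) that evidently differ in length for some record. As a hedged sketch of one defensive alternative, and not the fix applied in the attached file, the two lists can be paired with `zip` while any mismatch is reported. The function name `pair_ids_with_descriptions` is hypothetical.

```python
def pair_ids_with_descriptions(id_list, desc_list):
    """Map IDs to descriptions; report a length mismatch instead of raising IndexError."""
    if len(id_list) != len(desc_list):
        print(f"WARNING: {len(id_list)} IDs but {len(desc_list)} descriptions: "
              f"ids={id_list!r} descriptions={desc_list!r}")
    # zip stops at the shorter list, so no index can run out of range.
    return dict(zip(id_list, desc_list))
```

Note that this silently drops the unmatched trailing items after printing the warning, so the printed record should still be inspected to find out why the annotation file has unequal counts.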

calculate-annotation-stats.zip

Regards,