Open sihellem opened 4 years ago
Hello Simon, sorry for the late response.
Concerning the second part of the error:
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../output/superkingdomOutput.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
Error in data.frame(Contig = rRNA16sTaxonomy2[, 2], TaxonDensity = 1, :
arguments imply differing number of rows: 0, 1
Execution halted
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../output/superkingdomOutput.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
There was an error in the SeqDex.sh file. I have fixed it, so you can just download this file, replace the one in your SeqDex folder with it, and use it.
Concerning the error
Error in data.frame(Contig = rRNA16sTaxonomy2[, 2], TaxonDensity = 1, : arguments imply differing number of rows: 0, 1
do you have the rRNA16sTaxonomy2.txt file in the Taxonomy folder?
Also, in your SeqDex.sh file you have
#file name of the blast database in $RDP
RDPI=current_Bacteria_unaligned.fa
but in this field you have to put the base name of the BLAST database built on the unaligned RDP 16S database. Is the file name correct?
Hello,
Thank you for your response. There was indeed an error in the base name of the 16S database.
I ran it again and experienced another error (see below). It seems SeqDex does not find the taxonomy sql file from taxonomizr, which was built as indicated in the documentation.
Is there a way around the problem?
Best, Simon
Error in if (!file.exists(sqlFile)) stop(sqlFile, " does not exist.") :
argument is of length zero
Execution halted
Error in read.table(opt$taxaRDP, header = FALSE, stringsAsFactors = FALSE, :
no lines available in input
Execution halted
mkdir: cannot create directory 'SVMoutput': File exists
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../Taxonomy/superkingdomTaxonomyIteration.txt': No such file or directory
Execution halted
mkdir: cannot create directory 'ClusteringOutputSVM': File exists
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../SVMoutput/superkingdomOutputSVM.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
mkdir: cannot create directory 'RFoutput': File exists
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../Taxonomy/superkingdomTaxonomyIteration.txt': No such file or directory
Execution halted
mkdir: cannot create directory 'ClusteringOutputRF': File exists
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../RFoutput/superkingdomOutputRF.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
Hi Simon, I see a few small errors.
First, SeqDex searches for the taxonomizr file (named exactly "accessionTaxa.sql") locally. If you have moved the file to an external HD, SeqDex will not find it. I plan to improve this in the near future, but for now you can either move the sql file back to the PC's main HD or manually change the path where SeqDex searches for it. In the Func.R file, you can edit lines 71-72
path <- list.files(path= "~", full.names=TRUE, recursive=TRUE,pattern="(accessionTaxa.sql)")
and replace the "~" with the path to the external HD. It should look something like this
path <- list.files(path= "/Volumes/external_HD_name", full.names=TRUE, recursive=TRUE,pattern="(accessionTaxa.sql)")
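Before editing the path, it may help to confirm where the file actually lives. The following is only a minimal shell sketch, assuming a POSIX `find` is available; `find_taxa_sql` is a hypothetical helper name and is not part of SeqDex.

```shell
# Hypothetical helper (not part of SeqDex): list every file named
# accessionTaxa.sql found below a given search root.
find_taxa_sql() {
    search_root="$1"
    # Permission errors on unreadable folders are silenced.
    find "$search_root" -type f -name 'accessionTaxa.sql' 2>/dev/null
}

# Example call, using the placeholder path from the message above:
# find_taxa_sql /Volumes/external_HD_name
```

The folder containing the printed file (or a parent of it) is what would replace "~" in the list.files() call.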
Second, SeqDex does not find a file needed for the 16S taxonomy, called "RDP16s_taxa_mod.txt". I really cannot understand why, as SeqDex should rerun the 16S part if this file is not in the right location, unless you have commented out some part of the SeqDex.sh file and then removed the "RDP16s_taxa_mod.txt" file. Do you have "RDP16s_taxa_mod.txt" in the Taxonomy folder?
The other errors arise either because the folders that SeqDex is trying to create already exist, or because tasks cannot be completed without the taxonomy files.
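Both kinds of secondary error can be told apart by hand. This is a small sketch under the assumption of a POSIX shell; `ensure_dir` and `require_nonempty` are hypothetical helpers for illustration and do not come from SeqDex.sh.

```shell
# Hypothetical helpers for illustration; they are not part of SeqDex.

# Create an output folder; with -p, mkdir succeeds quietly when the
# folder already exists instead of printing "File exists".
ensure_dir() {
    mkdir -p "$1"
}

# Return non-zero (with a message on stderr) when a file is missing
# or has size zero, so a later step is not run on unusable input.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Example calls with names taken from the log above:
# ensure_dir SVMoutput
# require_nonempty Taxonomy/RDP16s_taxa_mod.txt || exit 1
```

The "File exists" messages are harmless on a rerun, while a failed non-empty check points to the step that really needs attention.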
Best, Alice
Dear Alice,
Thank you for your reply. I did indeed have to move "accessionTaxa.sql" to the cluster /work/ partition, so I will specify the path in Func.R.
I confirm I did not touch anything in the Taxonomy folder. However, as you can see below, the file you mention, as well as others, is there but empty.
8.7K Feb 24 01:06 16sContigs.fasta
134 Feb 24 01:06 16scontigsName.txt
0 Feb 24 01:06 16sContigvsRDP.txt
2.3K Feb 24 01:06 barrnap16s_contigs.gff
30G Feb 24 01:04 ContigsvsNt.txt
0 Feb 24 01:06 RDP16s_taxa_mod.txt
0 Feb 24 01:06 RDP16s_taxa.txt
0 Feb 24 01:06 RDP16s.txt
Any idea why?
Best, Simon
Dear Simon,
I suppose that there is again a problem with the variable used for the 16S part. In the SeqDex.sh file you have
#path to blast database built using $RDPF downloaded from https://rdp.cme.msu.edu/misc/resources.jsp
#or any other custom database with sequence titles fulfilling RDP sequence titles formatting rules
RDP=~/database/rdp16s
#fasta file used to build $RDPI
RDPF=current_Bacteria_unaligned.fa
#file name of the blast database in $RDP
RDPI=rdp16S
Therefore, have you completed these fields correctly?
You mention that other files are empty; which ones?
Alice
Dear Alice,
Here is the log from the run after modifying the path to sql database:
[bam_sort_core] merging from 10 files and 10 in-memory blocks...
[bam_sort_core] merging from 10 files and 10 in-memory blocks...
BLAST Database error: No alias or index file found for nucleotide database [/work/TEAM/SimonH/databases/RDP_Bact_db/current_Bacteria_unaligned] in search path [/work/TEAM/SimonH/Neotropical/seqdex/seqdex_bwa/Taxonomy::]
Error in read.table(opt$taxaRDP, header = FALSE, stringsAsFactors = FALSE, :
no lines available in input
Execution halted
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../Taxonomy/rRNA16sTaxonomy2.txt': No such file or directory
Execution halted
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../SVMoutput/superkingdomOutputSVM.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../Taxonomy/rRNA16sTaxonomy2.txt': No such file or directory
Execution halted
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../RFoutput/superkingdomOutputRF.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
I confirm there is no error in the SeqDex file:
#path to blast database built using $RDPF downloaded from https://rdp.cme.msu.edu/misc/resources.jsp
#or any other custom database with sequence titles fulfilling RDP sequence titles formatting rules
RDP=/work/TEAM/SimonH/databases/RDP_Bact_db
#fasta file used to build $RDPI
RDPF=current_Bacteria_unaligned.fa
#file name of the blast database in $RDP
RDPI=current_Bacteria_unaligned
With database constructed where specified:
[simon-hellemans@sango-login2 RDP_Bact_db]$ pwd
/work/TEAM/SimonH/databases/RDP_Bact_db
[simon-hellemans@sango-login2 RDP_Bact_db]$ ls
current_Bacteria_unaligned.fa current_Bacteria_unaligned.fa.nin current_Bacteria_unaligned.fa.nsd current_Bacteria_unaligned.fa.nsq
current_Bacteria_unaligned.fa.nhr current_Bacteria_unaligned.fa.nog current_Bacteria_unaligned.fa.nsi current_Bacteria_unaligned.gb
In Taxonomy/, empty files are the following:
0 Feb 24 01:06 16sContigvsRDP.txt
0 Feb 24 01:06 RDP16s_taxa_mod.txt
0 Feb 24 01:06 RDP16s_taxa.txt
0 Feb 24 01:06 RDP16s.txt
Sorry if I am missing something here..
Best, Simon
Dear Simon, the error still says that it cannot find the BLAST database.
I suppose it is because you wrote in the SeqDex.sh file
RDPI=current_Bacteria_unaligned
but it should be
RDPI=current_Bacteria_unaligned.fa
according to the ls output.
BLAST is searching here for files named current_Bacteria_unaligned.nhr, current_Bacteria_unaligned.nin, current_Bacteria_unaligned.nog, and so on, but it cannot find them because your database files are named current_Bacteria_unaligned.fa.nhr, current_Bacteria_unaligned.fa.nin, current_Bacteria_unaligned.fa.nog, etc.
Please check this point and let me know.
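The base-name mismatch can be checked directly from the shell. This is a minimal sketch, assuming a nucleotide BLAST database whose index file ends in .nin (or an alias file ending in .nal); `blast_db_exists` is a hypothetical helper and is not part of SeqDex or BLAST+.

```shell
# Hypothetical helper (not part of SeqDex or BLAST+): succeed when a
# nucleotide index (.nin) or alias (.nal) file exists for the given
# database folder and base name.
blast_db_exists() {
    db_dir="$1"
    db_base="$2"
    [ -f "$db_dir/$db_base.nin" ] || [ -f "$db_dir/$db_base.nal" ]
}

# Example call with the variables defined in SeqDex.sh:
# blast_db_exists "$RDP" "$RDPI" || echo "no BLAST database at $RDP/$RDPI" >&2
```

With the files listed by ls above, the check would pass for current_Bacteria_unaligned.fa and fail for current_Bacteria_unaligned.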
Best, Alice
Dear Alice,
Sorry for the delay in answering. Things seem to be getting better, but we are still not there. I set RDPI back to the value I had initially, which is also what you suggested in your last message.
Now, Taxonomy/ files are not empty, good!
8894 2 avr 09:24 16sContigs.fasta
134 2 avr 09:24 16scontigsName.txt
107214 2 avr 09:24 16sContigvsRDP.txt
2355 2 avr 09:24 barrnap16s_contigs.gff
32149827807 2 avr 09:23 ContigsvsNt.txt
193149 2 avr 09:25 RDP16s_taxa_mod.txt
194164 2 avr 09:25 RDP16s_taxa.txt
11407 2 avr 09:24 RDP16s.txt
158 2 avr 10:55 rRNA16sTaxonomy2.txt
146122 2 avr 10:54 superkingdomTaxonomyIteration.txt
I just verified that both folders SVMoutput/ and RFoutput/ are completely empty, and consequently the fasta files written in ClusteringOutputSVM/ and ClusteringOutputRF/ are empty as well.
Here are the remaining errors:
[bam_sort_core] merging from 10 files and 10 in-memory blocks...
[bam_sort_core] merging from 10 files and 10 in-memory blocks...
Error in data.frame(Contig = rRNA16sTaxonomy2[, 2], TaxonDensity = 1, :
arguments imply differing number of rows: 0, 1
Execution halted
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../SVMoutput/superkingdomOutputSVM.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
Error in data.frame(Contig = rRNA16sTaxonomy2[, 2], TaxonDensity = 1, :
arguments imply differing number of rows: 0, 1
Execution halted
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '../RFoutput/superkingdomOutputRF.txt': No such file or directory
Execution halted
[E::stk_subseq] failed to read the list of regions in file 'OutputClustering.txt'
It seems to point to a problem with the file rRNA16sTaxonomy2.txt in the Taxonomy/ folder. I just verified and this file is actually empty (it only contains the file header). The same is true of the file 16scontigsName.txt in this folder.
However, all other files in this folder contain analysis results. Could it be that the data submitted to SeqDex somehow cannot be passed on to the next step?
Best, Simon
Hi Simon,
if 16scontigsName.txt is empty, there may be an issue with the prediction of the 16S genes done by barrnap. I bet that the file barrnap16s_contigs.gff is not empty, but could you please post a few lines of it, so that I can check the structure of the output?
Indeed, without the 16scontigsName.txt file, SeqDex is unable to build the table of 16S contigs and their taxonomy, and thus cannot use it to find the 16S gene with the highest coverage among those with the taxonomy indicated in $TRG. In this case SeqDex performs the prediction, but cannot return the tables with the contigs of interest.
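To see which contigs barrnap annotated as 16S, the gff file can be filtered by hand. This is only a sketch, assuming a tab-separated GFF3 file with the Name=16S_rRNA attribute in column 9; `list_16s_contigs` is a hypothetical helper, and the way SeqDex itself builds 16scontigsName.txt may differ.

```shell
# Hypothetical helper (SeqDex's own extraction may differ): print the
# unique contig names that carry a 16S rRNA feature in a barrnap GFF3 file.
list_16s_contigs() {
    # Skip comment lines, keep rows whose attribute column (9) names
    # a 16S rRNA, print the contig name (column 1), then deduplicate.
    awk -F '\t' '!/^#/ && $9 ~ /Name=16S_rRNA/ { print $1 }' "$1" | sort -u
}

# Example call:
# list_16s_contigs Taxonomy/barrnap16s_contigs.gff
```

If this prints contig names while 16scontigsName.txt holds no usable entries, the problem probably lies after the barrnap step.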
Best, Alice
Dear Alice,
Thank you for your answer. Indeed, the Taxonomy/barrnap16s_contigs.gff file is not empty. Here is its full content:
##gff-version 3
NODE_104885_length_332_cov_0.530806 barrnap:0.9 rRNA 164 267 2.3e-07 - . Name=5S_rRNA;product=5S ribosomal RNA
NODE_133557_length_316_cov_0.117949 barrnap:0.9 rRNA 151 237 4.1e-08 + . Name=5S_rRNA;product=5S ribosomal RNA (partial);note=aligned only 73 percent of the 5S ribosomal RNA
NODE_13431_length_458_cov_0.569733 barrnap:0.9 rRNA 124 235 1.2e-20 - . Name=5S_rRNA;product=5S ribosomal RNA
NODE_163559_length_302_cov_0.331492 barrnap:0.9 rRNA 102 189 2.9e-10 + . Name=5S_rRNA;product=5S ribosomal RNA (partial);note=aligned only 73 percent of the 5S ribosomal RNA
NODE_168098_length_300_cov_0.335196 barrnap:0.9 rRNA 86 197 6.2e-21 - . Name=5S_rRNA;product=5S ribosomal RNA
NODE_18671_length_437_cov_0.629747 barrnap:0.9 rRNA 279 351 6.8e-07 - . Name=5S_rRNA;product=5S ribosomal RNA (partial);note=aligned only 61 percent of the 5S ribosomal RNA
NODE_1_length_3834_cov_14.843523 barrnap:0.9 rRNA 1642 2577 1.5e-76 - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 59 percent of the 16S ribosomal RNA
NODE_1_length_3834_cov_14.843523 barrnap:0.9 rRNA 2903 3550 5.3e-09 - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 40 percent of the 16S ribosomal RNA
NODE_215818_length_283_cov_0.481481 barrnap:0.9 rRNA 78 172 3e-13 + . Name=5S_rRNA;product=5S ribosomal RNA
NODE_220014_length_282_cov_0.372671 barrnap:0.9 rRNA 120 217 2.2e-09 + . Name=5S_rRNA;product=5S ribosomal RNA
NODE_230787_length_279_cov_0.506329 barrnap:0.9 rRNA 45 139 2.9e-08 - . Name=5S_rRNA;product=5S ribosomal RNA
NODE_2961_length_577_cov_0.587719 barrnap:0.9 rRNA 94 522 2.8e-108 - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 27 percent of the 16S ribosomal RNA
NODE_2_length_3745_cov_15.226269 barrnap:0.9 rRNA 1517 3568 2.9e-99 - . Name=23S_rRNA;product=23S ribosomal RNA (partial);note=aligned only 63 percent of the 23S ribosomal RNA
NODE_32715_length_404_cov_0.413428 barrnap:0.9 rRNA 130 236 5.9e-08 - . Name=5S_rRNA;product=5S ribosomal RNA
NODE_53419_length_376_cov_0.501961 barrnap:0.9 rRNA 220 310 8e-09 + . Name=5S_rRNA;product=5S ribosomal RNA (partial);note=aligned only 76 percent of the 5S ribosomal RNA
NODE_6624_length_507_cov_1.225389 barrnap:0.9 rRNA 1 457 1.3e-101 - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 28 percent of the 16S ribosomal RNA
Thank you for your time and help, Simon
Hi Simon, sorry for the very late response.
The barrnap gff file seems OK to me. I suppose there may be some issue with the rRNA16S.R script, but without your 16S-related files I am not able to check and correct it. If you want, you can send me the files at alice.chiodi@unimi.it
Best, Alice
Hello,
I tried to run SeqDex on a cluster by inputting contigs.fasta produced by SPAdes and the sam alignment from Bowtie2 and I got the following error:
Here is the beginning of the SeqDex.sh file, if it could help:
Do you have any idea what went wrong?
Thanks in advance for your response, Cheers, Simon