PierreBSC / Viral-Track

MIT License

error in Viral_Track_scanning.R #23

Open Shiywa opened 1 year ago

Shiywa commented 1 year ago

Hi, thanks for your work. I just have a question about Viral_Track_scanning.R: if no reads map to any virus, will it report an error like the following?

Loading of the libraries.... ... done ! Warning message:
Output directory does not exist ! Creating it ! 
1 Fastq files are going to be processed ! 
Mapping step2_hgmm_100_R2_extracted.fastq file 
Aug 03 17:58:15 ..... started STAR run
Aug 03 17:58:15 ..... loading genome
Aug 03 17:58:26 ..... started 1st pass mapping
Aug 03 17:58:46 ..... finished 1st pass mapping
Aug 03 17:58:46 ..... inserting junctions into the genome indices
Aug 03 17:59:53 ..... started mapping
Aug 03 18:00:13 ..... finished mapping
Aug 03 18:00:16 ..... started sorting BAM
Aug 03 18:00:30 ..... finished successfully
Mapping ofstep2_hgmm_100_R2_extracted.fastq done ! 
All fastq files have been mapped successfully 
Starting the BAM file analysis 
Indexing of the bam file for step2_hgmm_100_R2_extracted is done 
Computing stat file for the bam file for step2_hgmm_100_R2_extracted is done 
Checking the mapping quality of each virus... 
Export of the viral SAM file done for step2_hgmm_100_R2_extracted 
Error in { : task 1 failed - "the condition has length > 1"
Calls: %dopar% -> <Anonymous>

GhobrialMoheb commented 1 year ago

Hi, I can't run "Viral_Track_scanning.R". Did you run it in R on Windows or on Linux?

I get this error message, despite having Biostrings installed:

Loading of the libraries....
Error in library(Biostrings) : there is no package called ‘Biostrings’
Calls: suppressMessages -> withCallingHandlers -> library
Execution halted

Shiywa commented 1 year ago

I don't have any error about the packages.

GhobrialMoheb commented 1 year ago

Thanks for your reply. Could you please tell me how you got it to work up to this step?

Shiywa commented 1 year ago

Sorry, I haven't completed a successful test yet. I only started using it today.

GhobrialMoheb commented 1 year ago

This is my status now:

Export of the viral SAM file done for hgmm_100_R2_extracted
Error in { : task 1 failed - "different row counts implied by arguments"
Calls: %dopar% ->
In addition: Warning messages:
1: In dir.create(paste(k, "Viral_BAM_files", sep = "")) :
  '/mnt/c/Users/mohebg/Desktop//Viral_Track/Test_COVID//hgmm_100_R2_extracted/Viral_BAM_files' already exists
2: executing %dopar% sequentially: no parallel backend registered
Execution halted

@Shiywa, same as you. Did you find a solution?

zorglubz-coder commented 1 year ago

Hi, I have exactly the same issue as @Shiywa. After working through the code, it seems that the if statement on line 343 of Viral_Track_scanning.R is at fault.

if (class(Viral_reads_contents)=="numeric") { Viral_reads_contents = matrix(Viral_reads_contents_mean,ncol = 4) }

The class of Viral_reads_contents is "matrix" "array", so the comparison returns two values while if expects a single one, which leads to the error observed. This block also seems useless, since the variable "Viral_reads_contents_mean" is not defined at this point.

I would suggest that you simply delete this part and try to run again. (I am currently running it; I hope this solves the issue.)

From my understanding of the code, you can already check at this step whether any virus was mapped by looking at the file output_folder/fasta_file_name_extracted/Count_chromosomes.txt

Please let me know if this helps.
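The failure described above can be reproduced in plain R, without any BAM file. This is only a minimal sketch: the matrix `m` and its values are made up for illustration, and the two-class behaviour noted in the comments assumes R 4.0.0 or later.

```r
# Illustrative stand-in for Viral_reads_contents: a one-row numeric matrix
m <- matrix(c(0.3, 0.2, 0.2, 0.3), ncol = 4)

# On R >= 4.0.0 a matrix reports two classes ("matrix" and "array"),
# so the comparison yields one logical value per class entry.
cmp <- class(m) == "numeric"
n_values <- length(cmp)

# Single-valued tests that are safe to use as an if() condition
is_bare_numeric <- is.numeric(m) && !is.matrix(m)    # FALSE for a matrix
v <- c(0.3, 0.2, 0.2, 0.3)
is_bare_numeric_v <- is.numeric(v) && !is.matrix(v)  # TRUE for a plain vector
```

From R 4.2.0 onward, passing a condition of length greater than one to `if` is an error rather than a warning, which is consistent with the "the condition has length > 1" message in the log.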

zorglubz-coder commented 1 year ago

Hi @GhobrialMoheb,

In my experience, you should delete everything in your output folder and start from scratch every time (i.e. remove the remains of any unsuccessful Viral-Track run).

I am running everything on Ubuntu (you can easily install Ubuntu on Windows 10), but I have no idea why you have this issue.
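Clearing the output folder before each attempt can be scripted from R. The sketch below is only an illustration: `output_dir` is a hypothetical path (a temporary folder here) standing in for whatever output directory was given to Viral-Track.

```r
# Hypothetical output folder; replace with the one passed to Viral-Track
output_dir <- file.path(tempdir(), "Viral_Track_output")

# Simulate leftovers from a failed run
dir.create(file.path(output_dir, "Viral_BAM_files"),
           recursive = TRUE, showWarnings = FALSE)

# Remove the folder and everything in it, then recreate it empty
unlink(output_dir, recursive = TRUE)
dir.create(output_dir, recursive = TRUE)

# Should be empty after the cleanup
leftovers <- list.files(output_dir, recursive = TRUE)
```

Starting from an empty folder also avoids the "already exists" warning that dir.create() raises for Viral_BAM_files on a rerun.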

mohebg commented 1 year ago

@zorglubz-coder , thanks for the comment.

The issue, I have realized, is that "BAM_file@elementMetadata$seq" at step 338 yields NULL.

Yes, I do clear the output folder. I also use Ubuntu on Windows.

May I ask which FASTQ files you are running the trial on? Did your trial work?

zorglubz-coder commented 1 year ago

I did not do a trial, and I have never finished an analysis of my own data, so I am by no means an expert.

Are you sure you are inside the foreach loop? From what I understand, the foreach loop creates its own environment, so you cannot access its variables once it has crashed. This would explain the NULL you get.
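One way to inspect a failure inside a foreach loop, rather than losing it when the loop aborts, is the `.errorhandling = "pass"` option of foreach(). The sketch below assumes the foreach package is installed; `risky_step` is a hypothetical stand-in for one loop iteration, not code from Viral_Track_scanning.R.

```r
library(foreach)

# Hypothetical stand-in for one iteration of the per-virus analysis
risky_step <- function(i) {
  if (i == 2) stop("simulated failure for task ", i)
  i * 10
}

# "pass" returns the error object in place of the failed task's result,
# instead of aborting with 'task N failed - ...'. %do% runs sequentially.
results <- foreach(i = 1:3, .errorhandling = "pass") %do% risky_step(i)

# Find the tasks that failed and read their messages
failed <- vapply(results, inherits, logical(1), what = "error")
messages <- vapply(results[failed], conditionMessage, character(1))
```

Running the loop body by hand for the failing index is then usually enough to see which variable is missing or malformed.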

I can only suggest that you try to step through everything in RStudio on Windows (at least that is how I did it). You just have to install and load the packages. Take one of the BAM files in output_folder/fasta_name_extracted/Viral_BAM_files/viral_name.bam and try to run:

library(GenomicAlignments)  # readGAlignments()
library(Rsamtools)          # ScanBamParam(), scanBamWhat()
library(Biostrings)         # alphabetFrequency()

BAM_file = readGAlignments("path_to_bam", param = ScanBamParam(what = scanBamWhat()))
Viral_reads = unique(BAM_file@elementMetadata$seq)
Viral_reads_contents = alphabetFrequency(Viral_reads, as.prob = TRUE)
Viral_reads_contents = Viral_reads_contents[, c("A","C","G","T")]

### keep going until you find the issue

GhobrialMoheb commented 1 year ago

Yes, I actually did a manual run, and for me the issue was at the level of the BAM file: it lacks "elementMetadata$seq".

zorglubz-coder commented 1 year ago

Is it only for a specific bam file, or all of them?

GhobrialMoheb commented 1 year ago

I did multiple trials on different FASTQ files, and the issue is the same throughout.

I am not sure, though, whether there is a FASTQ file known to work that could serve as a control to check that the script runs smoothly.

zorglubz-coder commented 1 year ago

I am sorry if you did that already, but I need to be sure. When you did the manual run, did you run exactly these commands on a single BAM file, outside of any foreach loop (i.e. not as a continuation of the script)? The BAM files are generated by the previous steps of the Viral_Track script.

library(GenomicAlignments)  # readGAlignments()
library(Rsamtools)          # ScanBamParam(), scanBamWhat()
library(Biostrings)         # alphabetFrequency()

BAM_file = readGAlignments("path_to_bam", param = ScanBamParam(what = scanBamWhat()))
Viral_reads = unique(BAM_file@elementMetadata$seq)
Viral_reads_contents = alphabetFrequency(Viral_reads, as.prob = TRUE)
Viral_reads_contents = Viral_reads_contents[, c("A","C","G","T")]

### keep going until you find the issue

GhobrialMoheb commented 1 year ago

Yes, I ran it on a single BAM file.

zorglubz-coder commented 1 year ago

Honestly, I have no idea at this point; I don't really know how it works. Here is a BAM file that works for me. Can you run the previous code on it (just unzip it first)? refseq_NC_031338_10056nt_Moku.zip

mohebg commented 1 year ago

[image attachment]

mohebg commented 1 year ago

Here is the result for your BAM file: I still don't see any content in the element metadata (as is the case for my own runs).

mohebg commented 1 year ago

Did you get QC reports in the end ?

zorglubz-coder commented 1 year ago

I am puzzled here, could you send a picture of the code you ran?

zorglubz-coder commented 1 year ago

[image attachment]

zorglubz-coder commented 1 year ago

Did you get QC reports in the end ?

I am still currently running it after solving the issue mentioned earlier https://github.com/PierreBSC/Viral-Track/issues/23#issuecomment-1209609016

mohebg commented 1 year ago

I am puzzled here, could you send a picture of the code you ran?

BAM_file= readGAlignments("refseq_NC_031338_10056nt_Moku.bam")

mohebg commented 1 year ago

[image attachment]

I get an error when I run:

BAM_file= readGAlignments("refseq_NC_031338_10056nt_Moku.bam",param = ScanBamParam(what =scanBamWhat()))

[image attachment]

mohebg commented 1 year ago

How is it possible, that I get this error while you don't, when it is the same BAM file ?

zorglubz-coder commented 1 year ago

Well... I have no ideas left, apart from deleting all the packages named below and reinstalling them as suggested on the installation page.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.10") 
BiocManager::install(c("Biostrings", "ShortRead","doParallel","GenomicAlignments","Gviz","GenomicFeatures","Rsubread"))

zorglubz-coder commented 1 year ago

If you have an error in the package installation, then this is where the issue is.
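Before reinstalling, it may help to check which of these packages the current R session can actually find. The sketch below is a rough diagnostic, not part of Viral-Track. The idea that the script and the interactive session use different library paths is only an assumption that this check can help confirm or rule out.

```r
# Packages named in the installation command above
pkgs <- c("Biostrings", "ShortRead", "doParallel", "GenomicAlignments",
          "Gviz", "GenomicFeatures", "Rsubread")

# Library locations this R session searches; Rscript and RStudio can differ
lib_paths <- .libPaths()

# requireNamespace() returns FALSE instead of stopping when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- names(installed)[!installed]
```

Running the same lines with Rscript and in the interactive session, then comparing `lib_paths` and `missing_pkgs`, shows whether both environments see the same installation.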

mohebg commented 1 year ago

All the packages are installed, I will reinstall them and see

zorglubz-coder commented 1 year ago

Did you get QC reports in the end ?

I did! I hope we can solve your issue too.

GhobrialMoheb commented 1 year ago

Great, thanks a lot.

GhobrialMoheb commented 1 year ago

May I ask you some questions:

mohebg commented 1 year ago

Also, the BAM file you sent me is 424 Kb, which is a bit small. I just wanted to be sure it is the correct one. Its name also seems shorter than in the screenshot you sent, hence my question.

zorglubz-coder commented 1 year ago

I am sorry, but I am not comfortable sharing the FASTQ file. Here is the populated output. All the hidden text is the name of the FASTQ file.

[image attachment]

GhobrialMoheb commented 1 year ago

"_Aligned.sortedByCoord.out.bam" is the BAM file to load in the function: BAM_file= readGAlignments("XXX.bam")

zorglubz-coder commented 1 year ago

Also, the BAM file you sent me is 424 Kb, which is a bit small. I just wanted to be sure it is the correct one. Its name also seems shorter than in the screenshot you sent, hence my question.

Yes, that is the correct size for this BAM file. I simply provided the full path to the file, so the name looks longer. The real issue is the error you get when you add the option ScanBamParam(what =scanBamWhat()). This option does not give me an error.

GhobrialMoheb commented 1 year ago

I see. I don't know why it gives me this error.

zorglubz-coder commented 1 year ago

"_Aligned.sortedByCoord.out.bam" is the BAM file to load in the function: BAM_file= readGAlignments("XXX.bam")

No, it is the BAM files present in Viral_BAM_files; see the line below from the Viral_Track file:

BAM_file= readGAlignments(paste(k,"Viral_BAM_files/",i,".bam",sep = ""),param = ScanBamParam(what =scanBamWhat()))

mohebg commented 1 year ago

@zorglubz-coder, regarding your suggestion to delete the if statement on line 343 of Viral_Track_scanning.R: did removing it help?

mohebg commented 1 year ago

@zorglubz-coder, I can now successfully load the BAM file and get the viral reads:

[image attachment]

However, I get this error:

Export of the viral SAM file done for SRR11616442
Error in colnames<-(*tmp*, value = c("N_reads", "N_unique_reads", :
  attempt to set 'colnames' on an object with less than two dimensions
Calls: colnames<- -> colnames<-
Execution halted
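This message is what base R reports whenever column names are assigned to a plain vector rather than to a matrix or data frame. The sketch below only illustrates that mechanism: the `stats` values are made up, only the two column names visible in the log are used, and the thread does not confirm which object in the script triggers the error.

```r
# Illustrative per-sample statistics as a plain vector (values are made up)
stats <- c(120, 45)

# Setting column names on a plain vector fails; capture the error message
err <- tryCatch({
  colnames(stats) <- c("N_reads", "N_unique_reads")
  NULL
}, error = function(e) conditionMessage(e))

# Reshaping to a one-row matrix first makes the assignment valid
stats_m <- matrix(stats, nrow = 1)
colnames(stats_m) <- c("N_reads", "N_unique_reads")
```

If a result that is normally a matrix collapses to a vector (for example when only one row is produced), forcing it back to two dimensions before naming the columns avoids the error.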

zorglubz-coder commented 1 year ago

Did this removal of the line help?

Yes, I have had no more issues since removing the lines mentioned.

zorglubz-coder commented 1 year ago

Are you still trying to run it manually? I also had this colnames error when I ran the code manually, but running Viral_Track_scanning.R in its entirety (with the if statement on line 343 removed) worked just fine.

mohebg commented 1 year ago

Yes, I have this issue with the Viral_Track_scanning.R run.

mohebg commented 1 year ago

[image attachment]

Did you experience this before?

zorglubz-coder commented 1 year ago

I am sorry, but I cannot help you any further with this issue.

zorglubz-coder commented 1 year ago

Did you experience this before?

I never ran these lines manually, so I don't think I had this error.

One last recommendation: if reinstalling the packages changed anything, maybe start everything from scratch: "Creation of the Index and of the annotation file", "Pre-processing of the single data", and "Detection of viruses in scRNA-seq data".

I will not be able to help you any further. I hope you can figure it out.

mohebg commented 1 year ago

Thanks a lot for your help.

mohebg commented 1 year ago

QC_report.pdf

mohebg commented 1 year ago

[image attachment]

It seems that I have made some progress: I got the QC report.

[image attachment]

zorglubz-coder commented 1 year ago

Nice, good luck with the rest of your analysis.

Shiywa commented 1 year ago

@zorglubz-coder, thanks for your suggestion about the if statement on line 343. I solved the problem by replacing class(Viral_reads_contents) with class(Viral_reads_contents)[1]. The error seems to be version-related: since R 4.0.0, a matrix has the two-element class "matrix" "array".
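A patched version of the check might look like the sketch below. It rests on two assumptions that the thread does not confirm: that the block was meant to turn a frequency vector back into a one-row matrix, and that the undefined Viral_reads_contents_mean was intended to be Viral_reads_contents itself. The input values are made up.

```r
# Illustrative stand-in: nucleotide frequencies held as a plain named vector
Viral_reads_contents <- c(A = 0.3, C = 0.2, G = 0.2, T = 0.3)

# class(x)[1] is always a single string, so the condition has length one.
# Assumption: Viral_reads_contents_mean was meant to be Viral_reads_contents.
if (class(Viral_reads_contents)[1] == "numeric") {
  Viral_reads_contents <- matrix(
    Viral_reads_contents, ncol = 4,
    dimnames = list(NULL, names(Viral_reads_contents))
  )
}
```

Deleting the block, as suggested earlier in the thread, and keeping it with the `[1]` fix are not equivalent when the input really is a plain vector, so it is worth checking which case applies to your data.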