PierreBSC / Viral-Track

MIT License
54 stars 27 forks source link

Unable to get past "Export of the viral SAM file done for XXX" #17

Closed 14zac2 closed 3 years ago

14zac2 commented 3 years ago

Hello!

I'm trying to use Viral-Track to analyze single-cell woodchuck liver samples infected with the woodchuck hepatitis virus. When running your program, I experience the same error as mentioned #4 . Although R seemed to recognize the %dopar% command when running through the code line-by-line, that step was consistently throwing giving the error Error in unserialize(socklist[[n]]) : error reading from connection. As you recommended in the thread, I tried changing both parallel loops involving %dopar% to regular for loops, but now I am experiencing a new error:

Loading of the libraries.... ... done ! Warning message:
Output directory does not exist ! Creating it !
1 Fastq files are going to be processed !
Mapping L207_TLH_S7_R2_extracted.fastq file
Apr 04 04:43:31 ..... started STAR run
Apr 04 04:43:32 ..... loading genome
Apr 04 05:10:34 ..... started 1st pass mapping
Apr 04 05:25:23 ..... finished 1st pass mapping
Apr 04 05:25:24 ..... inserting junctions into the genome indices
Apr 04 05:27:40 ..... started mapping
Apr 04 06:29:33 ..... finished mapping
Apr 04 06:29:34 ..... started sorting BAM
Apr 04 06:36:05 ..... finished successfully
Mapping ofL207_TLH_S7_R2_extracted.fastq done !
All fastq files have been mapped successfully
Starting the BAM file analysis
Indexing of the bam file for L207_TLH_S7_R2_extracted is done
Computing stat file for the bam file for L207_TLH_S7_R2_extracted is done
Checking the mapping quality of each virus...
Export of the viral SAM file done for L207_TLH_S7_R2_extracted
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'flesh' in selecting a method for function 'relist': XStringSet object is too big to be unlisted (would result in an XString
  object of length 2^31 or more)
Calls: as ... unlist -> unlist -> .Call2 -> .handleSimpleError -> h
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

Do you have any idea as to why this might be occurring? I am worried that it might be due to my host organism which consists of 3123 contigs. I did all the UMITools preprocessing requested and my package versions are r/4.0.2 samtools/1.10 star/2.7.8a stringtie/2.1.3 plus R packages as follows:

> packageVersion("Biostrings")
[1] ‘2.56.0’
> packageVersion("ShortRead")
[1] ‘1.46.0’
> packageVersion("doParallel")
[1] ‘1.0.16’
> packageVersion("GenomicAlignments")
[1] ‘1.24.0’
> packageVersion("Gviz")
[1] ‘1.32.0’
> packageVersion("GenomicFeatures")
[1] ‘1.40.1’
> packageVersion("Rsubread")
[1] ‘2.2.6’

Please let me know if I can provide any other information, and I look forward to hearing from you!

PierreBSC commented 3 years ago

Hi Zoé,

I might have an idea of where the problem is coming from : in Viral-Track, chromosomes from the host are removed (line 307). The problem is that the name of those chromosomes can change based on the reference genome/species. I would recommend to update the object "Chromosome_to_remove" in line 283 accordingly with the correct chromosome names !

Hope this will work !

Best

Pierre

14zac2 commented 3 years ago

Hi Pierre,

Thanks so much for the tip, it worked! It might be worth it to update the code to be able to accommodate multiple hosts. For anyone else referencing this issue, I fixed the script (for my situation) with a few lines of bash code and my host genome fasta file:

# Create list of chromosomes from my fasta file (host.fa)
grep ">" host.fa > chroms.txt
sed -i 's/>//g' chroms.txt
awk '{ print "\""$0"\""}' chroms.txt > chromQuote.txt
paste -s -d, chromQuote.txt > chromQuoteComma.txt
awk '{ print "Chromosome_to_remove = c("$0")"}' chromQuoteComma.txt > formattedChromList.txt
# The following assumes the Viral-Track code is located in ~
# Inserts the file I just created into the appropriate section of the code
sed -e '/KI270394.1/r formattedChromList.txt' ~/Viral-Track/Viral_Track_scanning.R > ~/Viral-Track/Viral_Track_scanning_2.R
# Now run as normal but call on Viral_Track_scanning_2.R

Many thanks again! Zoe

GhobrialMoheb commented 1 year ago

@PierreBSC , @14zac2 Hi Zoe, thanks alot for the update, I have tried what you mentioned and I am getting another error:

Aug 04 12:43:51 ..... started STAR run Aug 04 12:43:51 ..... loading genome Aug 04 12:45:00 ..... started 1st pass mapping Aug 04 12:45:24 ..... finished 1st pass mapping Aug 04 12:45:25 ..... inserting junctions into the genome indices Aug 04 12:47:00 ..... started mapping Aug 04 12:47:26 ..... finished mapping Aug 04 12:47:31 ..... started sorting BAM Aug 04 12:47:39 ..... finished successfully Mapping ofhgmm_100_R2_extracted.fastq done ! All fastq files have been mapped successfully Starting the BAM file analysis Indexing of the bam file for hgmm_100_R2_extracted is done Computing stat file for the bam file for hgmm_100_R2_extracted is done Checking the mapping quality of each virus... Export of the viral SAM file done for hgmm_100_R2_extracted Error in { : task 1 failed - "different row counts implied by arguments" Calls: %dopar% -> Execution halted

May you please help me with the issue?