ChristofferFlensburg / superFreq

Analysis pipeline for cancer sequencing data
MIT License
109 stars 33 forks source link

SuperFreq cannot find varscan command #120

Open avivneuronvision opened 6 months ago

avivneuronvision commented 6 months ago

When running SuperFreq without vcf files, I get: Calls: superFreq -> superFreq -> analyse In addition: Warning messages: 1: In system("varscan") : error in running command 2: In performPreliminaryVariantCallingOnMissingVCFs(sampleMetaData, : Cant find varscan command, so cant perform preliminary variant calling of missing VCF. Execution halted

I tried: 1) jar_file="/home/apps/varscan/VarScan.v2.4.2.jar" and export CLASSPATH="$CLASSPATH:$jar_file" 2) jar_file="/home/apps/varscan/VarScan.v2.4.2.jar" and export R_LIBS_USER="$R_LIBS_USER:$jar_file" 3) export PATH=$PATH:/home/apps/varscan/ 4) alias varscan="java -jar /home/apps/varscan/VarScan.v2.4.2.jar"

But all failed. Please help.

ChristofferFlensburg commented 6 months ago

Hi!

The general concept is that if you can run the varscan command outside of R, then the preliminary variant calling should work in superFreq. So first thing would be to confirm that you can run just varscan on command line before you enter R.

Otherwise, adding in the preliminary variant calling to superFreq was a quite late addition, and if the in-R version isn't running, then might be easier to just run it outside of R before you run superfreq. The code that runs the preliminary variant calling (and you see the check that threw the warning) is this:

performPreliminaryVariantCallingOnMissingVCFs = function(metaData, reference, cpus=1) {
  #check which VCF files are missing, place them somewhere smart, in plots or R?
  missing = !file.exists(metaData$VCF)
  if ( !any(missing) ) return()

  #check if varscan exists
  catLog('Check if the varscan command exists.\n')
  a = system('varscan')
  if ( a != 0 ) {
    catLog('Cant find varscan command, so cant perform preliminary variant calling of missing VCF.\n')
    warning('Cant find varscan command, so cant perform preliminary variant calling of missing VCF.')
    return()
  }
  catLog('Seems like varscan command exists.\n')
  catLog('Samtools shouldve been check earlier, so assuming that exists.\n')

  #use the one-liner from the superFreq README
  systemCalls = paste0('samtools mpileup -d 1000 -q 15 -Q 15 -A -f ', reference, ' ',  metaData$BAM[missing], ' | varscan mpileup2cns - --variants --strand-filter 0 --p-value 0.5 --min-var-freq 0.02 --output-vcf 1 > ', metaData$VCF[missing])
  ret = mclapply(systemCalls, function(call) {
    catLog(call, '\n\n')
    ret = system(call)
    return(ret)
  }, mc.cores=cpus)
  if ( all(ret == 0) ) catLog('Runs returned with exit status 0, seems good.\n')
  else {
    catLog('Not all runs returned with exit status 0. Might be a problem, but continuing with a warning.\n')
    warning('Not all varscan runs exited with status 0. Preliminary variant calling might not have been done properly. Exit statuses: ', ret)
  }

  return()
}

so key part here is the command

  systemCalls = paste0('samtools mpileup -d 1000 -q 15 -Q 15 -A -f ', reference, ' ',  metaData$BAM[missing], ' | varscan mpileup2cns - --variants --strand-filter 0 --p-value 0.5 --min-var-freq 0.02 --output-vcf 1 > ', metaData$VCF[missing])

you can copy the settings from there if you want. Then just make sure the VCF column in the superfreq meta data file points to the .vcf files.

Let me know how it goes!