ricardoharripaul commented 6 years ago

Hi,

I received an error from HMZDelFinder while filtering for AOH.I called variants using GATK Best Practices and zipped the files using bzip2 filename or bzip2 -z filename. All VCF files are in v4.2. Files extensions are .bz2. I receive the error listed below but interestingly enough I can bzcat the file paths from the command line with no problem.Can you please help me solve this issue?

[1] "[step 1 out of 7] ** AOH data **"[1] "Reading VCFs and generating AOH data using CBS ..." | | 0%bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2, output file = (stdout)

tgambin commented 6 years ago

Did you try to manually execute bzcat for singe file: eg.: bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2

ricardoharripaul commented 6 years ago

Hi,

Yes. I did try that and it did work and produced the vcf file.

Ricardo

On Thu, Feb 22, 2018 at 11:55 AM, tgambin notifications@github.com wrote:

Did you try to manually execute bzcat for singe file: eg.: bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-367746221, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I8dqbZdeHD2e8MfTgJaqD9MnaaNnks5tXZvbgaJpZM4SOhe9 .

tgambin commented 6 years ago

Ok. Could you please, try to run in R "read.vcf.header" function for the single vcf file ?

ricardoharripaul commented 6 years ago

Hi,

So I tried to run the command and it gives me the same error, however, it does print the vcf header. Does the mean the software is running despite saying it cannot find the file? Should I just let it run to completion?

read.vcf.header("/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2")bzcat: I/O or other error, bailing out. Possible reason follows. bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2, output file = (stdout)$header [1] "CHROM" "POS" "ID" "REF" "ALT" "QUAL" "FILTER" "INFO" "FORMAT" "DMA_7"

$INFO $INFO$AA key value 1 ID AA 2 Number 1 3 Type String 4 Description The inferred allele ancestral to the chimpanzee/human lineage

$INFO$AC key value 1 ID AC 2 Number A 3 Type Integer 4 Description Allele count in genotypes 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed

$INFO$AF key value 1 ID AF 2 Number A 3 Type Float 4 Description Allele Frequency 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed

$INFO$AF1000G key value 1 ID AF1000G 2 Number 1 3 Type String 4 Description The allele frequency from all populations of 1000 genomes data

$INFO$AN key value 1 ID AN 2 Number 1 3 Type Integer 4 Description Total number of alleles in called genotypes

$INFO$BaseQRankSum key value 1 ID BaseQRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities

$INFO$BLOCKAVG_min30p3a

key

1 ID 2 Number 3 Type 4 Description 5 have the same filter value 6 and have all sample values in range [x 7 y] 8 y < 9 (x*(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block

                                   value

1 BLOCKAVG_min30p3a 2 0 3 Flag 4 Non-variant site block. All sites in a block are constrained to be non-variant 5 NA= have the same filter value 6 NA= and have all sample values in range [x 7 NA=y] 8 max(x+3 9 NA=(x*(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block

$INFO$clinvar key value 1 ID clinvar 2 Number . 3 Type String 4 Description Clinical significance

$INFO$cosmic key 1 ID 2 Number 3 Type 4 Description

                          value

1 cosmic 2 . 3 String 4 The numeric identifier for the variant in the Catalogue of Somatic Mutations in Cancer (COSMIC) database

$INFO$CSQR key 1 ID 2 Number 3 Type 4 Description

                                        value

1 CSQR 2 . 3 String 4 Regulatory consequence type as predicted by VEP version 72. Feature source: Ensembl. Format: RegulatoryID|Consequence

$INFO$CSQT key 1 ID 2 Number 3 Type 4 Description

                                          value

1 CSQT 2 . 3 String 4 Transcript consequence as predicted by VEP version 72. Transcript source: refseq. Format: HGNC|TranscriptID|Consequence

$INFO$Dels key value 1 ID Dels 2 Number 1 3 Type Float 4 Description Fraction of Reads Containing Spanning Deletions

$INFO$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth; some reads may have been filtered

$INFO$DS key value 1 ID DS 2 Number 0 3 Type Flag 4 Description Were any of the samples downsampled?

$INFO$END key value 1 ID END 2 Number 1 3 Type Integer 4 Description End position of the region described in this record

$INFO$EVS

key

1 ID 2 Number 3 Type 4 Description 5 sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples

                                        value

1 EVS 2 . 3 String 4 Allele frequency 5 NA= sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples

$INFO$FS key value 1 ID FS 2 Number 1 3 Type Float 4 Description Phred-scaled p-value using Fisher's exact test to detect strand bias

$INFO$GMAF

key

1 ID 2 Number 3 Type 4 Description 5 the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor

                        value

1 GMAF 2 . 3 String 4 Global minor allele frequency (GMAF); technically 5 NA= the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor

$INFO$HaplotypeScore key value 1 ID HaplotypeScore 2 Number 1 3 Type Float 4 Description Consistency of the site with at most two segregating haplotypes

$INFO$HRun key value 1 ID HRun 2 Number 1 3 Type Integer 4 Description Largest Contiguous Homopolymer Run of Variant Allele In Either Direction

$INFO$InbreedingCoeff key 1 ID 2 Number 3 Type 4 Description

                                                   value

1 InbreedingCoeff 2 1 3 Float 4 Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation

$INFO$MQ key value 1 ID MQ 2 Number 1 3 Type Float 4 Description RMS Mapping Quality

$INFO$MQ0 key value 1 ID MQ0 2 Number 1 3 Type Integer 4 Description Total Mapping Quality Zero Reads

$INFO$MQRankSum key value 1 ID MQRankSum 2 Number 1 3 Type Float 4 Description Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities

$INFO$phastCons key 1 ID 2 Number 3 Type 4 Description

                                                            value

1 phastCons 2 0 3 Flag 4 Denotes if the variant is an identical or similar sequence that occurs between species and maintained between species throughout evolution

$INFO$QD key value 1 ID QD 2 Number 1 3 Type Float 4 Description Variant Confidence/Quality by Depth

$INFO$ReadPosRankSum key value 1 ID ReadPosRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias

$INFO$SB key value 1 ID SB 2 Number 1 3 Type Float 4 Description Strand Bias

$FORMAT $FORMAT$AD key value 1 ID AD 2 Number . 3 Type Integer 4 Description Allelic depths for the ref and alt alleles in the order listed

$FORMAT$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth (reads with MQ=255 or with bad mates are filtered)

$FORMAT$GQ key value 1 ID GQ 2 Number 1 3 Type Float 4 Description Genotype Quality

$FORMAT$GQX key 1 ID 2 Number 3 Type 4 Description 5 Genotype quality assuming non-variant position} value 1 GQX 2 1 3 Integer 4 Minimum of {Genotype quality assuming variant position 5 NA=Genotype quality assuming non-variant position}

$FORMAT$GT key value 1 ID GT 2 Number 1 3 Type String 4 Description Genotype

$FORMAT$MQ key value 1 ID MQ 2 Number 1 3 Type Integer 4 Description RMS Mapping Quality

$FORMAT$PL key 1 ID 2 Number 3 Type 4 Description 5 Phred-scaled likelihoods for genotypes as defined in the VCF specification value 1 PL 2 G 3 Integer 4 Normalized 5 NA= Phred-scaled likelihoods for genotypes as defined in the VCF specification

$FORMAT$VF key 1 ID 2 Number 3 Type 4 Description 5 the ratio of the sum of the called variant depth to the total depth value 1 VF 2 1 3 Float 4 Variant Frequency 5 NA= the ratio of the sum of the called variant depth to the total depth

$nlines [1] 82 Warning message:closing unused connection 3 (bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2)

On Thu, Feb 22, 2018 at 3:11 PM, tgambin notifications@github.com wrote:

Ok. Could you please, try to run in R "read.vcf.header" function for the single vcf file ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-367806891, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5TtX-vU--xSKrKNzlBgvcdy9Vkbks5tXcn1gaJpZM4SOhe9 .

ricardoharripaul commented 6 years ago

Hi,

How long is the AOH filtering suppose to run for? Does it take a long time to complete? Does maxing out the cores and running two instances on the same files significantly slow it down?

Thanks,

Ricardo

On Fri, Feb 23, 2018 at 4:07 PM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:

Hi,

So I tried to run the command and it gives me the same error, however, it does print the vcf header. Does the mean the software is running despite saying it cannot find the file? Should I just let it run to completion?

read.vcf.header("/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2")bzcat: I/O or other error, bailing out. Possible reason follows. bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2, output file = (stdout)$header [1] "CHROM" "POS" "ID" "REF" "ALT" "QUAL" "FILTER" "INFO" "FORMAT" "DMA_7"

$INFO $INFO$AA key value 1 ID AA 2 Number 1 3 Type String 4 Description The inferred allele ancestral to the chimpanzee/human lineage

$INFO$AC key value 1 ID AC 2 Number A 3 Type Integer 4 Description Allele count in genotypes 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed

$INFO$AF key value 1 ID AF 2 Number A 3 Type Float 4 Description Allele Frequency 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed

$INFO$AF1000G key value 1 ID AF1000G 2 Number 1 3 Type String 4 Description The allele frequency from all populations of 1000 genomes data

$INFO$AN key value 1 ID AN 2 Number 1 3 Type Integer 4 Description Total number of alleles in called genotypes

$INFO$BaseQRankSum key value 1 ID BaseQRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities

$INFO$BLOCKAVG_min30p3a key 1 ID 2 Number 3 Type 4 Description 5 have the same filter value 6 and have all sample values in range [x 7 y] 8 y < 9 (x(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block value 1 BLOCKAVG_min30p3a 2 0 3 Flag 4 Non-variant site block. All sites in a block are constrained to be non-variant 5 NA= have the same filter value 6 NA= and have all sample values in range [x 7 NA=y] 8 max(x+3 9 NA=(x(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block

$INFO$clinvar key value 1 ID clinvar 2 Number . 3 Type String 4 Description Clinical significance

$INFO$cosmic key 1 ID 2 Number 3 Type 4 Description value 1 cosmic 2 . 3 String 4 The numeric identifier for the variant in the Catalogue of Somatic Mutations in Cancer (COSMIC) database

$INFO$CSQR key 1 ID 2 Number 3 Type 4 Description value 1 CSQR 2 . 3 String 4 Regulatory consequence type as predicted by VEP version 72. Feature source: Ensembl. Format: RegulatoryID|Consequence

$INFO$CSQT key 1 ID 2 Number 3 Type 4 Description value 1 CSQT 2 . 3 String 4 Transcript consequence as predicted by VEP version 72. Transcript source: refseq. Format: HGNC|TranscriptID|Consequence

$INFO$Dels key value 1 ID Dels 2 Number 1 3 Type Float 4 Description Fraction of Reads Containing Spanning Deletions

$INFO$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth; some reads may have been filtered

$INFO$DS key value 1 ID DS 2 Number 0 3 Type Flag 4 Description Were any of the samples downsampled?

$INFO$END key value 1 ID END 2 Number 1 3 Type Integer 4 Description End position of the region described in this record

$INFO$EVS key 1 ID 2 Number 3 Type 4 Description 5 sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples value 1 EVS 2 . 3 String 4 Allele frequency 5 NA= sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples

$INFO$FS key value 1 ID FS 2 Number 1 3 Type Float 4 Description Phred-scaled p-value using Fisher's exact test to detect strand bias

$INFO$GMAF key 1 ID 2 Number 3 Type 4 Description 5 the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor value 1 GMAF 2 . 3 String 4 Global minor allele frequency (GMAF); technically 5 NA= the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor

$INFO$HaplotypeScore key value 1 ID HaplotypeScore 2 Number 1 3 Type Float 4 Description Consistency of the site with at most two segregating haplotypes

$INFO$HRun key value 1 ID HRun 2 Number 1 3 Type Integer 4 Description Largest Contiguous Homopolymer Run of Variant Allele In Either Direction

$INFO$InbreedingCoeff key 1 ID 2 Number 3 Type 4 Description value 1 InbreedingCoeff 2 1 3 Float 4 Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation

$INFO$MQ key value 1 ID MQ 2 Number 1 3 Type Float 4 Description RMS Mapping Quality

$INFO$MQ0 key value 1 ID MQ0 2 Number 1 3 Type Integer 4 Description Total Mapping Quality Zero Reads

$INFO$MQRankSum key value 1 ID MQRankSum 2 Number 1 3 Type Float 4 Description Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities

$INFO$phastCons key 1 ID 2 Number 3 Type 4 Description value 1 phastCons 2 0 3 Flag 4 Denotes if the variant is an identical or similar sequence that occurs between species and maintained between species throughout evolution

$INFO$QD key value 1 ID QD 2 Number 1 3 Type Float 4 Description Variant Confidence/Quality by Depth

$INFO$ReadPosRankSum key value 1 ID ReadPosRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias

$INFO$SB key value 1 ID SB 2 Number 1 3 Type Float 4 Description Strand Bias

$FORMAT $FORMAT$AD key value 1 ID AD 2 Number . 3 Type Integer 4 Description Allelic depths for the ref and alt alleles in the order listed

$FORMAT$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth (reads with MQ=255 or with bad mates are filtered)

$FORMAT$GQ key value 1 ID GQ 2 Number 1 3 Type Float 4 Description Genotype Quality

$FORMAT$GQX key 1 ID 2 Number 3 Type 4 Description 5 Genotype quality assuming non-variant position} value 1 GQX 2 1 3 Integer 4 Minimum of {Genotype quality assuming variant position 5 NA=Genotype quality assuming non-variant position}

$FORMAT$GT key value 1 ID GT 2 Number 1 3 Type String 4 Description Genotype

$FORMAT$MQ key value 1 ID MQ 2 Number 1 3 Type Integer 4 Description RMS Mapping Quality

$FORMAT$PL key 1 ID 2 Number 3 Type 4 Description 5 Phred-scaled likelihoods for genotypes as defined in the VCF specification value 1 PL 2 G 3 Integer 4 Normalized 5 NA= Phred-scaled likelihoods for genotypes as defined in the VCF specification

$FORMAT$VF key 1 ID 2 Number 3 Type 4 Description 5 the ratio of the sum of the called variant depth to the total depth value 1 VF 2 1 3 Float 4 Variant Frequency 5 NA= the ratio of the sum of the called variant depth to the total depth

$nlines [1] 82 Warning message:closing unused connection 3 (bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2)

On Thu, Feb 22, 2018 at 3:11 PM, tgambin notifications@github.com wrote:

Ok. Could you please, try to run in R "read.vcf.header" function for the single vcf file ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-367806891, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5TtX-vU--xSKrKNzlBgvcdy9Vkbks5tXcn1gaJpZM4SOhe9 .

tgambin commented 6 years ago

It should not take so long. How many files you are processing? Could you send me a sample VCF that you are trying to process?

Thanks, Tomek

ricardoharripaul commented 6 years ago

Hi Tomek,

I am running 10 files and it not processing. I tried using the same commands in the terminal and it did not give the error but the whole AOH calling hangs. Here is the set of VCF files I am using. Just a note, I did run all the BAM files with no problems.

https://drive.google.com/open?id=1wCgmq5mUhsOYCYzro0n0W7lXMSHTWgV6

Thanks,

Ricardo

On Sun, Feb 25, 2018 at 11:02 AM, tgambin notifications@github.com wrote:

It should not take so long. How many files you are processing? Could you send me a sample VCF that you are trying to process?

Thanks, Tomek

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-368320919, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IzoaSeqopd2qMlKtY1zhpIeRSaXxks5tYYQlgaJpZM4SOhe9 .

tgambin commented 6 years ago

Thanks! Let me test it and I get back to you soon

tgambin commented 6 years ago

I found the problem and it supposed to be fixed right now (it was related to the way I parsed allelic depths). Please, try again and remember to set: vR_id<-"AD" tR_id<-"DP" filter <- "."

ricardoharripaul commented 6 years ago

Thanks a lot Tomek,

I made the changes you suggested but I am still getting some errors. I pasted the code I used below:

| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."

0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed

setwd("/home/ricardo/Downloads/HMZDelFinder-master/") workDir <- getwd() mainDir <- paste0(workDir ,"/HMZDelFinder/"); if (!file.exists(mainDir)){dir.create(mainDir)} dataDir <- paste0(mainDir, "data/" , sep=""); if (!file.exists(dataDir)){dir.create(dataDir)} # data directory

library(RCurl) library(data.table) library(gdata) library(parallel) library(Hmisc) library(matrixStats) library(DNAcopy) library(GenomicRanges) library(Rsubread)

eval( expr = parse( text = getURL(" https://raw.githubusercontent.com/BCM-Lupskilab/HMZDelFinder/master/src/HMZDelFinder.R") ))

bedFile <- paste0(dataDir, "tgp_hg19.bed") # set path to BED file outputDir <- paste0(mainDir, "out/" , sep=""); if (!file.exists(outputDir)){dir.create(outputDir)} # create output directory plotsDir <- paste0(mainDir, "plots/" , sep=""); if (!file.exists(plotsDir)){dir.create(plotsDir)} # create output plots directory rpkmFiles <- dir(paste(dataDir, "TGP/",sep=""), "rpkm2.txt$") # list of RPKM file names rpkmFids <- gsub(".rpkm2.txt", "", rpkmFiles) # list of sample identifiers rpkmPaths <- paste0(paste0(dataDir, "TGP/"), rpkmFiles) # list of paths to RPKM files aohDir <- paste0(mainDir, "AOH/" , sep=""); if (!file.exists(aohDir)){dir.create(aohDir)} aohRDataOut <- paste(mainDir, "AOH/extAOH_small.RData", sep="") # temprary file to store AOH data

rpkmFiles <- rpkmFiles[1:10] rpkmFids<- rpkmFids[1:10] rpkmPaths <- rpkmPaths[1:10] pathToBams <- "/home/ricardo/Downloads/DMA/" rpkmDir <- "/home/ricardo/Downloads/HMZDelFinder-master/HMZDelFinder/data/TGP" sampleNames <- sapply(strsplit(dir(pathToBams, "bam$"), "[/\.]"), function(x){x[length(x)-1]}) # sample identifiers sampleNames <- sampleNames[1:10]

calcRPKMsFromBAMs(bedFile,bamFiles , sampleNames, rpkmDir,8)

vcfFiles <- dir("/home/ricardo/Downloads/DMA/ATLAS", "vcf.bz2$", recursive = T, include.dirs = FALSE) vcfFids <- sapply(strsplit(vcfFiles,"[/\.]"), function(x){x[1]}) vcfPaths <- paste0("/home/ricardo/Downloads/DMA/ATLAS/",vcfFiles)

is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05 # max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000 # min AOH size minAOHsig <- 0.45 # min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD" # ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP" # ID from VCF FORMAT indicating the number total reads filter <- "." # for other variant callers be '.'

results <- runHMZDelFinder (vcfPaths, # vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes) vcfFids, # vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes) rpkmPaths, # paths to RPKM files rpkmFids, # samples identifiers corresponding to RPKM files mc.cores, # number of CPU cores aohRDataOut,# temp file to store AOH data bedFile, # bed file with target lowRPKMthreshold, # RPKM threshold minAOHsize, # min AOH size minAOHsig, # min AOH signal threshold is_cmg, # flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR'); tR_id, # ID for 'the number of total reads' in VCF FORMAT column (default='DP') filter)

On Wed, Feb 28, 2018 at 8:53 AM, tgambin notifications@github.com wrote:

I found the problem and it supposed to be fixed right now (it was related to the way I parsed allelic depths). Please, try again and remember to set: vR_id<-"AD" tR_id<-"DP" filter <- "."

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-369245566, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I8DEhstBk8eY-2Wk7Va5couSEkdKks5tZVpcgaJpZM4SOhe9 .

ricardoharripaul commented 6 years ago

Dear Tomek,

I wondered if you received my last email.

I had an error causing the script to fail after the AOH was successfully processed:

| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."

0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed

Cheers,

Ricardo

On Fri, Mar 2, 2018 at 2:41 PM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:

Thanks a lot Tomek,

I made the changes you suggested but I am still getting some errors. I pasted the code I used below:

| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|

[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." ====================================================================== 100% [1] "Extending AOHs by length of uncertain regions ..." ====================================================================== 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..."

0%Error in if (file.info(file)$size == 0) { :

missing value where TRUE/FALSE needed

setwd("/home/ricardo/Downloads/HMZDelFinder-master/") workDir <- getwd() mainDir <- paste0(workDir ,"/HMZDelFinder/"); if (!file.exists(mainDir)){dir.create(mainDir)} dataDir <- paste0(mainDir, "data/" , sep=""); if (!file.exists(dataDir)){dir.create(dataDir)} # data directory

library(RCurl) library(data.table) library(gdata) library(parallel) library(Hmisc) library(matrixStats) library(DNAcopy) library(GenomicRanges) library(Rsubread)

eval( expr = parse( text = getURL("https://raw.githubusercontent.com/BCM- Lupskilab/HMZDelFinder/master/src/HMZDelFinder.R") ))

bedFile <- paste0(dataDir, "tgp_hg19.bed") # set path to BED file outputDir <- paste0(mainDir, "out/" , sep=""); if (!file.exists(outputDir)){dir.create(outputDir)} # create output directory plotsDir <- paste0(mainDir, "plots/" , sep=""); if (!file.exists(plotsDir)){dir.create(plotsDir)} # create output plots directory rpkmFiles <- dir(paste(dataDir, "TGP/",sep=""), "rpkm2.txt$") # list of RPKM file names rpkmFids <- gsub(".rpkm2.txt", "", rpkmFiles) # list of sample identifiers rpkmPaths <- paste0(paste0(dataDir, "TGP/"), rpkmFiles) # list of paths to RPKM files aohDir <- paste0(mainDir, "AOH/" , sep=""); if (!file.exists(aohDir)){dir. create(aohDir)} aohRDataOut <- paste(mainDir, "AOH/extAOH_small.RData", sep="") # temprary file to store AOH data

rpkmFiles <- rpkmFiles[1:10] rpkmFids<- rpkmFids[1:10] rpkmPaths <- rpkmPaths[1:10] pathToBams <- "/home/ricardo/Downloads/DMA/" rpkmDir <- "/home/ricardo/Downloads/HMZDelFinder-master/ HMZDelFinder/data/TGP" sampleNames <- sapply(strsplit(dir(pathToBams, "bam$"), "[/\.]"), function(x){x[length(x)-1]}) # sample identifiers sampleNames <- sampleNames[1:10]

calcRPKMsFromBAMs(bedFile,bamFiles , sampleNames, rpkmDir,8)

vcfFiles <- dir("/home/ricardo/Downloads/DMA/ATLAS", "vcf.bz2$", recursive = T, include.dirs = FALSE) vcfFids <- sapply(strsplit(vcfFiles,"[/\.]"), function(x){x[1]}) vcfPaths <- paste0("/home/ricardo/Downloads/DMA/ATLAS/",vcfFiles)

is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05 # max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000 # min AOH size minAOHsig <- 0.45 # min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD" # ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP" # ID from VCF FORMAT indicating the number total reads filter <- "." # for other variant callers be '.'

results <- runHMZDelFinder (vcfPaths, # vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes) vcfFids, # vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes) rpkmPaths, # paths to RPKM files rpkmFids, # samples identifiers corresponding to RPKM files mc.cores, # number of CPU cores aohRDataOut,# temp file to store AOH data bedFile, # bed file with target lowRPKMthreshold, # RPKM threshold minAOHsize, # min AOH size minAOHsig, # min AOH signal threshold is_cmg, # flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR'); tR_id, # ID for 'the number of total reads' in VCF FORMAT column (default='DP') filter)

On Wed, Feb 28, 2018 at 8:53 AM, tgambin notifications@github.com wrote:

I found the problem and it supposed to be fixed right now (it was related to the way I parsed allelic depths). Please, try again and remember to set: vR_id<-"AD" tR_id<-"DP" filter <- "."

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-369245566, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I8DEhstBk8eY-2Wk7Va5couSEkdKks5tZVpcgaJpZM4SOhe9 .

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."
0%Error in if (file.info(file)$size == 0) { :

tgambin commented 6 years ago

Can you check if rpkm files were generated?

ricardoharripaul commented 6 years ago

Hi,

So I can tell you that the rpkm2.txt files were definitely generated and they are in the same format as the example data.

The error still persists after attempting to redo the analysis. These are the same files that work without AOH filtering. Do you have any other suggestions?

Thanks,

Ricardo

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."

Error in if (file.info(file)$size == 0) { :

missing value where TRUE/FALSE needed

ricardo@ricardo-G750JW[TGP] head NA11843_rpkm2.txt [ 2:31PM] count RPKM 0 0 0 0 0 0 0 0 113 15.0938179577114 19 1.00955082218497 113 21.3089194697102 223 17.2956281136893 25 2.70148874653281 ricardo@ricardo-G750JW[TGP] head DMA-1_S1_rpkm2.txt [ 2:31PM] count RPKM 5 0.519272412253909 30 3.11563447352345 45 4.67345171028518 7 0.456370012788451 95 27.2690137601671 42 4.79566720620793 39 15.8042085941278 107 17.8336572163266 78 18.1126885011352

On Wed, Mar 7, 2018 at 10:06 AM, tgambin notifications@github.com wrote:

Can you check if rpkm files were generated?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-371165030, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3Iz6XEQnb1mYJ-VFFQ4AtcmK9NslBks5tb_V3gaJpZM4SOhe9 .

tgambin commented 6 years ago

Could you please check how the object "selectedFidsIdx" looks like just before running prepareRPKMData? Are there any elements ?

ricardoharripaul commented 6 years ago

Hi,

I am not sure how to access that function. Typing it in says it is not found.

selectedFidsIdxError: object 'selectedFidsIdx' not found> selectedFidsIdx()Error in selectedFidsIdx() : could not find function "selectedFidsIdx"

I looked up the Fids for the vcf, bam and rpkm Fids:

vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1"> rpkmFids [1] "10018" "10019" "10020" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1"> bamFiles [1] "/home/ricardo/Downloads/DMA/DMA_1.10019.bam" "/home/ricardo/Downloads/DMA/DMA-1_S1.bam" [3] "/home/ricardo/Downloads/DMA/DMA_2.10018.bam" "/home/ricardo/Downloads/DMA/DMA-2_S1.bam" [5] "/home/ricardo/Downloads/DMA/DMA_3.10020.bam" "/home/ricardo/Downloads/DMA/DMA-3_S1.bam" [7] "/home/ricardo/Downloads/DMA/DMA-4_S1.bam" "/home/ricardo/Downloads/DMA/DMA-5_S1.bam" [9] "/home/ricardo/Downloads/DMA/DMA-6_S1.bam" "/home/ricardo/Downloads/DMA/DMA-7_S1.bam"

On Thu, Mar 15, 2018 at 1:10 PM, tgambin notifications@github.com wrote:

Could you please check how the object "selectedFidsIdx" looks like just before running prepareRPKMData? Are there any elements ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-373452803, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IyXN8yGz-LpueFVlEcPOII2PhLD2ks5teqCTgaJpZM4SOhe9 .

tgambin commented 6 years ago

It looks like vcfFids (prefix of file name) and rpkmFids (sufix of file name) are not matching and this may cause the problem. Could you please, try to change your code so vcfFids and rpkmFids follow the same convention ?

ricardoharripaul commented 6 years ago

Hi Tomek,

I do not think that is the problem. I have made the IDs the same and I am still getting the same error.

rpkmFiles [1] "DMA_1_rpkm2.txt" "DMA-1_S1_rpkm2.txt" "DMA_2_rpkm2.txt" [4] "DMA-2_S1_rpkm2.txt" "DMA_3_rpkm2.txt" "DMA-3_S1_rpkm2.txt" [7] "DMA-4_S1_rpkm2.txt" "DMA-5_S1_rpkm2.txt" "DMA-6_S1_rpkm2.txt" [10] "DMA-7_S1_rpkm2.txt" rpkmFids <- gsub(".rpkm2.txt", "", rpkmFiles) rpkmFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" sampleNames [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths <- paste0(dataDir, rpkmFiles) rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-1_S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_2_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-2_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_3_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-7_S1_rpkm2.txt" vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-1_S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_2_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-2_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_3_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-7_S1_rpkm2.txt" results <- runHMZDelFinder (vcfPaths, snpPaths= mc.cores= minAOHsize= tR_id= snpFids= aohRDataOut= minAOHsig= filter= rpkmPaths= bedFile= is_cmg= rpkmFids= lowRPKMthreshold= vR_id= results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)

vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)

rpkmPaths, # paths to RPKM files

rpkmFids,# samples identifiers corresponding to RPKM files

mc.cores,# number of CPU cores

aohRDataOut,# temp file to store AOH data

bedFile,# bed file with target

lowRPKMthreshold, # RPKM threshold

minAOHsize, # min AOH size

minAOHsig,# min AOH signal threshold

is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise

vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');

tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')

filter)

| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."

    Error in if (file.info(file)$size == 0) { :

missing value where TRUE/FALSE needed

On Sun, Mar 25, 2018 at 1:37 PM, tgambin notifications@github.com wrote:

It looks like vcfFids (prefix of file name) and rpkmFids (sufix of file name) are not matching and this may cause the problem. Could you please, try to change your code so vcfFids and rpkmFids follow the same convention ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-375988521, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IyVbq7vcyCF3BLtxdfvVdsSMeu72ks5th9XrgaJpZM4SOhe9 .

tgambin commented 6 years ago

Ok, Instead of executing "runHMZDelFinder ()", could you please try to run the function body line by line, i.e.

library(gdata)
library(data.table)
library(GenomicRanges)
library(parallel)
library(matrixStats)

extAOH <- NULL
if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")}
else{
        library(DNAcopy)
        extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter)
        save(extAOH, file=aohRDataOut)
}

print("[step 2 out of 7] ****** Preparing RPKM data ******")
selectedFidsIdx <- 1:length(rpkmFids)
if (!is.null(extAOH)) {
    selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)
}

rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1)

   ## please also print length(selectedFidsIdx)
 print(length(selectedFidsIdx))

ricardoharripaul commented 6 years ago

Hi Tomek,

It is producing the same error message. The error has been reproduced here and it seems the extAOH variable is empty from the prepareAOHData function. The paths are correct on the commandline but it seems to not be parsing out information. Please see below for the error and full message.

extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end,num.mark,seg.mean...

is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "."# for other variant callers be '.' getwd() [1] "/home/ricardo/Downloads/DMA" rpkmFiles [1] "DMA_1_rpkm2.txt" "DMA-1_S1_rpkm2.txt" "DMA_2_rpkm2.txt" [4] "DMA-2_S1_rpkm2.txt" "DMA_3_rpkm2.txt" "DMA-3_S1_rpkm2.txt" [7] "DMA-4_S1_rpkm2.txt" "DMA-5_S1_rpkm2.txt" "DMA-6_S1_rpkm2.txt" [10] "DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7_S1_rpkm2.txt" bamFiles [1] "/home/ricardo/Downloads/DMA/DMA_1.10019.bam" [2] "/home/ricardo/Downloads/DMA/DMA-1_S1.bam" [3] "/home/ricardo/Downloads/DMA/DMA_2.10018.bam" [4] "/home/ricardo/Downloads/DMA/DMA-2_S1.bam" [5] "/home/ricardo/Downloads/DMA/DMA_3.10020.bam" [6] "/home/ricardo/Downloads/DMA/DMA-3_S1.bam" [7] "/home/ricardo/Downloads/DMA/DMA-4_S1.bam" [8] "/home/ricardo/Downloads/DMA/DMA-5_S1.bam" [9] "/home/ricardo/Downloads/DMA/DMA-6_S1.bam" [10] "/home/ricardo/Downloads/DMA/DMA-7_S1.bam" sampleNames [1] "10019" "DMA-1_S1" "10018" "DMA-2_S1" "10020" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" sampleNames<-rpkmFids sampleNames [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA-1_S1.vcf.bz2" "DMA_2.10018.vcf.bz2" [4] "DMA-2_S1.vcf.bz2" "DMA_3.10020.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)

extAOH <- NULL if (is.null(vcfPaths) || is.null(vcfFids)){print("Skipping AOH analysis ...")

} else{

library(DNAcopy)

extAOH <- prepareAOHData(vcfPaths, vcfFids, mc.cores, vR_id, tR_id, filter)

save(extAOH, file=aohRDataOut)

} [1] "Reading VCFs and generating AOH data using CBS ..." |======================================================================| 100% [1] "Extending AOHs by length of uncertain regions ..." |======================================================================| 100% extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end,num.mark,seg.mean... print("[step 2 out of 7] ** Preparing RPKM data **") [1] "[step 2 out of 7] ** Preparing RPKM data **" selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) {

selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)

}

rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1) [1] "Reading RPKM files ..." Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed

0%^C extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end,num.mark,seg.mean...

On Sat, Apr 7, 2018 at 5:54 AM, tgambin notifications@github.com wrote:

Ok, Instead of executing "runHMZDelFinder ()", could you please try to run the function body line by line, i.e.

library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)

extAOH <- NULL if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")} else{ library(DNAcopy) extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter) save(extAOH, file=aohRDataOut) }

print("[step 2 out of 7] ** Preparing RPKM data **") selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) { selectedFidsIdx <- which(rpkmFids %in% extAOH$Name) }

rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1)

please also print length(selectedFidsIdx)

print(length(selectedFidsIdx))

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-379457838, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I_O4oui9WPXGQMY1af2dmkocXgvEks5tmIzsgaJpZM4SOhe9 .

ricardoharripaul commented 6 years ago

Hi Tomek,

Just following up with you.

Cheers,

Ricardo

On Mon, Apr 16, 2018 at 4:32 PM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:

Hi Tomek,

It is producing the same error message. The error has been reproduced here and it seems the extAOH variable is empty from the prepareAOHData function. The paths are correct on the commandline but it seems to not be parsing out information. Please see below for the error and full message.

extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end, num.mark,seg.mean...

is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "."# for other variant callers be '.' getwd() [1] "/home/ricardo/Downloads/DMA" rpkmFiles [1] "DMA_1_rpkm2.txt" "DMA-1_S1_rpkm2.txt" "DMA_2_rpkm2.txt" [4] "DMA-2_S1_rpkm2.txt" "DMA_3_rpkm2.txt" "DMA-3_S1_rpkm2.txt" [7] "DMA-4_S1_rpkm2.txt" "DMA-5_S1_rpkm2.txt" "DMA-6_S1_rpkm2.txt" [10] "DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt"

[2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_ S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt"

[4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_ S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt"

[6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_ S1rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4 S1rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5 S1rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6 S1rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7 S1_rpkm2.txt"

bamFiles [1] "/home/ricardo/Downloads/DMA/DMA_1.10019.bam" [2] "/home/ricardo/Downloads/DMA/DMA-1_S1.bam" [3] "/home/ricardo/Downloads/DMA/DMA_2.10018.bam" [4] "/home/ricardo/Downloads/DMA/DMA-2_S1.bam" [5] "/home/ricardo/Downloads/DMA/DMA_3.10020.bam" [6] "/home/ricardo/Downloads/DMA/DMA-3_S1.bam" [7] "/home/ricardo/Downloads/DMA/DMA-4_S1.bam" [8] "/home/ricardo/Downloads/DMA/DMA-5_S1.bam" [9] "/home/ricardo/Downloads/DMA/DMA-6_S1.bam" [10] "/home/ricardo/Downloads/DMA/DMA-7_S1.bam" sampleNames [1] "10019" "DMA-1_S1" "10018" "DMA-2_S1" "10020" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" sampleNames<-rpkmFids sampleNames [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA-1_S1.vcf.bz2" "DMA_2.10018.vcf.bz2" [4] "DMA-2_S1.vcf.bz2" "DMA_3.10020.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)

extAOH <- NULL if (is.null(vcfPaths) || is.null(vcfFids)){print("Skipping AOH analysis ...")

} else{

library(DNAcopy)

extAOH <- prepareAOHData(vcfPaths, vcfFids, mc.cores, vR_id, tR_id, filter)

save(extAOH, file=aohRDataOut)

} [1] "Reading VCFs and generating AOH data using CBS ..." |======================================================================| 100% [1] "Extending AOHs by length of uncertain regions ..." |======================================================================| 100% extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end, num.mark,seg.mean... print("[step 2 out of 7] ** Preparing RPKM data **") [1] "[step 2 out of 7] ** Preparing RPKM data **" selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) {

selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)

}

rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1) [1] "Reading RPKM files ..." Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed

0%^C

extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end, num.mark,seg.mean...

On Sat, Apr 7, 2018 at 5:54 AM, tgambin notifications@github.com wrote:

Ok, Instead of executing "runHMZDelFinder ()", could you please try to run the function body line by line, i.e.

library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)

extAOH <- NULL if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")} else{ library(DNAcopy) extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter) save(extAOH, file=aohRDataOut) }

print("[step 2 out of 7] ** Preparing RPKM data **") selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) { selectedFidsIdx <- which(rpkmFids %in% extAOH$Name) }

rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1)

please also print length(selectedFidsIdx)

print(length(selectedFidsIdx))

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-379457838, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I_O4oui9WPXGQMY1af2dmkocXgvEks5tmIzsgaJpZM4SOhe9 .

rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1) [1] "Reading RPKM files ..."	Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed
0%^C

tgambin commented 6 years ago

Could you please provide me a sample VCF file once again. I tried to download it but the link is inactive

ricardoharripaul commented 6 years ago

Thanks

https://drive.google.com/open?id=1V5Y7GELPWBlScP2GXyve_rJyeb_dJMLc

On Fri, Apr 27, 2018 at 5:31 AM, tgambin notifications@github.com wrote:

Could you please provide me a sample VCF file once again. I tried to download it but the link is inactive

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-384918302, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5Hnno_SGmTJ5k-Wa2USr-XeTi5Fks5tsuWEgaJpZM4SOhe9 .

ricardoharripaul commented 6 years ago

Hi Tomek,

Just following up.

Cheers,

Ricardo

On Fri, Apr 27, 2018 at 11:09 AM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:

Thanks

https://drive.google.com/open?id=1V5Y7GELPWBlScP2GXyve_rJyeb_dJMLc

On Fri, Apr 27, 2018 at 5:31 AM, tgambin notifications@github.com wrote:

Could you please provide me a sample VCF file once again. I tried to download it but the link is inactive

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-384918302, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5Hnno_SGmTJ5k-Wa2USr-XeTi5Fks5tsuWEgaJpZM4SOhe9 .

tgambin commented 6 years ago

Thanks for providing the data. I will try to process the data and get back to you. Tomek

ricardoharripaul commented 6 years ago

Hi Tomek,

Just checking in again.

Cheers,

Ricardo

On Wed, May 16, 2018 at 11:08 AM, tgambin notifications@github.com wrote:

Thanks for providing the data. I will try to process the data and get back to you. Tomek

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-389553610, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3Iye2SHAz3pJniCvXkih9ZP-Doy_uks5tzEDmgaJpZM4SOhe9 .

tgambin commented 6 years ago

I found one issue in the function read.vcf.quick.noinfo which is now fixed (there was a leading space between " bzcat" ). I think I did not catch before this problem because it worked find with the previous version od data.table package.

I tested it on your VCFs and it works now. Please try and let me know if you have any other problems.

ricardoharripaul commented 6 years ago

Hi Tomek,

I tried to start from scratch and downloaded a new copy of the HMZDelFinder. I am getting the same error though:

vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA_2.10018.vcf.bz2" "DMA_3.10020.vcf.bz2" [4] "DMA-1_S1.vcf.bz2" "DMA-2_S1.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_S1_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "." results <- runHMZDelFinder (vcfPaths, snpPaths= mc.cores= minAOHsize= tR_id= snpFids= aohRDataOut= minAOHsig= filter= rpkmPaths= bedFile= is_cmg= rpkmFids= lowRPKMthreshold= vR_id= results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)

vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)

rpkmPaths, # paths to RPKM files

rpkmFids,# samples identifiers corresponding to RPKM files

mc.cores,# number of CPU cores

aohRDataOut,# temp file to store AOH data

bedFile,# bed file with target

lowRPKMthreshold, # RPKM threshold

minAOHsize, # min AOH size

minAOHsig,# min AOH signal threshold

is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise

vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');

tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')

filter)

| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."

0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed In addition: There were 50 or more warnings (use warnings() to see the first 50) | | 0%

On Fri, Jun 15, 2018 at 11:33 AM tgambin notifications@github.com wrote:

I found one issue in the function read.vcf.quick.noinfo which is now fixed (there was a leading space between " bzcat" ). I think I did not catch before this problem because it worked find with the previous version od data.table package.

I tested it on your VCFs and it works now. Please try and let me know if you have any other problems.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-397659119, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I147lU8XTGHlaeTh_URgTc_wiC3tks5t89PegaJpZM4SOhe9 .

ricardoharripaul commented 6 years ago

Hi Tomek,

I was wondering if you received my last email. I thought the software was updated and I tried running the exome filtering and it crashed as before. Am I suppose to modify the code?

Thanks,

Ricardo

On Fri, Jun 29, 2018 at 2:08 PM Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:

Hi Tomek,

I tried to start from scratch and downloaded a new copy of the HMZDelFinder. I am getting the same error though:

vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA_2.10018.vcf.bz2" "DMA_3.10020.vcf.bz2" [4] "DMA-1_S1.vcf.bz2" "DMA-2_S1.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_S1_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "." results <- runHMZDelFinder (vcfPaths, snpPaths= mc.cores= minAOHsize= tR_id= snpFids= aohRDataOut= minAOHsig= filter= rpkmPaths= bedFile= is_cmg= rpkmFids= lowRPKMthreshold= vR_id= results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)

vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)

rpkmPaths, # paths to RPKM files

rpkmFids,# samples identifiers corresponding to RPKM files

mc.cores,# number of CPU cores

aohRDataOut,# temp file to store AOH data

bedFile,# bed file with target

lowRPKMthreshold, # RPKM threshold

minAOHsize, # min AOH size

minAOHsig,# min AOH signal threshold

is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise

vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');

tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')

filter)

| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|

[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." ====================================================================== 100% [1] "Extending AOHs by length of uncertain regions ..." ====================================================================== 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..."

0%Error in if (file.info(file)$size == 0) { :

missing value where TRUE/FALSE needed In addition: There were 50 or more warnings (use warnings() to see the first 50) | | 0%

On Fri, Jun 15, 2018 at 11:33 AM tgambin notifications@github.com wrote:

I found one issue in the function read.vcf.quick.noinfo which is now fixed (there was a leading space between " bzcat" ). I think I did not catch before this problem because it worked find with the previous version od data.table package.

I tested it on your VCFs and it works now. Please try and let me know if you have any other problems.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-397659119, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I147lU8XTGHlaeTh_URgTc_wiC3tks5t89PegaJpZM4SOhe9 .

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."
0%Error in if (file.info(file)$size == 0) { :

tgambin commented 6 years ago

Could you please check extAOH object like you did it before ? Is it still empty ?

ricardoharripaul commented 6 years ago

Hi Tomek,

It seems it does not exist.

results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)

vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)

rpkmPaths, # paths to RPKM files

rpkmFids,# samples identifiers corresponding to RPKM files

mc.cores,# number of CPU cores

aohRDataOut,# temp file to store AOH data

bedFile,# bed file with target

lowRPKMthreshold, # RPKM threshold

minAOHsize, # min AOH size

minAOHsig,# min AOH signal threshold

is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise

vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');

tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')

filter)

| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|

[1] "[step 1 out of 7] AOH data " [1] "Reading VCFs and generating AOH data using CBS ..."	======================================================================	100% [1] "Extending AOHs by length of uncertain regions ..."	======================================================================	100% [1] "[step 2 out of 7] Preparing RPKM data " [1] "Reading RPKM files ..."

0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed In addition: There were 22 warnings (use warnings() to see them) ^C

library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)

extAOH <- NULL if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")} Error: object 'snpPaths' not found else{ Error: unexpected 'else' in "else" library(DNAcopy) extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter) [1] "Reading VCFs and generating AOH data using CBS ..." Error in mclapply2(1:length(fileNames), function(i) { : object 'snpPaths' not found save(extAOH, file=aohRDataOut) } Error: unexpected '}' in "}"

print("[step 2 out of 7] ** Preparing RPKM data **") [1] "[step 2 out of 7] ** Preparing RPKM data **" selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) {

selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)

}

rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1) [1] "Reading RPKM files ..." |======================================================================| 100% [1] "Removing empty elements ..." [1] "Creating matrix ..." |======================================================================| 100% [1] "Creating data.table (may take a while)..." Warning messages: 1: In selectChildren(pids[!fin], -1) : cannot wait for child 19856 as it does not exist 2: In selectChildren(pids[!fin], -1) : cannot wait for child 19856 as it does not exist

please also print length(selectedFidsIdx)

print(length(selectedFidsIdx)) [1] 10

On Fri, Jul 20, 2018 at 6:36 AM tgambin notifications@github.com wrote:

Could you please check extAOH object like you did it before ? Is it still empty ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-406562224, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I1iYmjKfZLIgNQGX3FnDbohK7TeKks5uIbKzgaJpZM4SOhe9 .

liuhankui commented 6 years ago

I notice this in your script： newAOH <- rbindlist(lapply(c(1:22,"X","Y"), function(chr){

some vcf file use “chr1”，“chr2” but not “1”，“2”，“X”，“Y”

suggest using “unique(tmpAOH$chrom)”

ricardoharripaul commented 6 years ago

Hi Tomek,

Any input on this. i believe you said you were able to get it running previously.

Cheers,

Ricardo

On Fri, Jul 20, 2018 at 6:36 AM tgambin notifications@github.com wrote:

Could you please check extAOH object like you did it before ? Is it still empty ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-406562224, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I1iYmjKfZLIgNQGX3FnDbohK7TeKks5uIbKzgaJpZM4SOhe9 .

tgambin commented 6 years ago

Let me check it again and I will let you know tomorrow.

tgambin commented 6 years ago

Ok,

Please try again. I fixed potential problem with "chr" prefix. In addition please, make sure that files are compressed with bzip2 (I had to uncompress and run bzip2 again on your files).

Please, first try to run the following command and check if it works: dt <- prepareAOHData ("DMA_1.100019.vcf.bz2", "DMA_1", 1, "AD", "DP" , ".") dim(dt)

[1] 2838 10

Best, Tomek

ricardoharripaul commented 6 years ago

Hi Tomek,

Just touching base again.

Cheers,

Ricardo

On Thu, Aug 2, 2018 at 7:03 AM tgambin notifications@github.com wrote:

Ok,

Please try again. I fixed potential problem with "chr" prefix. In addition please, make sure that files are compressed with bzip2 (I had to uncompress and run bzip2 again on your files).

Please, first try to run the following command and check if it works: dt <- prepareAOHData ("DMA_1.100019.vcf.bz2", "DMA_1", 1, "AD", "DP" , ".") dim(dt)

[1] 2838 10

Best, Tomek

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-409889532, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IyMt_fjUjEAI9Fs0psmSpQ0in2-yks5uMtxkgaJpZM4SOhe9 .

tgambin commented 6 years ago

Have you seen my latest comment ? Please, see below

Please try again. I fixed potential problem with "chr" prefix. In addition please, make sure that files are compressed with bzip2 (I had to uncompress and run bzip2 again on your files).

Please, first try to run the following command and check if it works: dt <- prepareAOHData ("DMA_1.100019.vcf.bz2", "DMA_1", 1, "AD", "DP" , ".") dim(dt)

BCM-Lupskilab / HMZDelFinder

Broken Pipe Filtering for AOH #7

calcRPKMsFromBAMs(bedFile,bamFiles , sampleNames, rpkmDir,8)

calcRPKMsFromBAMs(bedFile,bamFiles , sampleNames, rpkmDir,8)

please also print length(selectedFidsIdx)

please also print length(selectedFidsIdx)

please also print length(selectedFidsIdx)

[1] 2838 10

[1] 2838 10

[1] 2838 10