Closed ricardoharripaul closed 1 year ago
Did you try to manually execute bzcat for singe file: eg.: bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2
Hi,
Yes. I did try that and it did work and produced the vcf file.
Ricardo
On Thu, Feb 22, 2018 at 11:55 AM, tgambin notifications@github.com wrote:
Did you try to manually execute bzcat for singe file: eg.: bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-367746221, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I8dqbZdeHD2e8MfTgJaqD9MnaaNnks5tXZvbgaJpZM4SOhe9 .
Ok. Could you please, try to run in R "read.vcf.header" function for the single vcf file ?
Hi,
So I tried to run the command and it gives me the same error, however, it does print the vcf header. Does the mean the software is running despite saying it cannot find the file? Should I just let it run to completion?
read.vcf.header("/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2")bzcat: I/O or other error, bailing out. Possible reason follows. bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2, output file = (stdout)$header [1] "CHROM" "POS" "ID" "REF" "ALT" "QUAL" "FILTER" "INFO" "FORMAT" "DMA_7"
$INFO $INFO$AA key value 1 ID AA 2 Number 1 3 Type String 4 Description The inferred allele ancestral to the chimpanzee/human lineage
$INFO$AC key value 1 ID AC 2 Number A 3 Type Integer 4 Description Allele count in genotypes 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed
$INFO$AF key value 1 ID AF 2 Number A 3 Type Float 4 Description Allele Frequency 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed
$INFO$AF1000G key value 1 ID AF1000G 2 Number 1 3 Type String 4 Description The allele frequency from all populations of 1000 genomes data
$INFO$AN key value 1 ID AN 2 Number 1 3 Type Integer 4 Description Total number of alleles in called genotypes
$INFO$BaseQRankSum key value 1 ID BaseQRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities
$INFO$BLOCKAVG_min30p3a
key
1 ID 2 Number 3 Type 4 Description 5 have the same filter value 6 and have all sample values in range [x 7 y] 8 y < 9 (x*(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block
value
1 BLOCKAVG_min30p3a 2 0 3 Flag 4 Non-variant site block. All sites in a block are constrained to be non-variant 5 NA= have the same filter value 6 NA= and have all sample values in range [x 7 NA=y] 8 max(x+3 9 NA=(x*(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block
$INFO$clinvar key value 1 ID clinvar 2 Number . 3 Type String 4 Description Clinical significance
$INFO$cosmic key 1 ID 2 Number 3 Type 4 Description
value
1 cosmic 2 . 3 String 4 The numeric identifier for the variant in the Catalogue of Somatic Mutations in Cancer (COSMIC) database
$INFO$CSQR key 1 ID 2 Number 3 Type 4 Description
value
1 CSQR 2 . 3 String 4 Regulatory consequence type as predicted by VEP version 72. Feature source: Ensembl. Format: RegulatoryID|Consequence
$INFO$CSQT key 1 ID 2 Number 3 Type 4 Description
value
1 CSQT 2 . 3 String 4 Transcript consequence as predicted by VEP version 72. Transcript source: refseq. Format: HGNC|TranscriptID|Consequence
$INFO$Dels key value 1 ID Dels 2 Number 1 3 Type Float 4 Description Fraction of Reads Containing Spanning Deletions
$INFO$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth; some reads may have been filtered
$INFO$DS key value 1 ID DS 2 Number 0 3 Type Flag 4 Description Were any of the samples downsampled?
$INFO$END key value 1 ID END 2 Number 1 3 Type Integer 4 Description End position of the region described in this record
$INFO$EVS
key
1 ID 2 Number 3 Type 4 Description 5 sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples
value
1 EVS 2 . 3 String 4 Allele frequency 5 NA= sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples
$INFO$FS key value 1 ID FS 2 Number 1 3 Type Float 4 Description Phred-scaled p-value using Fisher's exact test to detect strand bias
$INFO$GMAF
key
1 ID 2 Number 3 Type 4 Description 5 the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor
value
1 GMAF 2 . 3 String 4 Global minor allele frequency (GMAF); technically 5 NA= the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor
$INFO$HaplotypeScore key value 1 ID HaplotypeScore 2 Number 1 3 Type Float 4 Description Consistency of the site with at most two segregating haplotypes
$INFO$HRun key value 1 ID HRun 2 Number 1 3 Type Integer 4 Description Largest Contiguous Homopolymer Run of Variant Allele In Either Direction
$INFO$InbreedingCoeff key 1 ID 2 Number 3 Type 4 Description
value
1 InbreedingCoeff 2 1 3 Float 4 Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation
$INFO$MQ key value 1 ID MQ 2 Number 1 3 Type Float 4 Description RMS Mapping Quality
$INFO$MQ0 key value 1 ID MQ0 2 Number 1 3 Type Integer 4 Description Total Mapping Quality Zero Reads
$INFO$MQRankSum key value 1 ID MQRankSum 2 Number 1 3 Type Float 4 Description Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities
$INFO$phastCons key 1 ID 2 Number 3 Type 4 Description
value
1 phastCons 2 0 3 Flag 4 Denotes if the variant is an identical or similar sequence that occurs between species and maintained between species throughout evolution
$INFO$QD key value 1 ID QD 2 Number 1 3 Type Float 4 Description Variant Confidence/Quality by Depth
$INFO$ReadPosRankSum key value 1 ID ReadPosRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias
$INFO$SB key value 1 ID SB 2 Number 1 3 Type Float 4 Description Strand Bias
$FORMAT $FORMAT$AD key value 1 ID AD 2 Number . 3 Type Integer 4 Description Allelic depths for the ref and alt alleles in the order listed
$FORMAT$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth (reads with MQ=255 or with bad mates are filtered)
$FORMAT$GQ key value 1 ID GQ 2 Number 1 3 Type Float 4 Description Genotype Quality
$FORMAT$GQX key 1 ID 2 Number 3 Type 4 Description 5 Genotype quality assuming non-variant position} value 1 GQX 2 1 3 Integer 4 Minimum of {Genotype quality assuming variant position 5 NA=Genotype quality assuming non-variant position}
$FORMAT$GT key value 1 ID GT 2 Number 1 3 Type String 4 Description Genotype
$FORMAT$MQ key value 1 ID MQ 2 Number 1 3 Type Integer 4 Description RMS Mapping Quality
$FORMAT$PL key 1 ID 2 Number 3 Type 4 Description 5 Phred-scaled likelihoods for genotypes as defined in the VCF specification value 1 PL 2 G 3 Integer 4 Normalized 5 NA= Phred-scaled likelihoods for genotypes as defined in the VCF specification
$FORMAT$VF key 1 ID 2 Number 3 Type 4 Description 5 the ratio of the sum of the called variant depth to the total depth value 1 VF 2 1 3 Float 4 Variant Frequency 5 NA= the ratio of the sum of the called variant depth to the total depth
$nlines [1] 82 Warning message:closing unused connection 3 (bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2)
On Thu, Feb 22, 2018 at 3:11 PM, tgambin notifications@github.com wrote:
Ok. Could you please, try to run in R "read.vcf.header" function for the single vcf file ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-367806891, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5TtX-vU--xSKrKNzlBgvcdy9Vkbks5tXcn1gaJpZM4SOhe9 .
Hi,
How long is the AOH filtering suppose to run for? Does it take a long time to complete? Does maxing out the cores and running two instances on the same files significantly slow it down?
Thanks,
Ricardo
On Fri, Feb 23, 2018 at 4:07 PM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:
Hi,
So I tried to run the command and it gives me the same error, however, it does print the vcf header. Does the mean the software is running despite saying it cannot find the file? Should I just let it run to completion?
read.vcf.header("/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2")bzcat: I/O or other error, bailing out. Possible reason follows. bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2, output file = (stdout)$header [1] "CHROM" "POS" "ID" "REF" "ALT" "QUAL" "FILTER" "INFO" "FORMAT" "DMA_7"
$INFO $INFO$AA key value 1 ID AA 2 Number 1 3 Type String 4 Description The inferred allele ancestral to the chimpanzee/human lineage
$INFO$AC key value 1 ID AC 2 Number A 3 Type Integer 4 Description Allele count in genotypes 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed
$INFO$AF key value 1 ID AF 2 Number A 3 Type Float 4 Description Allele Frequency 5 for each ALT allele NA= for each ALT allele 6 in the same order as listed NA= in the same order as listed
$INFO$AF1000G key value 1 ID AF1000G 2 Number 1 3 Type String 4 Description The allele frequency from all populations of 1000 genomes data
$INFO$AN key value 1 ID AN 2 Number 1 3 Type Integer 4 Description Total number of alleles in called genotypes
$INFO$BaseQRankSum key value 1 ID BaseQRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities
$INFO$BLOCKAVG_min30p3a key 1 ID 2 Number 3 Type 4 Description 5 have the same filter value 6 and have all sample values in range [x 7 y] 8 y < 9 (x(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block value 1 BLOCKAVG_min30p3a 2 0 3 Flag 4 Non-variant site block. All sites in a block are constrained to be non-variant 5 NA= have the same filter value 6 NA= and have all sample values in range [x 7 NA=y] 8 max(x+3 9 NA=(x(1+0.3))). All printed site block sample values are the minimum observed in the region spanned by the block
$INFO$clinvar key value 1 ID clinvar 2 Number . 3 Type String 4 Description Clinical significance
$INFO$cosmic key 1 ID 2 Number 3 Type 4 Description value 1 cosmic 2 . 3 String 4 The numeric identifier for the variant in the Catalogue of Somatic Mutations in Cancer (COSMIC) database
$INFO$CSQR key 1 ID 2 Number 3 Type 4 Description value 1 CSQR 2 . 3 String 4 Regulatory consequence type as predicted by VEP version 72. Feature source: Ensembl. Format: RegulatoryID|Consequence
$INFO$CSQT key 1 ID 2 Number 3 Type 4 Description value 1 CSQT 2 . 3 String 4 Transcript consequence as predicted by VEP version 72. Transcript source: refseq. Format: HGNC|TranscriptID|Consequence
$INFO$Dels key value 1 ID Dels 2 Number 1 3 Type Float 4 Description Fraction of Reads Containing Spanning Deletions
$INFO$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth; some reads may have been filtered
$INFO$DS key value 1 ID DS 2 Number 0 3 Type Flag 4 Description Were any of the samples downsampled?
$INFO$END key value 1 ID END 2 Number 1 3 Type Integer 4 Description End position of the region described in this record
$INFO$EVS key 1 ID 2 Number 3 Type 4 Description 5 sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples value 1 EVS 2 . 3 String 4 Allele frequency 5 NA= sample count and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS|EVSCoverage|EVSSamples
$INFO$FS key value 1 ID FS 2 Number 1 3 Type Float 4 Description Phred-scaled p-value using Fisher's exact test to detect strand bias
$INFO$GMAF key 1 ID 2 Number 3 Type 4 Description 5 the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor value 1 GMAF 2 . 3 String 4 Global minor allele frequency (GMAF); technically 5 NA= the frequency of the second most frequent allele. Format: GlobalMinorAllele|AlleleFreqGlobalMinor
$INFO$HaplotypeScore key value 1 ID HaplotypeScore 2 Number 1 3 Type Float 4 Description Consistency of the site with at most two segregating haplotypes
$INFO$HRun key value 1 ID HRun 2 Number 1 3 Type Integer 4 Description Largest Contiguous Homopolymer Run of Variant Allele In Either Direction
$INFO$InbreedingCoeff key 1 ID 2 Number 3 Type 4 Description value 1 InbreedingCoeff 2 1 3 Float 4 Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation
$INFO$MQ key value 1 ID MQ 2 Number 1 3 Type Float 4 Description RMS Mapping Quality
$INFO$MQ0 key value 1 ID MQ0 2 Number 1 3 Type Integer 4 Description Total Mapping Quality Zero Reads
$INFO$MQRankSum key value 1 ID MQRankSum 2 Number 1 3 Type Float 4 Description Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities
$INFO$phastCons key 1 ID 2 Number 3 Type 4 Description value 1 phastCons 2 0 3 Flag 4 Denotes if the variant is an identical or similar sequence that occurs between species and maintained between species throughout evolution
$INFO$QD key value 1 ID QD 2 Number 1 3 Type Float 4 Description Variant Confidence/Quality by Depth
$INFO$ReadPosRankSum key value 1 ID ReadPosRankSum 2 Number 1 3 Type Float 4 Description Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias
$INFO$SB key value 1 ID SB 2 Number 1 3 Type Float 4 Description Strand Bias
$FORMAT $FORMAT$AD key value 1 ID AD 2 Number . 3 Type Integer 4 Description Allelic depths for the ref and alt alleles in the order listed
$FORMAT$DP key value 1 ID DP 2 Number 1 3 Type Integer 4 Description Approximate read depth (reads with MQ=255 or with bad mates are filtered)
$FORMAT$GQ key value 1 ID GQ 2 Number 1 3 Type Float 4 Description Genotype Quality
$FORMAT$GQX key 1 ID 2 Number 3 Type 4 Description 5 Genotype quality assuming non-variant position} value 1 GQX 2 1 3 Integer 4 Minimum of {Genotype quality assuming variant position 5 NA=Genotype quality assuming non-variant position}
$FORMAT$GT key value 1 ID GT 2 Number 1 3 Type String 4 Description Genotype
$FORMAT$MQ key value 1 ID MQ 2 Number 1 3 Type Integer 4 Description RMS Mapping Quality
$FORMAT$PL key 1 ID 2 Number 3 Type 4 Description 5 Phred-scaled likelihoods for genotypes as defined in the VCF specification value 1 PL 2 G 3 Integer 4 Normalized 5 NA= Phred-scaled likelihoods for genotypes as defined in the VCF specification
$FORMAT$VF key 1 ID 2 Number 3 Type 4 Description 5 the ratio of the sum of the called variant depth to the total depth value 1 VF 2 1 3 Float 4 Variant Frequency 5 NA= the ratio of the sum of the called variant depth to the total depth
$nlines [1] 82 Warning message:closing unused connection 3 (bzcat /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2)
On Thu, Feb 22, 2018 at 3:11 PM, tgambin notifications@github.com wrote:
Ok. Could you please, try to run in R "read.vcf.header" function for the single vcf file ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-367806891, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5TtX-vU--xSKrKNzlBgvcdy9Vkbks5tXcn1gaJpZM4SOhe9 .
It should not take so long. How many files you are processing? Could you send me a sample VCF that you are trying to process?
Thanks, Tomek
Hi Tomek,
I am running 10 files and it not processing. I tried using the same commands in the terminal and it did not give the error but the whole AOH calling hangs. Here is the set of VCF files I am using. Just a note, I did run all the BAM files with no problems.
https://drive.google.com/open?id=1wCgmq5mUhsOYCYzro0n0W7lXMSHTWgV6
Thanks,
Ricardo
On Sun, Feb 25, 2018 at 11:02 AM, tgambin notifications@github.com wrote:
It should not take so long. How many files you are processing? Could you send me a sample VCF that you are trying to process?
Thanks, Tomek
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-368320919, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IzoaSeqopd2qMlKtY1zhpIeRSaXxks5tYYQlgaJpZM4SOhe9 .
Thanks! Let me test it and I get back to you soon
I found the problem and it supposed to be fixed right now (it was related to the way I parsed allelic depths). Please, try again and remember to set: vR_id<-"AD" tR_id<-"DP" filter <- "."
Thanks a lot Tomek,
I made the changes you suggested but I am still getting some errors. I pasted the code I used below:
| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." | ====================================================================== | 100% [1] "Extending AOHs by length of uncertain regions ..." | ====================================================================== | 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." |
---|
0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed
setwd("/home/ricardo/Downloads/HMZDelFinder-master/") workDir <- getwd() mainDir <- paste0(workDir ,"/HMZDelFinder/"); if (!file.exists(mainDir)){dir.create(mainDir)} dataDir <- paste0(mainDir, "data/" , sep=""); if (!file.exists(dataDir)){dir.create(dataDir)} # data directory
library(RCurl) library(data.table) library(gdata) library(parallel) library(Hmisc) library(matrixStats) library(DNAcopy) library(GenomicRanges) library(Rsubread)
eval( expr = parse( text = getURL(" https://raw.githubusercontent.com/BCM-Lupskilab/HMZDelFinder/master/src/HMZDelFinder.R") ))
bedFile <- paste0(dataDir, "tgp_hg19.bed") # set path to BED file outputDir <- paste0(mainDir, "out/" , sep=""); if (!file.exists(outputDir)){dir.create(outputDir)} # create output directory plotsDir <- paste0(mainDir, "plots/" , sep=""); if (!file.exists(plotsDir)){dir.create(plotsDir)} # create output plots directory rpkmFiles <- dir(paste(dataDir, "TGP/",sep=""), "rpkm2.txt$") # list of RPKM file names rpkmFids <- gsub(".rpkm2.txt", "", rpkmFiles) # list of sample identifiers rpkmPaths <- paste0(paste0(dataDir, "TGP/"), rpkmFiles) # list of paths to RPKM files aohDir <- paste0(mainDir, "AOH/" , sep=""); if (!file.exists(aohDir)){dir.create(aohDir)} aohRDataOut <- paste(mainDir, "AOH/extAOH_small.RData", sep="") # temprary file to store AOH data
rpkmFiles <- rpkmFiles[1:10] rpkmFids<- rpkmFids[1:10] rpkmPaths <- rpkmPaths[1:10] pathToBams <- "/home/ricardo/Downloads/DMA/" rpkmDir <- "/home/ricardo/Downloads/HMZDelFinder-master/HMZDelFinder/data/TGP" sampleNames <- sapply(strsplit(dir(pathToBams, "bam$"), "[/\.]"), function(x){x[length(x)-1]}) # sample identifiers sampleNames <- sampleNames[1:10]
vcfFiles <- dir("/home/ricardo/Downloads/DMA/ATLAS", "vcf.bz2$", recursive = T, include.dirs = FALSE) vcfFids <- sapply(strsplit(vcfFiles,"[/\.]"), function(x){x[1]}) vcfPaths <- paste0("/home/ricardo/Downloads/DMA/ATLAS/",vcfFiles)
is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05 # max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000 # min AOH size minAOHsig <- 0.45 # min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD" # ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP" # ID from VCF FORMAT indicating the number total reads filter <- "." # for other variant callers be '.'
results <- runHMZDelFinder (vcfPaths, # vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes) vcfFids, # vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes) rpkmPaths, # paths to RPKM files rpkmFids, # samples identifiers corresponding to RPKM files mc.cores, # number of CPU cores aohRDataOut,# temp file to store AOH data bedFile, # bed file with target lowRPKMthreshold, # RPKM threshold minAOHsize, # min AOH size minAOHsig, # min AOH signal threshold is_cmg, # flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR'); tR_id, # ID for 'the number of total reads' in VCF FORMAT column (default='DP') filter)
On Wed, Feb 28, 2018 at 8:53 AM, tgambin notifications@github.com wrote:
I found the problem and it supposed to be fixed right now (it was related to the way I parsed allelic depths). Please, try again and remember to set: vR_id<-"AD" tR_id<-"DP" filter <- "."
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-369245566, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I8DEhstBk8eY-2Wk7Va5couSEkdKks5tZVpcgaJpZM4SOhe9 .
Dear Tomek,
I wondered if you received my last email.
I had an error causing the script to fail after the AOH was successfully processed:
| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." | ====================================================================== | 100% [1] "Extending AOHs by length of uncertain regions ..." | ====================================================================== | 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." |
---|
0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed
Cheers,
Ricardo
On Fri, Mar 2, 2018 at 2:41 PM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:
Thanks a lot Tomek,
I made the changes you suggested but I am still getting some errors. I pasted the code I used below:
| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." ====================================================================== 100% [1] "Extending AOHs by length of uncertain regions ..." ====================================================================== 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." 0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed
setwd("/home/ricardo/Downloads/HMZDelFinder-master/") workDir <- getwd() mainDir <- paste0(workDir ,"/HMZDelFinder/"); if (!file.exists(mainDir)){dir.create(mainDir)} dataDir <- paste0(mainDir, "data/" , sep=""); if (!file.exists(dataDir)){dir.create(dataDir)} # data directory
library(RCurl) library(data.table) library(gdata) library(parallel) library(Hmisc) library(matrixStats) library(DNAcopy) library(GenomicRanges) library(Rsubread)
eval( expr = parse( text = getURL("https://raw.githubusercontent.com/BCM- Lupskilab/HMZDelFinder/master/src/HMZDelFinder.R") ))
bedFile <- paste0(dataDir, "tgp_hg19.bed") # set path to BED file outputDir <- paste0(mainDir, "out/" , sep=""); if (!file.exists(outputDir)){dir.create(outputDir)} # create output directory plotsDir <- paste0(mainDir, "plots/" , sep=""); if (!file.exists(plotsDir)){dir.create(plotsDir)} # create output plots directory rpkmFiles <- dir(paste(dataDir, "TGP/",sep=""), "rpkm2.txt$") # list of RPKM file names rpkmFids <- gsub(".rpkm2.txt", "", rpkmFiles) # list of sample identifiers rpkmPaths <- paste0(paste0(dataDir, "TGP/"), rpkmFiles) # list of paths to RPKM files aohDir <- paste0(mainDir, "AOH/" , sep=""); if (!file.exists(aohDir)){dir. create(aohDir)} aohRDataOut <- paste(mainDir, "AOH/extAOH_small.RData", sep="") # temprary file to store AOH data
rpkmFiles <- rpkmFiles[1:10] rpkmFids<- rpkmFids[1:10] rpkmPaths <- rpkmPaths[1:10] pathToBams <- "/home/ricardo/Downloads/DMA/" rpkmDir <- "/home/ricardo/Downloads/HMZDelFinder-master/ HMZDelFinder/data/TGP" sampleNames <- sapply(strsplit(dir(pathToBams, "bam$"), "[/\.]"), function(x){x[length(x)-1]}) # sample identifiers sampleNames <- sampleNames[1:10]
calcRPKMsFromBAMs(bedFile,bamFiles , sampleNames, rpkmDir,8)
vcfFiles <- dir("/home/ricardo/Downloads/DMA/ATLAS", "vcf.bz2$", recursive = T, include.dirs = FALSE) vcfFids <- sapply(strsplit(vcfFiles,"[/\.]"), function(x){x[1]}) vcfPaths <- paste0("/home/ricardo/Downloads/DMA/ATLAS/",vcfFiles)
is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05 # max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000 # min AOH size minAOHsig <- 0.45 # min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD" # ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP" # ID from VCF FORMAT indicating the number total reads filter <- "." # for other variant callers be '.'
results <- runHMZDelFinder (vcfPaths, # vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes) vcfFids, # vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes) rpkmPaths, # paths to RPKM files rpkmFids, # samples identifiers corresponding to RPKM files mc.cores, # number of CPU cores aohRDataOut,# temp file to store AOH data bedFile, # bed file with target lowRPKMthreshold, # RPKM threshold minAOHsize, # min AOH size minAOHsig, # min AOH signal threshold is_cmg, # flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR'); tR_id, # ID for 'the number of total reads' in VCF FORMAT column (default='DP') filter)
On Wed, Feb 28, 2018 at 8:53 AM, tgambin notifications@github.com wrote:
I found the problem and it supposed to be fixed right now (it was related to the way I parsed allelic depths). Please, try again and remember to set: vR_id<-"AD" tR_id<-"DP" filter <- "."
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-369245566, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I8DEhstBk8eY-2Wk7Va5couSEkdKks5tZVpcgaJpZM4SOhe9 .
Can you check if rpkm files were generated?
Hi,
So I can tell you that the rpkm2.txt files were definitely generated and they are in the same format as the example data.
The error still persists after attempting to redo the analysis. These are the same files that work without AOH filtering. Do you have any other suggestions?
Thanks,
Ricardo
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." | ====================================================================== | 100% [1] "Extending AOHs by length of uncertain regions ..." | ====================================================================== | 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." |
---|
Error in if (file.info(file)$size == 0) { :
missing value where TRUE/FALSE needed
ricardo@ricardo-G750JW[TGP] head NA11843_rpkm2.txt [ 2:31PM] count RPKM 0 0 0 0 0 0 0 0 113 15.0938179577114 19 1.00955082218497 113 21.3089194697102 223 17.2956281136893 25 2.70148874653281 ricardo@ricardo-G750JW[TGP] head DMA-1_S1_rpkm2.txt [ 2:31PM] count RPKM 5 0.519272412253909 30 3.11563447352345 45 4.67345171028518 7 0.456370012788451 95 27.2690137601671 42 4.79566720620793 39 15.8042085941278 107 17.8336572163266 78 18.1126885011352
On Wed, Mar 7, 2018 at 10:06 AM, tgambin notifications@github.com wrote:
Can you check if rpkm files were generated?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-371165030, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3Iz6XEQnb1mYJ-VFFQ4AtcmK9NslBks5tb_V3gaJpZM4SOhe9 .
Could you please check how the object "selectedFidsIdx" looks like just before running prepareRPKMData? Are there any elements ?
Hi,
I am not sure how to access that function. Typing it in says it is not found.
selectedFidsIdxError: object 'selectedFidsIdx' not found> selectedFidsIdx()Error in selectedFidsIdx() : could not find function "selectedFidsIdx"
I looked up the Fids for the vcf, bam and rpkm Fids:
vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1"> rpkmFids [1] "10018" "10019" "10020" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1"> bamFiles [1] "/home/ricardo/Downloads/DMA/DMA_1.10019.bam" "/home/ricardo/Downloads/DMA/DMA-1_S1.bam" [3] "/home/ricardo/Downloads/DMA/DMA_2.10018.bam" "/home/ricardo/Downloads/DMA/DMA-2_S1.bam" [5] "/home/ricardo/Downloads/DMA/DMA_3.10020.bam" "/home/ricardo/Downloads/DMA/DMA-3_S1.bam" [7] "/home/ricardo/Downloads/DMA/DMA-4_S1.bam" "/home/ricardo/Downloads/DMA/DMA-5_S1.bam" [9] "/home/ricardo/Downloads/DMA/DMA-6_S1.bam" "/home/ricardo/Downloads/DMA/DMA-7_S1.bam"
On Thu, Mar 15, 2018 at 1:10 PM, tgambin notifications@github.com wrote:
Could you please check how the object "selectedFidsIdx" looks like just before running prepareRPKMData? Are there any elements ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-373452803, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IyXN8yGz-LpueFVlEcPOII2PhLD2ks5teqCTgaJpZM4SOhe9 .
It looks like vcfFids (prefix of file name) and rpkmFids (sufix of file name) are not matching and this may cause the problem. Could you please, try to change your code so vcfFids and rpkmFids follow the same convention ?
Hi Tomek,
I do not think that is the problem. I have made the IDs the same and I am still getting the same error.
rpkmFiles [1] "DMA_1_rpkm2.txt" "DMA-1_S1_rpkm2.txt" "DMA_2_rpkm2.txt" [4] "DMA-2_S1_rpkm2.txt" "DMA_3_rpkm2.txt" "DMA-3_S1_rpkm2.txt" [7] "DMA-4_S1_rpkm2.txt" "DMA-5_S1_rpkm2.txt" "DMA-6_S1_rpkm2.txt" [10] "DMA-7_S1_rpkm2.txt" rpkmFids <- gsub(".rpkm2.txt", "", rpkmFiles) rpkmFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" sampleNames [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths <- paste0(dataDir, rpkmFiles) rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-1_S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_2_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-2_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_3_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-7_S1_rpkm2.txt" vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-1_S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_2_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-2_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA_3_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/DMA-7_S1_rpkm2.txt" results <- runHMZDelFinder (vcfPaths, snpPaths= mc.cores= minAOHsize= tR_id= snpFids= aohRDataOut= minAOHsig= filter= rpkmPaths= bedFile= is_cmg= rpkmFids= lowRPKMthreshold= vR_id= results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)
- vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)
- rpkmPaths, # paths to RPKM files
- rpkmFids,# samples identifiers corresponding to RPKM files
- mc.cores,# number of CPU cores
- aohRDataOut,# temp file to store AOH data
- bedFile,# bed file with target
- lowRPKMthreshold, # RPKM threshold
- minAOHsize, # min AOH size
- minAOHsig,# min AOH signal threshold
- is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise
- vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');
- tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')
- filter)
| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." | ====================================================================== | 100% [1] "Extending AOHs by length of uncertain regions ..." | ====================================================================== | 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." |
---|
Error in if (file.info(file)$size == 0) { :
missing value where TRUE/FALSE needed
On Sun, Mar 25, 2018 at 1:37 PM, tgambin notifications@github.com wrote:
It looks like vcfFids (prefix of file name) and rpkmFids (sufix of file name) are not matching and this may cause the problem. Could you please, try to change your code so vcfFids and rpkmFids follow the same convention ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-375988521, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IyVbq7vcyCF3BLtxdfvVdsSMeu72ks5th9XrgaJpZM4SOhe9 .
Ok, Instead of executing "runHMZDelFinder ()", could you please try to run the function body line by line, i.e.
library(gdata)
library(data.table)
library(GenomicRanges)
library(parallel)
library(matrixStats)
extAOH <- NULL
if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")}
else{
library(DNAcopy)
extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter)
save(extAOH, file=aohRDataOut)
}
print("[step 2 out of 7] ****** Preparing RPKM data ******")
selectedFidsIdx <- 1:length(rpkmFids)
if (!is.null(extAOH)) {
selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)
}
rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1)
## please also print length(selectedFidsIdx)
print(length(selectedFidsIdx))
Hi Tomek,
It is producing the same error message. The error has been reproduced here and it seems the extAOH variable is empty from the prepareAOHData function. The paths are correct on the commandline but it seems to not be parsing out information. Please see below for the error and full message.
extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end,num.mark,seg.mean...
is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "."# for other variant callers be '.' getwd() [1] "/home/ricardo/Downloads/DMA" rpkmFiles [1] "DMA_1_rpkm2.txt" "DMA-1_S1_rpkm2.txt" "DMA_2_rpkm2.txt" [4] "DMA-2_S1_rpkm2.txt" "DMA_3_rpkm2.txt" "DMA-3_S1_rpkm2.txt" [7] "DMA-4_S1_rpkm2.txt" "DMA-5_S1_rpkm2.txt" "DMA-6_S1_rpkm2.txt" [10] "DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7_S1_rpkm2.txt" bamFiles [1] "/home/ricardo/Downloads/DMA/DMA_1.10019.bam" [2] "/home/ricardo/Downloads/DMA/DMA-1_S1.bam" [3] "/home/ricardo/Downloads/DMA/DMA_2.10018.bam" [4] "/home/ricardo/Downloads/DMA/DMA-2_S1.bam" [5] "/home/ricardo/Downloads/DMA/DMA_3.10020.bam" [6] "/home/ricardo/Downloads/DMA/DMA-3_S1.bam" [7] "/home/ricardo/Downloads/DMA/DMA-4_S1.bam" [8] "/home/ricardo/Downloads/DMA/DMA-5_S1.bam" [9] "/home/ricardo/Downloads/DMA/DMA-6_S1.bam" [10] "/home/ricardo/Downloads/DMA/DMA-7_S1.bam" sampleNames [1] "10019" "DMA-1_S1" "10018" "DMA-2_S1" "10020" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" sampleNames<-rpkmFids sampleNames [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA-1_S1.vcf.bz2" "DMA_2.10018.vcf.bz2" [4] "DMA-2_S1.vcf.bz2" "DMA_3.10020.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)
extAOH <- NULL if (is.null(vcfPaths) || is.null(vcfFids)){print("Skipping AOH analysis ...")
- } else{
- library(DNAcopy)
- extAOH <- prepareAOHData(vcfPaths, vcfFids, mc.cores, vR_id, tR_id, filter)
- save(extAOH, file=aohRDataOut)
- } [1] "Reading VCFs and generating AOH data using CBS ..." |======================================================================| 100% [1] "Extending AOHs by length of uncertain regions ..." |======================================================================| 100% extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end,num.mark,seg.mean... print("[step 2 out of 7] ** Preparing RPKM data **") [1] "[step 2 out of 7] ** Preparing RPKM data **" selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) {
- selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)
- }
rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1) [1] "Reading RPKM files ..." Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed 0%^C extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end,num.mark,seg.mean...
On Sat, Apr 7, 2018 at 5:54 AM, tgambin notifications@github.com wrote:
Ok, Instead of executing "runHMZDelFinder ()", could you please try to run the function body line by line, i.e.
library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)
extAOH <- NULL if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")} else{ library(DNAcopy) extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter) save(extAOH, file=aohRDataOut) }
print("[step 2 out of 7] ** Preparing RPKM data **") selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) { selectedFidsIdx <- which(rpkmFids %in% extAOH$Name) }
rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1)
please also print length(selectedFidsIdx)
print(length(selectedFidsIdx))
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-379457838, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I_O4oui9WPXGQMY1af2dmkocXgvEks5tmIzsgaJpZM4SOhe9 .
Hi Tomek,
Just following up with you.
Cheers,
Ricardo
On Mon, Apr 16, 2018 at 4:32 PM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:
Hi Tomek,
It is producing the same error message. The error has been reproduced here and it seems the extAOH variable is empty from the prepareAOHData function. The paths are correct on the commandline but it seems to not be parsing out information. Please see below for the error and full message.
extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end, num.mark,seg.mean...
is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "."# for other variant callers be '.' getwd() [1] "/home/ricardo/Downloads/DMA" rpkmFiles [1] "DMA_1_rpkm2.txt" "DMA-1_S1_rpkm2.txt" "DMA_2_rpkm2.txt" [4] "DMA-2_S1_rpkm2.txt" "DMA_3_rpkm2.txt" "DMA-3_S1_rpkm2.txt" [7] "DMA-4_S1_rpkm2.txt" "DMA-5_S1_rpkm2.txt" "DMA-6_S1_rpkm2.txt" [10] "DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt"
[2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_ S1_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt"
[4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_ S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt"
[6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_ S1rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4 S1rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5 S1rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6 S1rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7 S1_rpkm2.txt"
bamFiles [1] "/home/ricardo/Downloads/DMA/DMA_1.10019.bam" [2] "/home/ricardo/Downloads/DMA/DMA-1_S1.bam" [3] "/home/ricardo/Downloads/DMA/DMA_2.10018.bam" [4] "/home/ricardo/Downloads/DMA/DMA-2_S1.bam" [5] "/home/ricardo/Downloads/DMA/DMA_3.10020.bam" [6] "/home/ricardo/Downloads/DMA/DMA-3_S1.bam" [7] "/home/ricardo/Downloads/DMA/DMA-4_S1.bam" [8] "/home/ricardo/Downloads/DMA/DMA-5_S1.bam" [9] "/home/ricardo/Downloads/DMA/DMA-6_S1.bam" [10] "/home/ricardo/Downloads/DMA/DMA-7_S1.bam" sampleNames [1] "10019" "DMA-1_S1" "10018" "DMA-2_S1" "10020" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" sampleNames<-rpkmFids sampleNames [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA-1_S1.vcf.bz2" "DMA_2.10018.vcf.bz2" [4] "DMA-2_S1.vcf.bz2" "DMA_3.10020.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA-1_S1" "DMA_2" "DMA-2_S1" "DMA_3" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)
extAOH <- NULL if (is.null(vcfPaths) || is.null(vcfFids)){print("Skipping AOH analysis ...")
- } else{
- library(DNAcopy)
- extAOH <- prepareAOHData(vcfPaths, vcfFids, mc.cores, vR_id, tR_id, filter)
- save(extAOH, file=aohRDataOut)
- } [1] "Reading VCFs and generating AOH data using CBS ..." |======================================================================| 100% [1] "Extending AOHs by length of uncertain regions ..." |======================================================================| 100% extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end, num.mark,seg.mean... print("[step 2 out of 7] ** Preparing RPKM data **") [1] "[step 2 out of 7] ** Preparing RPKM data **" selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) {
- selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)
- }
rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1) [1] "Reading RPKM files ..." Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed 0%^C extAOH Empty data.table (0 rows) of 10 cols: ID,chrom,loc.start,loc.end, num.mark,seg.mean...
On Sat, Apr 7, 2018 at 5:54 AM, tgambin notifications@github.com wrote:
Ok, Instead of executing "runHMZDelFinder ()", could you please try to run the function body line by line, i.e.
library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)
extAOH <- NULL if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")} else{ library(DNAcopy) extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter) save(extAOH, file=aohRDataOut) }
print("[step 2 out of 7] ** Preparing RPKM data **") selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) { selectedFidsIdx <- which(rpkmFids %in% extAOH$Name) }
rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1)
please also print length(selectedFidsIdx)
print(length(selectedFidsIdx))
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-379457838, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I_O4oui9WPXGQMY1af2dmkocXgvEks5tmIzsgaJpZM4SOhe9 .
Could you please provide me a sample VCF file once again. I tried to download it but the link is inactive
Thanks
https://drive.google.com/open?id=1V5Y7GELPWBlScP2GXyve_rJyeb_dJMLc
On Fri, Apr 27, 2018 at 5:31 AM, tgambin notifications@github.com wrote:
Could you please provide me a sample VCF file once again. I tried to download it but the link is inactive
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-384918302, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5Hnno_SGmTJ5k-Wa2USr-XeTi5Fks5tsuWEgaJpZM4SOhe9 .
Hi Tomek,
Just following up.
Cheers,
Ricardo
On Fri, Apr 27, 2018 at 11:09 AM, Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:
Thanks
https://drive.google.com/open?id=1V5Y7GELPWBlScP2GXyve_rJyeb_dJMLc
On Fri, Apr 27, 2018 at 5:31 AM, tgambin notifications@github.com wrote:
Could you please provide me a sample VCF file once again. I tried to download it but the link is inactive
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-384918302, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I5Hnno_SGmTJ5k-Wa2USr-XeTi5Fks5tsuWEgaJpZM4SOhe9 .
Thanks for providing the data. I will try to process the data and get back to you. Tomek
Hi Tomek,
Just checking in again.
Cheers,
Ricardo
On Wed, May 16, 2018 at 11:08 AM, tgambin notifications@github.com wrote:
Thanks for providing the data. I will try to process the data and get back to you. Tomek
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-389553610, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3Iye2SHAz3pJniCvXkih9ZP-Doy_uks5tzEDmgaJpZM4SOhe9 .
I found one issue in the function read.vcf.quick.noinfo which is now fixed (there was a leading space between " bzcat" ). I think I did not catch before this problem because it worked find with the previous version od data.table package.
I tested it on your VCFs and it works now. Please try and let me know if you have any other problems.
Hi Tomek,
I tried to start from scratch and downloaded a new copy of the HMZDelFinder. I am getting the same error though:
vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA_2.10018.vcf.bz2" "DMA_3.10020.vcf.bz2" [4] "DMA-1_S1.vcf.bz2" "DMA-2_S1.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_S1_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "." results <- runHMZDelFinder (vcfPaths, snpPaths= mc.cores= minAOHsize= tR_id= snpFids= aohRDataOut= minAOHsig= filter= rpkmPaths= bedFile= is_cmg= rpkmFids= lowRPKMthreshold= vR_id= results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)
- vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)
- rpkmPaths, # paths to RPKM files
- rpkmFids,# samples identifiers corresponding to RPKM files
- mc.cores,# number of CPU cores
- aohRDataOut,# temp file to store AOH data
- bedFile,# bed file with target
- lowRPKMthreshold, # RPKM threshold
- minAOHsize, # min AOH size
- minAOHsig,# min AOH signal threshold
- is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise
- vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');
- tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')
- filter)
| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." | ====================================================================== | 100% [1] "Extending AOHs by length of uncertain regions ..." | ====================================================================== | 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." |
---|
0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed In addition: There were 50 or more warnings (use warnings() to see the first 50) | | 0%
On Fri, Jun 15, 2018 at 11:33 AM tgambin notifications@github.com wrote:
I found one issue in the function read.vcf.quick.noinfo which is now fixed (there was a leading space between " bzcat" ). I think I did not catch before this problem because it worked find with the previous version od data.table package.
I tested it on your VCFs and it works now. Please try and let me know if you have any other problems.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-397659119, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I147lU8XTGHlaeTh_URgTc_wiC3tks5t89PegaJpZM4SOhe9 .
Hi Tomek,
I was wondering if you received my last email. I thought the software was updated and I tried running the exome filtering and it crashed as before. Am I suppose to modify the code?
Thanks,
Ricardo
On Fri, Jun 29, 2018 at 2:08 PM Ricardo Harripaul < ricardo.harripaul@gmail.com> wrote:
Hi Tomek,
I tried to start from scratch and downloaded a new copy of the HMZDelFinder. I am getting the same error though:
vcfFiles [1] "DMA_1.10019.vcf.bz2" "DMA_2.10018.vcf.bz2" "DMA_3.10020.vcf.bz2" [4] "DMA-1_S1.vcf.bz2" "DMA-2_S1.vcf.bz2" "DMA-3_S1.vcf.bz2" [7] "DMA-4_S1.vcf.bz2" "DMA-5_S1.vcf.bz2" "DMA-6_S1.vcf.bz2" [10] "DMA-7_S1.vcf.bz2" vcfFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" vcfPaths [1] "/home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2" [2] "/home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2" [3] "/home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2" [4] "/home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2" [5] "/home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2" [6] "/home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2" [7] "/home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2" [8] "/home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2" [9] "/home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2" [10] "/home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2" rpkmPaths [1] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_1_rpkm2.txt" [2] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_2_rpkm2.txt" [3] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA_3_rpkm2.txt" [4] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-1_S1_rpkm2.txt" [5] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-2_S1_rpkm2.txt" [6] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-3_S1_rpkm2.txt" [7] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-4_S1_rpkm2.txt" [8] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-5_S1_rpkm2.txt" [9] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-6_S1_rpkm2.txt" [10] "/home/ricardo/Downloads/DMA/HMZDelFinder/data/TGP/DMA-7_S1_rpkm2.txt" rpkmFids [1] "DMA_1" "DMA_2" "DMA_3" "DMA-1_S1" "DMA-2_S1" "DMA-3_S1" [7] "DMA-4_S1" "DMA-5_S1" "DMA-6_S1" "DMA-7_S1" is_cmg <- FALSE # only for CMG project - otherwhise use FALSE lowRPKMthreshold <- 0.65# RPKM threshold maxFrequency <- 0.05# max frequncy of HMZ deletion; default =0.005 minAOHsize <- 1000# min AOH size minAOHsig <- 0.45# min AOH signal threshold mc.cores<-8 # number of cores vR_id<-"AD"# ID from VCF FORMAT indicating the number of variant reads, for other variant callers could be "AD" tR_id<-"DP"# ID from VCF FORMAT indicating the number total reads filter <- "." results <- runHMZDelFinder (vcfPaths, snpPaths= mc.cores= minAOHsize= tR_id= snpFids= aohRDataOut= minAOHsig= filter= rpkmPaths= bedFile= is_cmg= rpkmFids= lowRPKMthreshold= vR_id= results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)
- vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)
- rpkmPaths, # paths to RPKM files
- rpkmFids,# samples identifiers corresponding to RPKM files
- mc.cores,# number of CPU cores
- aohRDataOut,# temp file to store AOH data
- bedFile,# bed file with target
- lowRPKMthreshold, # RPKM threshold
- minAOHsize, # min AOH size
- minAOHsig,# min AOH signal threshold
- is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise
- vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');
- tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')
- filter)
| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." ====================================================================== 100% [1] "Extending AOHs by length of uncertain regions ..." ====================================================================== 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." 0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed In addition: There were 50 or more warnings (use warnings() to see the first 50) | | 0%
On Fri, Jun 15, 2018 at 11:33 AM tgambin notifications@github.com wrote:
I found one issue in the function read.vcf.quick.noinfo which is now fixed (there was a leading space between " bzcat" ). I think I did not catch before this problem because it worked find with the previous version od data.table package.
I tested it on your VCFs and it works now. Please try and let me know if you have any other problems.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-397659119, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I147lU8XTGHlaeTh_URgTc_wiC3tks5t89PegaJpZM4SOhe9 .
Could you please check extAOH object like you did it before ? Is it still empty ?
Hi Tomek,
It seems it does not exist.
results <- runHMZDelFinder (vcfPaths,# vcfPaths - paths to VCF files for AOH analysis (not used for 1000 genomes)
- vcfFids,# vcfFids - sample identifiers corresponding to VCF files (not used for 1000 genomes)
- rpkmPaths, # paths to RPKM files
- rpkmFids,# samples identifiers corresponding to RPKM files
- mc.cores,# number of CPU cores
- aohRDataOut,# temp file to store AOH data
- bedFile,# bed file with target
- lowRPKMthreshold, # RPKM threshold
- minAOHsize, # min AOH size
- minAOHsig,# min AOH signal threshold
- is_cmg,# flag used for CMG specific annotations; TRUE samples are from BHCMG cohort, FALSE otherwhise
- vR_id, # ID for 'the number of variants reads' in VCF FORMAT column (default='VR');
- tR_id,# ID for 'the number of total reads' in VCF FORMAT column (default='DP')
- filter)
| | | | \/ | / \ | | () | | _ | || | |\/| | / /| | | |/ \ | | | | ' \ / ` |/ \ '| | | | | |/ /| |_| | / | | | | | | | (| | / | || ||| |/|/ _||| ||| ||_,|\|_|
[1] "[step 1 out of 7] ** AOH data **" [1] "Reading VCFs and generating AOH data using CBS ..." | ====================================================================== | 100% [1] "Extending AOHs by length of uncertain regions ..." | ====================================================================== | 100% [1] "[step 2 out of 7] ** Preparing RPKM data **" [1] "Reading RPKM files ..." |
---|
0%Error in if (file.info(file)$size == 0) { : missing value where TRUE/FALSE needed In addition: There were 22 warnings (use warnings() to see them) ^C
library(gdata) library(data.table) library(GenomicRanges) library(parallel) library(matrixStats)
extAOH <- NULL if (is.null(snpPaths) || is.null(snpFids)){print("Skipping AOH analysis ...")} Error: object 'snpPaths' not found else{ Error: unexpected 'else' in "else" library(DNAcopy) extAOH <- prepareAOHData(snpPaths, snpFids, mc.cores, vR_id, tR_id, filter) [1] "Reading VCFs and generating AOH data using CBS ..." Error in mclapply2(1:length(fileNames), function(i) { : object 'snpPaths' not found save(extAOH, file=aohRDataOut) } Error: unexpected '}' in "}"
print("[step 2 out of 7] ** Preparing RPKM data **") [1] "[step 2 out of 7] ** Preparing RPKM data **" selectedFidsIdx <- 1:length(rpkmFids) if (!is.null(extAOH)) {
- selectedFidsIdx <- which(rpkmFids %in% extAOH$Name)
- }
rpkmDt <- prepareRPKMData(rpkmPaths[selectedFidsIdx], rpkmFids[selectedFidsIdx], 1) [1] "Reading RPKM files ..." |======================================================================| 100% [1] "Removing empty elements ..." [1] "Creating matrix ..." |======================================================================| 100% [1] "Creating data.table (may take a while)..." Warning messages: 1: In selectChildren(pids[!fin], -1) : cannot wait for child 19856 as it does not exist 2: In selectChildren(pids[!fin], -1) : cannot wait for child 19856 as it does not exist
please also print length(selectedFidsIdx)
print(length(selectedFidsIdx)) [1] 10
On Fri, Jul 20, 2018 at 6:36 AM tgambin notifications@github.com wrote:
Could you please check extAOH object like you did it before ? Is it still empty ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-406562224, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I1iYmjKfZLIgNQGX3FnDbohK7TeKks5uIbKzgaJpZM4SOhe9 .
I notice this in your script: newAOH <- rbindlist(lapply(c(1:22,"X","Y"), function(chr){
some vcf file use “chr1”,“chr2” but not “1”,“2”,“X”,“Y”
suggest using “unique(tmpAOH$chrom)”
Hi Tomek,
Any input on this. i believe you said you were able to get it running previously.
Cheers,
Ricardo
On Fri, Jul 20, 2018 at 6:36 AM tgambin notifications@github.com wrote:
Could you please check extAOH object like you did it before ? Is it still empty ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-406562224, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3I1iYmjKfZLIgNQGX3FnDbohK7TeKks5uIbKzgaJpZM4SOhe9 .
Let me check it again and I will let you know tomorrow.
Ok,
Please try again. I fixed potential problem with "chr" prefix. In addition please, make sure that files are compressed with bzip2 (I had to uncompress and run bzip2 again on your files).
Please, first try to run the following command and check if it works: dt <- prepareAOHData ("DMA_1.100019.vcf.bz2", "DMA_1", 1, "AD", "DP" , ".") dim(dt)
Best, Tomek
Hi Tomek,
Just touching base again.
Cheers,
Ricardo
On Thu, Aug 2, 2018 at 7:03 AM tgambin notifications@github.com wrote:
Ok,
Please try again. I fixed potential problem with "chr" prefix. In addition please, make sure that files are compressed with bzip2 (I had to uncompress and run bzip2 again on your files).
Please, first try to run the following command and check if it works: dt <- prepareAOHData ("DMA_1.100019.vcf.bz2", "DMA_1", 1, "AD", "DP" , ".") dim(dt)
[1] 2838 10
Best, Tomek
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BCM-Lupskilab/HMZDelFinder/issues/7#issuecomment-409889532, or mute the thread https://github.com/notifications/unsubscribe-auth/AHL3IyMt_fjUjEAI9Fs0psmSpQ0in2-yks5uMtxkgaJpZM4SOhe9 .
Have you seen my latest comment ? Please, see below
Please try again. I fixed potential problem with "chr" prefix. In addition please, make sure that files are compressed with bzip2 (I had to uncompress and run bzip2 again on your files).
Please, first try to run the following command and check if it works: dt <- prepareAOHData ("DMA_1.100019.vcf.bz2", "DMA_1", 1, "AD", "DP" , ".") dim(dt)
Hi,
I received an error from HMZDelFinder while filtering for AOH.I called variants using GATK Best Practices and zipped the files using bzip2 filename or bzip2 -z filename. All VCF files are in v4.2. Files extensions are .bz2. I receive the error listed below but interestingly enough I can bzcat the file paths from the command line with no problem.Can you please help me solve this issue?
[1] "[step 1 out of 7] ** AOH data **"[1] "Reading VCFs and generating AOH data using CBS ..." | | 0%bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA_1.10019.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-1_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-2_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA_2.10018.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA_3.10020.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-3_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-5_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-4_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-6_S1.vcf.bz2, output file = (stdout)bzcat: I/O or other error, bailing out. Possible reason follows.bzcat: Broken pipe Input file = /home/ricardo/Downloads/DMA/ATLAS/DMA-7_S1.vcf.bz2, output file = (stdout)