BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

Issue running FREEC on fungal genome #110

Closed tania-k closed 2 years ago

tania-k commented 2 years ago

Hello Boeva Lab! Appreciate the availability of this tool and am excited to see the results. I have been having an issue with running the code and am looking for any guidance you can provide!

My log file looks like:

Loading samtools/1.14
  Loading requirement: libdeflate/1.10
Control-FREEC v11.6 : a method for automatic detection of copy number alterations, subclones and for accurate estimation of contamination and main ploidy using deep-sequencing data
Non Multi-threading mode
..Breakpoint threshold for segmentation of copy number profiles is 0.05
..telocenromeric set to 50000
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Window = 50000 was set
..Output directory: Attempt1
..Directory with files containing chromosome sequences: GENOMES/
..Sample file:  aln/Ex15.sort.bam
..Sample input format:  BAM
..will use this instance of samtools: 'samtools' to read BAM files
..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.45
..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
..Polynomial degree for "ReadCount ~ GC-content" normalization is 3 or 4: will try both
..Minimal CNA length (in windows) is 1
..File with chromosome lengths: test/res_Exophiala_dermatitidis_Ex4.fa
..Using the default minimal mappability value of 0.85
..uniqueMatch = FALSE
..average ploidy set to 1
..break-point type set to 2
..noisyData set to 0
..Control-FREEC will not look for subclones
..File test/res_Exophiala_dermatitidis_Ex4.fa was read
..[genomecopynumber] Starting reading aln/Ex15.sort.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view -@ 1 aln/Ex15.sort.bam
..finished reading aln/Ex15.sort.bam
PROFILING [tid=140054545385280]: aln/Ex15.sort.bam read in 22 seconds [fillMyHash]
11351239 lines read..
0 reads used to compute copy number profile

Error: FREEC was not able to extract reads from aln/Ex15.sort.bam

Check your parameters: inputFormat and mateOrientation
Use "matesOrientation=0" if you have single end reads
Check the list of possible input formats at http://bioinfo-out.curie.fr/projects/freec/tutorial.html#CONFIG

While the config file looks like:

#For more options see: http://boevalab.com/FREEC/tutorial.html#CONFIG ###

[general]
#parameters chrLenFile and ploidy are required.
chrLenFile = test/res_Exophiala_dermatitidis_Ex4.fa
ploidy = 1

#Parameter "breakPointThreshold" specifies the maximal slope of the slope of residual sum of squares. 
#This should be a positive value. The closer it is to Zero, the more breakpoints will be called. Its recommended value is between 0.01 and 0.08.
breakPointThreshold = .05

#Either coefficientOfVariation or window must be specified for whole genome sequencing data. Set window=0 for exome sequencing data.
#coefficientOfVariation = 0.01
window = 50000
#step=10000

#Either chrFiles or GCcontentProfile must be specified too if no control dataset is available. 
#If you provide a path to chromosome files, Control-FREEC will look for the following fasta files in your directory (in this order): 
#1, 1.fa, 1.fasta, chr1.fa, chr1.fasta; 2, 2.fa, etc.
#csplit -s -z /path/to/INPUT.FA '/>/' '{*}'
#Please ensure that you don't have other files but sequences having the listed names in this directory. 
chrFiles = GENOMES/
#GCcontentProfile = test/GC_profile_50kb.cnp

#if you are working with something non-human, we may need to modify these parameters:
minExpectedGC = 0.45
maxExpectedGC = 0.55
#readCountThreshold=10
#numberOfProcesses = 4
outputDir = Attempt1
#contaminationAdjustment = TRUE
#contamination = 0.4
#minMappabilityPerWindow = 0.95

#If the parameter gemMappabilityFile is not specified, then the fraction of non-N nucleotides per window is used as Mappability.
#gemMappabilityFile = /GEM_mappability/out76.gem
#breakPointType = 4
#forceGCcontentNormalization = 0
#sex=XY
#set BedGraphOutput=TRUE if you want to create a BedGraph track for visualization in the UCSC genome browser:
BedGraphOutput=TRUE

[sample]
mateFile = aln/Ex15.sort.bam
#mateCopyNumberFile = test/sample.cpn
inputFormat = BAM
mateOrientation=0
#use "mateOrientation=0" for sorted .SAM and .BAM

[control]
#mateFile = /path/control.pileup.gz
#mateCopyNumberFile = path/control.cpn
#inputFormat = pileup
#mateOrientation = RF

#[BAF]
#use the following options to calculate B allele frequency profiles and genotype status. This option can only be used if "inputFormat=pileup"
#SNPfile = /bioinfo/users/vboeva/Desktop/annotations/hg19_snp131.SingleDiNucl.1based.txt
#minimalCoveragePerPosition = 5
#use "minimalQualityPerPosition" and "shiftInQuality" to consider only high quality position in calculation of allelic frequencies (this option significantly slows down reading of .pileup)
#minimalQualityPerPosition = 5
#shiftInQuality = 33

[target]
#use a tab-delimited .BED file to specify capture regions (control dataset is needed to use this option):
#captureRegions = /bioinfo/users/vboeva/Desktop/testChr19/capture.bed

valeu commented 2 years ago

test/res_Exophiala_dermatitidis_Ex4.fa should be test/res_Exophiala_dermatitidis_Ex4.fa.fai, no?

tania-k commented 2 years ago

The test/res_Exophiala_dermatitidis_Ex4 file is the one containing the scaffold/chromosome length information (produced by running your Perl script).

i.e.

chr_1   4314646
chr_2   4303030
chr_3   3726496
chr_4   3672342
chr_5   3382390
chr_6   2864055
chr_7   2891896
chr_8   1200926
chr_9   251008
---

I am confused by your suggestion. I renamed my chromosome length file so that it no longer ends in .fa, but the program still doesn't run to the end.
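For readers following along, here is a minimal sketch of how a name/length listing like the one above could be derived from a FASTA file. It is not the Perl script mentioned in this thread; the function names `fasta_lengths` and `format_length_table` are hypothetical, and the two-column layout simply copies the listing shown above. The first two columns of a `samtools faidx` index (.fai) carry the same name and length information.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record of a FASTA stream, in file order."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first whitespace-delimited token of the header line.
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def format_length_table(pairs):
    """Render (name, length) pairs as tab-separated lines."""
    return "\n".join(f"{name}\t{length}" for name, length in pairs)


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from this thread.
    demo = io.StringIO(">chr_1 toy record\nACGTAC\nGT\n>chr_2\nACG\n")
    print(format_length_table(fasta_lengths(demo)))
```

The names printed here are whatever follows `>` in the FASTA, which makes it easy to see whether they agree with the names used elsewhere.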

tania-k commented 2 years ago

It was a naming issue: my BAM, chromosome length file, and separate chromosome files used different sequence names (i.e. scaffold vs. chromosome). Once I fixed that, I got it to run!

Thank you, and sorry about the questions.
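A naming mismatch like the one described above can be spotted before running Control-FREEC by comparing the sequence names in each input. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the BAM names are taken from the text of a SAM header (for example the output of `samtools view -H`), the length file is assumed to have the name in its first column as in the listing earlier in this thread, and the per-chromosome FASTA files are assumed to be named after their sequence.

```python
import os


def names_from_sam_header(header_text):
    """Collect the SN: values of @SQ lines from SAM header text."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def names_from_length_table(table_text):
    """First column of a whitespace-delimited name/length listing (assumed layout)."""
    return [line.split()[0] for line in table_text.splitlines() if line.strip()]


def names_from_fasta_files(filenames):
    """File names with the directory and extension stripped, e.g. 'chr_1.fa' -> 'chr_1'."""
    return [os.path.splitext(os.path.basename(f))[0] for f in filenames]


def missing_names(sources):
    """For each labelled source, list the names seen elsewhere but absent from it."""
    union = set().union(*(set(names) for names in sources.values()))
    return {label: sorted(union - set(names)) for label, names in sources.items()}


if __name__ == "__main__":
    # Toy inputs for illustration only; the names are hypothetical.
    header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:scaffold_1\tLN:100\n"
    table = "chr_1\t100\n"
    files = ["GENOMES/chr_1.fa"]
    report = missing_names({
        "bam": names_from_sam_header(header),
        "len_file": names_from_length_table(table),
        "chr_files": names_from_fasta_files(files),
    })
    for label, missing in report.items():
        print(label, "is missing:", missing)
```

An empty list for every source means the three inputs agree on sequence names.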

valeu commented 2 years ago

Great! Hope you get nice results!

valeu commented 2 years ago

Dear Tania, please open a new ticket with this issue. I cannot see it on GitHub for some reason. Thank you!

From: Tania Kurbessoian
Sent: Sunday, July 24, 2022 6:37
To: BoevaLab/FREEC
Cc: Valentina Boeva; State change
Subject: Re: [BoevaLab/FREEC] Issue running FREEC on fungal genome (Issue #110)

Hello, I'm so sorry for responding to this closed thread but I feel like I need a little guidance on the R script. Should I create a new issue?

Has this script been run successfully with only ploidy = 1 and without the BAF portion? What would the part that I need to change, the (1:22, 'X','Y') portion, look like? I have 44 scaffolds (the best genome we could build) and no X or Y chromosome, and the chromosomes are labeled scaffold_1, scaffold_2, etc. Any example or guidance with this would be extremely helpful!
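One way to avoid a hard-coded (1:22, 'X','Y') list is to take the chromosome names from the results file itself. The sketch below is written in Python rather than R and is not the plotting script discussed here; it assumes a tab-delimited ratio file with a header containing `Chromosome`, `Start` and `Ratio` columns, and the function names are hypothetical.

```python
import csv
import io


def scaffold_names(count, prefix="scaffold_"):
    """Build names such as scaffold_1 ... scaffold_N for a fixed-list approach."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def ratios_by_chromosome(handle):
    """Group (start, ratio) points by chromosome, keeping the order of first appearance."""
    groups = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        point = (int(row["Start"]), float(row["Ratio"]))
        groups.setdefault(row["Chromosome"], []).append(point)
    return groups


if __name__ == "__main__":
    # Toy rows for illustration only; the column names are an assumption.
    demo = io.StringIO(
        "Chromosome\tStart\tRatio\n"
        "scaffold_1\t1\t1.0\n"
        "scaffold_1\t50001\t0.9\n"
        "scaffold_2\t1\t2.0\n"
    )
    groups = ratios_by_chromosome(demo)
    # Loop over whatever names the file contains instead of 1:22, X, Y.
    for chrom, points in groups.items():
        print(chrom, len(points), "windows")
    print(scaffold_names(44)[:3], "...")
```

Iterating over the names found in the file sidesteps the question of how the scaffolds are labeled in the output, since no name has to be guessed in advance.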

tania-k commented 2 years ago

Apologies, I figured it out and deleted the comment! I also sent a separate email to you and your colleague, but you can ignore that now. Thanks so much!

Here is my result for my black yeast, Exophiala dermatitidis, three of which have strange anomalies on chromosomes 1 and 3. Maybe you could share, from your experience, what you think it could be? I had run mosdepth on the data, saw the strange results, and wanted to try another program to see whether a similar result would appear.

Thank you, Tania Kurbessoian

-- Tania Kurbessoian M.Sc.

Ph.D. Candidate @ Stajichlab https://stajichlab.github.io/ UCR MSA https://msastudents.org/about/nominations-for-2018-2019-executive-board/ SPS Chair ICAM https://icarmenian-mycologists.github.io Co-Founder

valeu commented 2 years ago

Dear Tania, I don’t see your results attached. If you wish, you can write to @. @.>

Best

Valentina