broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0

Support for nonstandard genomes / species #87

Open EvoMedLab opened 3 years ago

EvoMedLab commented 3 years ago

Hello,

I am studying cancer in canines and felines, and I run into this error:

Error in keepSeqlevels(seqinfo, value = chrs) :
  invalid seqlevels: 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38
Calls: getSeqInfo -> keepSeqlevels
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
  there is no package called ‘BSgenome.Hsapiens.UCSC.hg19’

I have generated tumor and gc wig files, along with centromere tsv, and am simply trying to test. What is required to use non-human / non-mouse species?

-Brian

gavinha commented 3 years ago

Hi @AgaricX

We have built in a check that the chromosomes of the input data are specifically human. As you can see from the warning, BSgenome.Hsapiens.UCSC.hg19 corresponds to Homo sapiens.

You can try setting seqinfo <- NULL at this line: https://github.com/broadinstitute/ichorCNA/blob/5bfc03ed854f0e93fe5b624c97c1290fa0053837/scripts/runIchorCNA.R#L128

In earlier versions of ichorCNA (https://github.com/broadinstitute/ichorCNA/releases), there are fewer checks for reference annotations and chromosomes so you could try that as well.

There may be other modifications you'd have to make. I know that others have used ichorCNA for other species so it is definitely possible after some modification.

Hope this helps. Best, Gavin

mheskett commented 2 years ago

Sorry, I don't mean to pester, but I am adding a vote for non-human species support, mainly mouse.

EvoMedLab commented 2 years ago

Hello, I have been unable to make this work with nonhuman genomes. Can we be directed to someone who has?

mheskett commented 2 years ago

I got it to work, kind of. It would be awesome if the creator added legitimate support for mouse, but I was able to hack it together, if you want to email me.

mheskett commented 2 years ago

Rscript /home/groups/Spellmandata/heskett/english/ichorCNA/scripts/runIchorCNA.R --id $filename \
  --WIG $sample_wig --gcWig ../mm10.reference/refdata-gex-mm10-2020-A/fasta/genome.gc.wig \
  --chrs "c(\"chr1\",\"chr2\",\"chr3\",\"chr4\",\"chr5\",\"chr6\",\"chr7\",\"chr8\",\"chr9\",\"chr10\",\"chr11\",\"chr12\",\"chr13\",\"chr14\",\"chr15\",\"chr16\",\"chr17\",\"chr18\",\"chr19\",\"chrX\",\"chrY\")" \
  --chrNormalize "c(\"chr1\",\"chr2\",\"chr3\",\"chr4\",\"chr5\",\"chr6\",\"chr7\",\"chr8\",\"chr9\",\"chr10\",\"chr11\",\"chr12\",\"chr13\",\"chr14\",\"chr15\",\"chr16\",\"chr17\",\"chr18\",\"chr19\",\"chrX\",\"chrY\")" \
  --mapWig ../mm10.reference/refdata-gex-mm10-2020-A/fasta/genome2.50000.window.wig \
  --ploidy "c(2)" --normal "c(0.9)" --maxCN 8 --includeHOMD False --estimateNormal True --estimatePloidy True \
  --estimateScPrevalence True --centromere /home/groups/Spellmandata/heskett/english/mm10.reference/mm10.centromeres.txt \
  --genomeBuild mm10 \
  --txnE 0.9999 --txnStrength 10000 --fracReadsInChrYForMale 0.001 --plotFileType png --plotYLim "c(-2,4)" --outDir $out_dir

This call worked on mouse, but my question is whether it may be using the human panel of normals by default, leading to spurious calls. Can any developers comment?
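The long --chrs and --chrNormalize vectors in the call above can be generated rather than typed by hand, which makes it easier to adapt the command to another species. This is only a sketch: make_chrs is a hypothetical helper (not part of ichorCNA), and it assumes numerically named autosomes plus X and Y.

```shell
# make_chrs is a hypothetical helper, not part of ichorCNA.
# It prints an R vector literal such as c("chr1","chr2","chrX","chrY")
# suitable for the --chrs / --chrNormalize arguments of runIchorCNA.R.
# Assumption: autosomes are named 1..N and the sex chromosomes are X and Y.
make_chrs() {
  n_autosomes=$1   # number of autosomes, e.g. 19 for mouse
  prefix=$2        # chromosome name prefix, e.g. "chr" or ""
  out=""
  i=1
  while [ "$i" -le "$n_autosomes" ]; do
    out="${out}\"${prefix}${i}\","
    i=$((i + 1))
  done
  out="${out}\"${prefix}X\",\"${prefix}Y\""
  printf 'c(%s)\n' "$out"
}

# Mouse (mm10, "chr"-prefixed names), as in the call above.
chrs=$(make_chrs 19 chr)
echo "$chrs"
```

The result can then be passed as --chrs "$chrs" --chrNormalize "$chrs". The prefix has to match the chromosome names in the wig files; pass an empty string for references that use bare names such as 1, 2, 3.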

moldovannorbert commented 1 year ago

Here is a small guide for those who still want to do this with other genomes. First of all, many thanks to Michael Heskett for his help with this!

  1. Download your genome of choice.
  2. Download and install hmmcopy_utils (https://github.com/shahcompbio/hmmcopy_utils).
  3. Create a genome reference mappability file (https://github.com/shahcompbio/hmmcopy_utils#generating-the-genome-reference-mappability-file).
  4. Use the genome reference mappability file to create the mappability and GC counts (https://github.com/shahcompbio/hmmcopy_utils#usage).
  5. Copy these .wig files to the location of the human versions of these files (in my case /share/r-ichorcna-0.3.2-1/extdata/) and point ichorCNA_gcWig and ichorCNA_mapWig in the config file to them.
  6. If you have normals, create a panel of normals; if not, you can leave ichorCNA_normalPanel empty in the config.
  7. If you have centromere locations, add them to extdata; if not, you can leave ichorCNA_centromere empty in the config.
  8. Modify the chrs, readDepth_chrs so they contain the chromosome names from your genome.
  9. Modify ichorCNA_chrTrain and ichorCNA_chrs so they cover the correct chromosome numbers for your genome.
  10. Modify the source as gavinha suggested:

You can try to set seqinfo <- NULL in line:

https://github.com/broadinstitute/ichorCNA/blob/5bfc03ed854f0e93fe5b624c97c1290fa0053837/scripts/runIchorCNA.R#L128

And that's it. If all goes well, this should make ichorCNA work with non-human genomes in standard format.
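As a rough illustration of step 4, the sketch below prints the two hmmcopy_utils commands instead of running them, so it works without the tools installed. gcCounter and mapCounter are the hmmcopy_utils binaries; the -w window flag is my reading of their usage text and should be checked against the hmmcopy_utils README. ref_wig_cmds and all file names are hypothetical, and the 50000 window only mirrors the mapWig file name in the mouse call earlier in this thread.

```shell
# ref_wig_cmds is a hypothetical helper that PRINTS the commands for step 4;
# it does not execute them. Verify the flags against the hmmcopy_utils README.
ref_wig_cmds() {
  fasta=$1    # reference FASTA for your genome
  map_bw=$2   # BigWig mappability file produced in step 3
  window=$3   # bin size in bp
  printf 'gcCounter -w %s %s > gc.%s.wig\n' "$window" "$fasta" "$window"
  printf 'mapCounter -w %s %s > map.%s.wig\n' "$window" "$map_bw" "$window"
}

ref_wig_cmds genome.fa genome.map.bw 50000
```

The GC and mappability wigs should use the same window size as the read-count wig generated for each sample, otherwise the bins will not line up.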

mheskett commented 1 year ago

Excellent. You may want to see if they will put this guide on the main GitHub page instead of in an issue here.

mheskett commented 1 year ago

Hey, did you have any trouble making the mappability file with generateMap.pl? I am finding that it runs for a very long time with no log messages.