VanLoo-lab / ascat

ASCAT R package
https://www.mdanderson.org/research/departments-labs-institutes/labs/van-loo-laboratory/resources.html#ASCAT

non-human genome #136

Closed gbnci closed 1 year ago

gbnci commented 1 year ago

Hello: Just wondering whether I can run ASCAT on non-human WGS data (for instance, canine)? I have managed to download the SNP information from an online database, and I think we can also generate the GC content information (although I have not tried yet). I have matched tumor-normal pairs. I am not sure how to handle sex chromosomes though (or should I just use autosomes?). There are a lot of unknowns for me. Any suggestions would be really helpful. Thanks.

tlesluyes commented 1 year ago

Hi @gbnci,

This would definitely be possible but requires some tweaks with our current ASCAT version. For now, HTS data needs to be genome-based: either hg19 or hg38. This is because:

  1. The logR processing (T/N ratio) puts both autosomes and nonPAR at 0, although autosomes are present in 2 copies whereas nonPAR is present in only 1 copy in males. Therefore, nonPAR logR needs to be set to -1 (because gamma=1 for HTS data); this is done in ASCAT/R/ascat.prepareHTS.R (lines 147-157).
  2. Since ASCAT leverages heterozygous SNPs, nonPAR won't have any in males. We need to artificially rescue some homozygous SNPs so BAF gets segmented at the 0/1 bands; this is done in ASCAT/R/ascat.aspcf.R (lines 52-71).

Because we don't have a methodology in place (yet, there should be something in a few months), here is a workaround solution:

  1. Use the ascat.prepareHTS function to derive logR and BAF from HTS, but set gender='XX' for all samples, including males. Once the logR and BAF files have been generated, read the logR file and, only for males, set logR=logR-1 for SNPs in your nonPAR region (depending on your reference), then save the updated logR so the file can be fed into ascat.loadData.
  2. Use ascat.loadData but don't provide any value for genomeVersion (it'll be NULL by default). Once the ascat.bc object is created and before any other ASCAT command, use ascat.bc$X_nonPAR=c(VAL1,VAL2), with VAL1 and VAL2 being the start and end positions of the nonPAR region on X (if it's not on X, it would require a more complex fix).
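The logR adjustment in step 1 could be sketched in base R roughly as follows. This is only an illustration under assumptions: `shift_nonPAR_logR` is a hypothetical helper (not part of ASCAT), the logR table is assumed to have `Chromosome` and `Position` columns followed by one column per sample, and the coordinates and sample name are toy values rather than real canine nonPAR boundaries.

```r
# Hypothetical helper (not an ASCAT function): subtract 1 from logR for SNPs
# inside the nonPAR region of X, for one male sample.
# Assumed layout: columns "Chromosome", "Position", then one column per sample.
shift_nonPAR_logR <- function(logr, sample_col, nonPAR, x_name = "X") {
  in_nonPAR <- logr$Chromosome == x_name &
    logr$Position >= nonPAR[1] &
    logr$Position <= nonPAR[2]
  logr[in_nonPAR, sample_col] <- logr[in_nonPAR, sample_col] - 1
  logr
}

# Toy table standing in for a logR file produced with gender='XX'.
logr <- data.frame(
  Chromosome = c("1", "X", "X", "X"),
  Position   = c(500, 100, 5000, 9000),
  dog_male   = c(0.1, 0.0, 0.2, -0.1),
  row.names  = c("snp1", "snp2", "snp3", "snp4"),
  stringsAsFactors = FALSE
)

# Toy nonPAR boundaries (VAL1, VAL2); replace with values for your reference.
nonPAR <- c(1000, 10000)
adjusted <- shift_nonPAR_logR(logr, "dog_male", nonPAR)
print(adjusted)

# With real files, the same helper would sit between a read and a write, e.g.:
# logr <- read.table("Tumor_LogR.txt", header = TRUE, sep = "\t",
#                    row.names = 1, check.names = FALSE)
# write.table(adjusted, "Tumor_LogR.txt", sep = "\t",
#             quote = FALSE, col.names = NA)
```

Only SNPs on X that fall within the given boundaries are shifted; autosomal SNPs and SNPs on X outside the region (the PAR in this toy setup) are left untouched. The same `nonPAR` vector is what would then be assigned to `ascat.bc$X_nonPAR` in step 2.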

Otherwise, a simpler but dirtier (hence not recommended) solution is to just ignore chromosome X (chrom_names=1:22 in ascat.prepareHTS); this would only work if nonPAR is located on X, though.

We'll try to implement something in the future so nonPAR can be customised but can't make any promise on a deadline I'm afraid. Stay tuned!

Cheers,

Tom.

gbnci commented 1 year ago

Thanks, Tom: I will give it a try and will definitely ask you for further help in the future.

tlesluyes commented 1 year ago

Closing this issue for now.