LD set in hg38 ?

SalimMegat commented 4 years ago

Hi,

Thanks for sharing your TWAS weights! I was wondering which map you used to port the LD set to hg38 coordinates. I am also doing a TWAS analysis in hg38 coordinates, but I get a lot of missing data when running the TWAS with your weights and my LD set updated to hg38. Would you mind sharing the updated LD set, or giving me a hint on which map you used for the conversion?

I really appreciate your help!

Many thanks,

Salim.

andrewejaffe commented 4 years ago

https://github.com/LieberInstitute/brainseq_phase2/tree/master/twas/reference_hg38 Is that what you are looking for?

SalimMegat commented 4 years ago

Hi,

I guess that's the code to update the positions, but we do not have access to the file "BrainSeq_Phase2_RiboZero_Genotypes_n551.rda". What I was looking for is more like a map that looks like the attached one, but that contains your rsIDs of course. That way I would be able to filter my LD set accordingly.

Thank you,

snpMap.zip

lcolladotor commented 4 years ago

Hi,

The BrainSeq_Phase2_RiboZero_Genotypes_n551.rda file is part of the brainseq_phase2_genotypes Globus collection we have at http://research.libd.org/globus/. You have to request access to the genotype data first, then send us your Globus email so we can give you access to the collection. After that you can download the .rda file you are looking for.

Best, Leonardo

lcolladotor commented 4 years ago

Here's a preview of the objects inside that .rda file:

> load("BrainSeq_Phase2_RiboZero_Genotypes_n551.rda", verbose = TRUE)
Loading objects:
  mds
  snp
  snpMap
> head(snpMap)
                       CHR                    SNP CM    POS COUNTED ALT Type
rs9988021:866319:G:A     1   rs9988021:866319:G:A  0 866319       G   A  SNV
rs111819742:868861:C:T   1 rs111819742:868861:C:T  0 868861       T   C  SNV
GA018352                 1               GA018352  0 879687       T   C  SNV
rs3748592                1              rs3748592  0 880238       A   G  SNV
rs2340582                1              rs2340582  0 882803       A   G  SNV
rs4246503                1              rs4246503  0 884815       A   G  SNV
                       newRef newCount        name  rsNumGuess chr_hg38
rs9988021:866319:G:A        A        G   rs9988021   rs9988021     chr1
rs111819742:868861:C:T      C        T rs111819742 rs111819742     chr1
GA018352                    C        T      rs2839      rs2839     chr1
rs3748592                   G        A   rs3748592   rs3748592     chr1
rs2340582                   G        A   rs2340582   rs2340582     chr1
rs4246503                   G        A   rs4246503   rs4246503     chr1
                       pos_hg38
rs9988021:866319:G:A     930939
rs111819742:868861:C:T   933481
GA018352                 944307
rs3748592                944858
rs2340582                947423
rs4246503                949435
> dim(snpMap)
[1] 7023860      13
> dim(snp)
[1] 7023860     551
> snp[1:5, 1:5]
                       Br5168 Br5073 Br5217 Br5234 Br5372
rs9988021:866319:G:A        0      0      0      0      1
rs111819742:868861:C:T      0      0      0      0      0
GA018352                    1      0      0      0      1
rs3748592                   0      0      0      0      1
rs2340582                   0      0      0      0      1

SalimMegat commented 4 years ago

Hi Leonardo,

Thank you for your reply! I am fully aware that access to the raw genotype data needs to be requested through dbGaP, as the data are sensitive. However, I do not actually need access to the individual-level genotype data. I only need the first part of the file, which contains the list of variants with their positions on hg19 and hg38. With this, I would be able to filter my LD set, update it to hg38 positions, and use the weights that you kindly provided. Do you think it would be possible for you to share only that first part with the position mapping?

I appreciate your help!

Best,

Salim.


lcolladotor commented 4 years ago

Hi,

We are in the process of uploading a text file with this data to AWS. You should be able to read it into R using read.table() with just the file name and no other arguments.

## Test run
> write.table(head(snpMap), file = "BrainSeq_Phase2_RiboZero_Genotypes_n551_snpMap.txt", quote = FALSE)
> read.table("BrainSeq_Phase2_RiboZero_Genotypes_n551_snpMap.txt")
                       CHR                    SNP CM    POS COUNTED ALT Type
rs9988021:866319:G:A     1   rs9988021:866319:G:A  0 866319       G   A  SNV
rs111819742:868861:C:T   1 rs111819742:868861:C:T  0 868861       T   C  SNV
GA018352                 1               GA018352  0 879687       T   C  SNV
rs3748592                1              rs3748592  0 880238       A   G  SNV
rs2340582                1              rs2340582  0 882803       A   G  SNV
rs4246503                1              rs4246503  0 884815       A   G  SNV
                       newRef newCount        name  rsNumGuess chr_hg38
rs9988021:866319:G:A        A        G   rs9988021   rs9988021     chr1
rs111819742:868861:C:T      C        T rs111819742 rs111819742     chr1
GA018352                    C        T      rs2839      rs2839     chr1
rs3748592                   G        A   rs3748592   rs3748592     chr1
rs2340582                   G        A   rs2340582   rs2340582     chr1
rs4246503                   G        A   rs4246503   rs4246503     chr1

## Creating the main file
> Sys.time(); write.table(snpMap, file = "BrainSeq_Phase2_RiboZero_Genotypes_n551_snpMap.txt", quote = FALSE); Sys.time()
[1] "2020-07-21 13:08:49 EDT"
[1] "2020-07-21 13:09:49 EDT"
> system("wc -l *snpMap.txt")
7023861 BrainSeq_Phase2_RiboZero_Genotypes_n551_snpMap.txt

## Compress to save space
$ gzip BrainSeq_Phase2_RiboZero_Genotypes_n551_snpMap.txt
$ ls -lh
-rwxrwx--- 1 lcollado lieber_jaffe 165M Jul 21 13:09 BrainSeq_Phase2_RiboZero_Genotypes_n551_snpMap.txt.gz

Best, Leo
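A note on why read.table() needs no extra arguments here, plus a minimal round-trip sketch. write.table() with row names writes a header line that has one field fewer than the data lines. When read.table() sees that, it infers header = TRUE and uses the first column as row names. A file name ending in .gz can also be passed directly, since R detects gzip compression when it opens a file for reading. The toy data frame `snp_map` and the temporary file name below are illustrative assumptions; the values are copied from the preview earlier in this thread, with the columns trimmed.

```r
# Toy stand-in for snpMap (two rows from the preview, columns trimmed).
snp_map <- data.frame(
    CHR = c(1, 1),
    POS = c(880238, 882803),
    chr_hg38 = c("chr1", "chr1"),
    pos_hg38 = c(944858, 947423),
    row.names = c("rs3748592", "rs2340582"),
    stringsAsFactors = FALSE
)

# Write with row names and without quotes, straight into a gzip file.
out_file <- file.path(tempdir(), "snpMap_toy.txt.gz")
con <- gzfile(out_file, "w")
write.table(snp_map, file = con, quote = FALSE)
close(con)

# The header line has one field fewer than the data lines, so read.table()
# infers header = TRUE and takes the first column as row names.
round_trip <- read.table(out_file)
print(round_trip)
```

This also matches the `wc -l` count above: 7023861 lines is the 7023860 rows of snpMap plus one header line.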

andrewejaffe commented 4 years ago

Here it is: https://libd-brainseq2.s3.us-east-2.amazonaws.com/BrainSeqPhaseII_snp_annotation.txt.gz
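For anyone landing here with the same question, this is a rough sketch of how such a table could be used to move an LD reference to hg38. It is not the code used for the published weights. It assumes the released file has the same columns as the snpMap preview above (`name`, `pos_hg38`, and so on) and that the LD reference is a PLINK .bim file. The data frames `snp_annot` and `bim`, their column names, and the ID `rs0000001` are made up for illustration.

```r
# Toy rows standing in for the released annotation table; the column names
# follow the snpMap preview earlier in the thread (an assumption about the file).
snp_annot <- data.frame(
    CHR = c(1, 1, 1),
    POS = c(879687, 880238, 882803),
    name = c("rs2839", "rs3748592", "rs2340582"),
    chr_hg38 = c("chr1", "chr1", "chr1"),
    pos_hg38 = c(944307, 944858, 947423),
    stringsAsFactors = FALSE
)

# Toy LD reference variants in PLINK .bim column order, with hg19 positions.
# rs0000001 is a made-up ID that is absent from the annotation table.
bim <- data.frame(
    chr = c(1, 1, 1),
    snp = c("rs3748592", "rs0000001", "rs2340582"),
    cm = 0,
    pos = c(880238, 881000, 882803),
    a1 = c("A", "C", "A"),
    a2 = c("G", "T", "G"),
    stringsAsFactors = FALSE
)

# Match on rsID, keep only variants present in the annotation,
# and swap in the hg38 coordinate.
idx <- match(bim$snp, snp_annot$name)
keep <- !is.na(idx)
bim_hg38 <- bim[keep, ]
bim_hg38$pos <- snp_annot$pos_hg38[idx[keep]]
print(bim_hg38)
```

With the real file, `snp_annot` would come from read.table() on the downloaded .txt.gz. Matching on the `name` column is a choice made for this sketch; depending on how the LD reference IDs are formatted, the `SNP` or `rsNumGuess` column may be the better key. It is also worth checking the alleles after matching rather than relying on positions alone.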

SalimMegat commented 4 years ago

Hi Leonardo,

Many thanks for your help!

Best,

Salim.


LieberInstitute / brainseq_phase2

LD set in hg38 ? #33