fastlmm / FaST-LMM

Python version of Factored Spectrally Transformed Linear Mixed Models
https://fastlmm.github.io/
Apache License 2.0
47 stars 11 forks source link

Kinship #49

Closed snowformatics closed 5 months ago

snowformatics commented 5 months ago

Hi,

I it possible to extract the kinship matrix calculated by fast-lmm?

Thanks Stefanie

CarlKCarlK commented 5 months ago

Yes, I can write a little script that computes it the same way that FaST-LMM does.

It actually computes one kinship matrix per leave-out-out-chromosome. Is that what you need or just one that uses all the chromosomes?

From: snowformatics @.> Sent: Monday, June 17, 2024 2:13 AM To: fastlmm/FaST-LMM @.> Cc: Subscribed @.***> Subject: [fastlmm/FaST-LMM] Kinship (Issue #49) Importance: High

Hi,

I it possible to extract the kinship matrix calculated by fast-lmm?

Thanks Stefanie

- Reply to this email directly, view it on GitHubhttps://github.com/fastlmm/FaST-LMM/issues/49, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABR65P74IEBZETKMQQYVKCLZH2SCBAVCNFSM6AAAAABJNTBVKWVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM2TMOBSGYYTGMI. You are receiving this because you are subscribed to this thread.Message ID: @.**@.>>

snowformatics commented 5 months ago

Hi Carl,

I want to plot the matrix to represent the genetic relatedness of the individuals so I would need the data from all chromosomes.

Thanks Stefanie

Carl Kadie @.***> schrieb am Mo. 17. Juni 2024 um 17:27:

Yes, I can write a little script that computes it the same way that FaST-LMM does.

It actually computes one kinship matrix per leave-out-out-chromosome. Is that what you need or just one that uses all the chromosomes?

  • Carl

From: snowformatics @.> Sent: Monday, June 17, 2024 2:13 AM To: fastlmm/FaST-LMM @.> Cc: Subscribed @.***> Subject: [fastlmm/FaST-LMM] Kinship (Issue #49) Importance: High

Hi,

I it possible to extract the kinship matrix calculated by fast-lmm?

Thanks Stefanie

- Reply to this email directly, view it on GitHub< https://github.com/fastlmm/FaST-LMM/issues/49>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/ABR65P74IEBZETKMQQYVKCLZH2SCBAVCNFSM6AAAAABJNTBVKWVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM2TMOBSGYYTGMI

. You are receiving this because you are subscribed to this thread.Message ID: @.**@.>>

— Reply to this email directly, view it on GitHub https://github.com/fastlmm/FaST-LMM/issues/49#issuecomment-2173713005, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABHTXXV5ANPUD3OL37W4NE3ZH356BAVCNFSM6AAAAABJNTBVKWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZTG4YTGMBQGU . You are receiving this because you authored the thread.Message ID: @.***>

CarlKCarlK commented 5 months ago

Here you go:

from fastlmm.util import example_file # Download and return local file name
from pysnptools.snpreader import Bed, Pheno
import pysnptools.util as pstutil
from pysnptools.kernelreader import SnpKernel
from pysnptools.standardizer import Unit

bed_fn = example_file('tests/datasets/synth/all.*','*.bed')
pheno_fn = example_file("tests/datasets/synth/pheno_10_causals.txt")
cov_fn = example_file("tests/datasets/synth/cov.txt")

bed = Bed(bed_fn, count_A1=True).read()
# print dimensions of bed
print(f"Bed dimensions: {bed.iid_count} x {bed.sid_count}")
pheno = Pheno(pheno_fn).read()

# replace original bed with one that has the same iids as the pheno
# If no values are missing, this will not do anything
bed, pheno = pstutil.intersect_apply([bed, pheno])

print(f"Bed dimensions: {bed.iid_count} x {bed.sid_count}")
# sample bed values
print(f"sample bed values: \n{bed.val[0:5,0:5]}")

# standardize the bed data
# that is: Make it mean 0 and sd 1 with any missing values filled in with 0
bed_standardized = bed.standardize()
print(f"sample bed values: \n{bed.val[0:5,0:5]}")

# create kinship matrix
kernel = SnpKernel(bed_standardized, standardizer=Unit())
kernel = kernel.read().standardize()  # defaults to DiagKtoN standardize
print(f"sample kernel values: \n{kernel.val[0:5,0:5]}")

# Behind the scenes this is the same as:
# kernel2 = snps * snps.T / snps.shape[1]
kernel2 = bed_standardized.val @ bed_standardized.val.T / bed_standardized.sid_count
print(f"sample kernel2 values: \n{kernel2[0:5,0:5]}")

# plot the kinship matrix
import pylab
pylab.figure(figsize=(10,8))

pylab.suptitle("kinship matrix")
pylab.imshow(kernel.val, cmap=pylab.gray())
pylab.show()

As you can see, if you know if have no missing data, etc., it can be simplifed.

Thanks for using FaST-LMM and PySnpTools! -- Carl

snowformatics commented 5 months ago

Thanks a lot Carl! It worked very well with no missing values, but when I have missing values and I use: bed, pheno = pstutil.intersect_apply([bed, pheno])

I get this error: AttributeError: '_SnpSubset' object has no attribute 'val'

and further: bed_standardized = bed.standardize() AttributeError: '_SnpSubset' object has no attribute 'standardize'

CarlKCarlK commented 5 months ago

Double check that your pheno line includes .read():

pheno = Pheno(pheno_fn).read()

Let me know if that's it.

snowformatics commented 5 months ago

Hi Carl,

yes, this line is there.

So the problem is, if there are no missing samples:

bed = Bed(bed_fn, count_A1=True).read()
pheno = Pheno(pheno_fn).read()
print (bed.shape, pheno.shape, type(bed), type(pheno))
bed, pheno = pstutil.intersect_apply([bed, pheno])
print (bed.shape, pheno.shape, type(bed), type(pheno))

Will return class 'pysnptools.snpreader.snpdata.SnpData'

58, 385649) (58, 1) <class 'pysnptools.snpreader.snpdata.SnpData'> <class 'pysnptools.snpreader.snpdata.SnpData'> (58, 385649) (58, 1) <class 'pysnptools.snpreader.snpdata.SnpData'> <class 'pysnptools.snpreader.snpdata.SnpData'>

If samples are missing:

(200, 949174) (196, 1) <class 'pysnptools.snpreader.snpdata.SnpData'> <class 'pysnptools.snpreader.snpdata.SnpData'> (196, 949174) (196, 1) <class 'pysnptools.snpreader._subset._SnpSubset'> <class 'pysnptools.snpreader._subset._SnpSubset'>

It's returns a pysnptools.snpreader._subset._SnpSubset object.

Which somehow is not compatible with:

bed_standardized = bed.standardize() AttributeError: '_SnpSubset' object has no attribute 'standardize'

I am on Windows 10, Python 3.9 with fast-lmm 0.6.7

Thanks

CarlKCarlK commented 5 months ago

Excellent debugging! Ok, I think I've got it. Please add a line after the intersect:

bed, pheno = pstutil.intersect_apply([bed, pheno])
bed = bed.read()
snowformatics commented 5 months ago

Ah I see! Thanks a lot Carl, your support is super fantastic 👍

Everything works now. Best regards Stefanie

CarlKCarlK commented 5 months ago

You're welcome. Always fun to show off PySnpTools.