fastlmm / FaST-LMM

Python version of Factored Spectrally Transformed Linear Mixed Models
https://fastlmm.github.io/
Apache License 2.0

Run without PLINK format #30

Closed: snowformatics closed this issue 1 year ago

snowformatics commented 2 years ago

Hi,

I have a short question: is it possible to run FaST-LMM without the PLINK Bed format? If I understand correctly, the single_snp function runs with any SnpReader, which could also be a SnpData. Is this correct? What about the FAM and BIM files, are they required? I want to add an alternative option that does not use PLINK and works with the VCF directly, or with only the SNP matrix, which is often all that users have available.

Thanks Stefanie

CarlKCarlK commented 2 years ago

Yes,

FaST-LMM is just as happy to run on a "SnpData" as on a "Bed".

Here is a simple example of creating a tiny SnpData object:

from pysnptools.snpreader import SnpData

snpdata = SnpData(iid=[['fam0','iid0'],['fam0','iid1']],
                  sid=['snp334','snp349','snp921'],
                  val=[[0.,2.,0.],[0.,1.,2.]])
print((snpdata.val[0,1], snpdata.iid_count, snpdata.sid_count))
(2.0, 2, 3)

You'll notice that you pass "SnpData(...)" a list of family/individual id names, a list of SNP names, and a 2D array of values. In this case, the SnpData was created without chromosome information, so it defaults to chromosome "0". You can change this with an additional line:

snpdata.pos[:,0] = [1,1,2] # assign the first and 2nd SNP to chrom 1 and 3rd to chrom 2

There is more info at: https://fastlmm.github.io/PySnpTools/#snpreader-snpdata
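As an alternative to patching `snpdata.pos` after the fact, the chromosome information can be prepared up front and handed to the constructor through its `pos` argument, which holds one row per SNP: chromosome, genetic distance, base-pair position. The following is a minimal sketch, not code from this thread. The helper names `make_pos` and `make_snpdata` are hypothetical, the use of 0.0 for an unknown genetic distance is an assumption borrowed from the PLINK .bim convention, and passing `pos` as a plain list of lists is assumed to work the same way the list inputs in the example above do.

```python
def make_pos(chroms, bp_positions=None):
    """Build one [chromosome, genetic distance, base-pair position] row per SNP."""
    if bp_positions is None:
        bp_positions = [0] * len(chroms)
    if len(chroms) != len(bp_positions):
        raise ValueError("need one base-pair position per SNP")
    # 0.0 marks an unknown genetic distance (assumption: PLINK .bim convention).
    return [[float(c), 0.0, float(bp)] for c, bp in zip(chroms, bp_positions)]


def make_snpdata(iid, sid, val, chroms, bp_positions=None):
    """Create a SnpData that carries chromosome info (requires pysnptools)."""
    # Imported lazily so make_pos can be used and tested without pysnptools.
    from pysnptools.snpreader import SnpData
    return SnpData(iid=iid, sid=sid, val=val, pos=make_pos(chroms, bp_positions))


# First and second SNP on chromosome 1, third on chromosome 2.
pos = make_pos([1, 1, 2], bp_positions=[1000, 2000, 500])
print(pos)
```

With pysnptools installed, `make_snpdata(iid, sid, val, chroms=[1, 1, 2])` should give the same chromosome assignment as the one-line `snpdata.pos[:,0] = [1,1,2]` shown above.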

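Put another way, if you can write Python code that gets this information from VCF files (or any other format):

  • Family and individual ids
  • SNP ids
  • Chromosome number
  • A 2-D array of values 0, 1, 2, and missing (representing the allele count)

then you can create a SnpData and call FaST-LMM.

If you can extract this info from your VCF files (or any other sources), I'm happy to help you get FaST-LMM running on this info.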
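For readers starting from a VCF, the extraction step might look roughly like the sketch below. It is an illustration under stated assumptions, not an official FaST-LMM or PySnpTools reader: `parse_vcf_lines` and `run_single_snp` are hypothetical helper names, the VCF is assumed to be uncompressed text with a `GT` field, the allele count is taken as the number of non-reference alleles, chromosome names are assumed to be numeric, and each sample name is reused as its own family id because a VCF carries no family ids. Missing genotypes become NaN.

```python
import math


def parse_vcf_lines(lines):
    """Return (iid, sid, chroms, bp_positions, val) from the text lines of a VCF.

    val has one row per individual and one column per SNP, holding the count
    of non-reference alleles (0, 1, 2) or NaN when the genotype is missing.
    """
    samples, sid, chroms, bps, columns = [], [], [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed columns
            continue
        chrom, pos, vid, fmt = fields[0], fields[1], fields[2], fields[8]
        gt_index = fmt.split(":").index("GT")
        column = []
        for entry in fields[9:]:
            gt = entry.split(":")[gt_index].replace("|", "/").split("/")
            if "." in gt:
                column.append(math.nan)  # missing genotype
            else:
                column.append(float(sum(allele != "0" for allele in gt)))
        sid.append(vid if vid != "." else f"{chrom}:{pos}")
        chroms.append(chrom)
        bps.append(int(pos))
        columns.append(column)
    # A VCF stores one row per SNP; SnpData wants one row per individual.
    val = [list(row) for row in zip(*columns)]
    # Assumption: no family ids available, so reuse the sample name.
    iid = [[s, s] for s in samples]
    return iid, sid, chroms, bps, val


def run_single_snp(iid, sid, chroms, bps, val, pheno_values):
    """Hypothetical glue code; requires pysnptools and fastlmm to be installed."""
    from pysnptools.snpreader import SnpData
    from fastlmm.association import single_snp
    # Assumption: chromosome names are numeric; 0.0 = unknown genetic distance.
    pos = [[float(c), 0.0, float(bp)] for c, bp in zip(chroms, bps)]
    test_snps = SnpData(iid=iid, sid=sid, val=val, pos=pos)
    pheno = SnpData(iid=iid, sid=["pheno"], val=[[v] for v in pheno_values])
    return single_snp(test_snps=test_snps, pheno=pheno)


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tiid0\tiid1",
    "1\t100\tsnp334\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/0",
    "1\t200\tsnp349\tC\tT\t.\tPASS\t.\tGT\t1/1\t0|1",
    "2\t300\tsnp921\tG\tA\t.\tPASS\t.\tGT\t./.\t1/1",
]
iid, sid, chroms, bps, val = parse_vcf_lines(example)
print(iid, sid, chroms, bps, val)
```

Only `parse_vcf_lines` runs here; `run_single_snp` is left uncalled because a two-sample toy data set is far too small for a meaningful association test. Chromosome numbers matter on real data because `single_snp` defaults to a leave-one-chromosome-out analysis (`leave_out_one_chrom=True`), which presumably needs SNPs on more than one chromosome.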


snowformatics commented 2 years ago

Hi Carl,

Thanks, good news! I will extract the necessary data and try it out, and I will let you know how it goes.

Stefanie
