Run without PLINK format (fastlmm/FaST-LMM Issue #30)
Status: Closed (snowformatics closed 1 year ago)
Yes,
FaST-LMM is just as happy to run on a "SnpData" as on a "Bed".
Here is a simple example of creating a tiny SnpData object:
from pysnptools.snpreader import SnpData
snpdata = SnpData(iid=[['fam0','iid0'],['fam0','iid1']],
                  sid=['snp334','snp349','snp921'],
                  val=[[0.,2.,0.],[0.,1.,2.]])
print((snpdata.val[0,1], snpdata.iid_count, snpdata.sid_count))
# output: (2.0, 2, 3)
You'll notice that you pass "SnpData(...)" a list of family/individual id names, a list of SNP names, and a 2D array of values. In this case, the SnpData was created without chromosome information, so it defaults to chromosome "0". You can change this with an additional line:
snpdata.pos[:,0] = [1,1,2] # assign the first and second SNPs to chromosome 1 and the third to chromosome 2
There is more info at: https://fastlmm.github.io/PySnpTools/#snpreader-snpdata
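Put another way: if you can write Python code that gets this information from VCF files (or any other format):
- Family and individual ids
- SNP ids
- Chromosome number
- A 2-D array of values 0, 1, 2, and missing (representing the allele count)
then you can create a SnpData and call FaST-LMM.
If you can extract this info from your VCF files (or any other sources), I'm happy to help you get FaST-LMM running on this info.
- Carl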
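As a rough illustration of the extraction step, here is a minimal sketch that pulls sample ids, SNP ids, chromosome numbers, and a 2-D array of allele counts out of VCF text using only the standard library. The helper names (genotype_to_count, vcf_to_arrays) are hypothetical and not part of PySnpTools or FaST-LMM. The sketch assumes biallelic diploid GT calls and numeric chromosome names, and it reuses each sample name as its family id, because VCF carries no family column.

```python
import math

def genotype_to_count(gt):
    """Turn a VCF genotype such as '0/1' or '1|1' into an alternate-allele count.

    Anything that is not a biallelic diploid call ('./.', '1/2', haploid calls)
    is returned as NaN, i.e. treated as missing (an assumption of this sketch).
    """
    alleles = gt.split(":")[0].replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return math.nan
    return float(alleles.count("1"))

def vcf_to_arrays(lines):
    """Collect iid, sid, chromosome, and an individuals-by-SNPs value table."""
    iid, sid, chrom, by_variant = [], [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; the sample name doubles as family id.
            iid = [[name, name] for name in fields[9:]]
            continue
        name = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        chrom.append(int(name))  # assumes numeric chromosome names
        # Fall back to CHROM:POS when the ID column is empty ('.').
        sid.append(fields[2] if fields[2] != "." else f"{fields[0]}:{fields[1]}")
        by_variant.append([genotype_to_count(gt) for gt in fields[9:]])
    # VCF has one variant per row; transpose so each row is one individual.
    val = [list(row) for row in zip(*by_variant)]
    return iid, sid, chrom, val

example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tiid0\tiid1",
    "1\t100\tsnp334\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/0",
    "1\t200\tsnp349\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/1",
    "2\t300\tsnp921\tG\tA\t.\tPASS\t.\tGT\t0/0\t1|1",
]
iid, sid, chrom, val = vcf_to_arrays(example)
print(iid, sid, chrom, val)
```

The four results could then be handed to SnpData(iid=iid, sid=sid, val=val), followed by snpdata.pos[:,0] = chrom, as in the example in the reply. For real files a dedicated VCF parser is likely a better choice than splitting lines by hand.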
Hi Carl,
Thanks, good news! I will extract the necessary data and try it out, and I will let you know how it goes.
Stefanie
Hi,
I have a short question: is it possible to run FaST-LMM without the PLINK Bed format? If I understand correctly, the single_snp function runs with any SnpReader, which could also be a SnpData. Is this correct? What about the FAM and BIM files, are they required? I want to add an alternative option that does not use PLINK and works with the VCF directly, since users often have only the SNP matrix available.
Thanks,
Stefanie