Closed Uljibuh closed 3 years ago
We have the keyword argument `f_person` in the function `SnpArrays.filter` for exactly this purpose. Assuming `cowdata` is the variable for the `SnpData`, you can do
SnpArrays.filter(cowdata; des="cowdata_in_A", f_person = x -> parse(Int, x[:iid]) in A.iid)
for selecting iids in the DataFrame A, and
SnpArrays.filter(cowdata; des="cowdata_not_in_A", f_person = x -> !(parse(Int, x[:iid]) in A.iid))
for selecting iids not in A.
`parse` is used because the ids in `A` are `Int64`, while `iid` and `fid` are read from fam files as `String`.
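The predicate logic above can be tried out without any PLINK files. The following is a minimal sketch in plain Julia: `person_rows`, `A_iid`, `exclude_ids`, and `in_A` are hypothetical names, with named tuples standing in for rows of `person_info` and a plain vector standing in for `A.iid`. It swaps in two small changes that are not part of the answer above: a `Set` for the membership test, and `tryparse` instead of `parse` so that a non-numeric `iid` does not throw.

```julia
# Hypothetical stand-ins: `person_rows` mimics rows of person_info (iid stored as String),
# `A_iid` mimics the Int64 id column A.iid.
person_rows = [(fid = "0", iid = "409859435"),
               (fid = "0", iid = "409922125"),
               (fid = "0", iid = "411075330")]
A_iid = [409922125, 999999999]

# A Set gives constant-time membership tests instead of scanning the vector per person.
exclude_ids = Set(A_iid)

# Predicate with the same shape as the `f_person` functions above: row -> Bool.
# tryparse returns `nothing` for a non-numeric iid, so such rows count as "not in A".
function in_A(row)
    id = tryparse(Int, row[:iid])
    return id !== nothing && id in exclude_ids
end

# true for every person that should be kept (i.e. is not listed in A)
keep_mask = [!in_A(row) for row in person_rows]
```

Assuming the rows handed to `f_person` support `x[:iid]` as in the calls above, the same predicate could be passed as `f_person = x -> !in_A(x)`.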
Thank you, it worked :)
Hi, I am working with PLINK files using the SnpArrays.jl package. Here is what my PLINK data and DataFrame (A) look like:
`SnpData(people: 28960, snps: 45807,
snp_info:
│ Row │ chromosome │ snpid                  │ genetic_distance │ position │ allele1 │ allele2 │
│     │ String     │ String                 │ Float64          │ Int64    │ String  │ String  │
├─────┼────────────┼────────────────────────┼──────────────────┼──────────┼─────────┼─────────┤
│ 1   │ 1          │ BovineHD0100000015     │ 0.0              │ 36337    │ G       │ A       │
│ 2   │ 1          │ Hapmap43437-BTA-101873 │ 0.0              │ 135098   │ G       │ A       │
│ 3   │ 1          │ BovineHD0100000062     │ 0.0              │ 206470   │ C       │ T       │
│ 4   │ 1          │ ARS-BFGL-NGS-16466     │ 0.0              │ 267940   │ T       │ C       │
│ 5   │ 1          │ BTA-34880              │ 0.0              │ 347418   │ T       │ C       │
│ 6   │ 1          │ BovineHD0100000096     │ 0.0              │ 348331   │ C       │ A       │
…,
person_info:
│ Row │ fid      │ iid       │ father    │ mother    │ sex      │ phenotype │
│     │ Abstrac… │ Abstract… │ Abstract… │ Abstract… │ Abstrac… │ Abstract… │
├─────┼──────────┼───────────┼───────────┼───────────┼──────────┼───────────┤
│ 1   │ 0        │ 409859435 │ 400005850 │ 411102034 │ 2        │ -9        │
│ 2   │ 0        │ 409922125 │ 400005850 │ 411657369 │ 2        │ -9        │
│ 3   │ 0        │ 411075330 │ 400005356 │ 407723032 │ 2        │ -9        │
│ 4   │ 0        │ 412057132 │ 400005972 │ 410308103 │ 2        │ -9        │
│ 5   │ 0        │ 404693736 │ 400003797 │ 404050484 │ 2        │ -9        │
│ 6   │ 0        │ 404880845 │ 400004013 │ 403616839 │ 2        │ -9        │
…,
srcbed: C:\Users\wubu.julia\packages\SnpArrays\CL3iQ\src…\data\genotype99.bed
srcbim: C:\Users\wubu.julia\packages\SnpArrays\CL3iQ\src…\data\genotype99.bim
srcfam: C:\Users\wubu.julia\packages\SnpArrays\CL3iQ\src…\data\genotype99.fam
)`
dataframe (A) `
` Now I want to exclude the individuals in A from the PLINK files (cowdata) based on their ids and save the result as a subset bed file. How can I do this? Thanks.