OpenMendel / TrajGWAS.jl

MIT License
13 stars 1 forks source link

How to use Keyword arguments covrowinds and geneticrowinds when using a VCF file? #37

Open zjy71 opened 1 year ago

zjy71 commented 1 year ago

Hello, I want to know how to use Keyword arguments covrowinds and geneticrowinds when using a VCF file? Only some of samples in my VCF file are with an interested phenotype. Here is my script and errors.

vcfdir = "/mnt/OS5300/data/public_data/impute/"
chr = 1
fitted_null = "PLT.null.txt"
pvaldir = "/share/home/output/"
vcf_dir = "/share/home/bin/vcf_id.txt"

vcffile = vcfdir * "glimpse_merged.chr$(chr)"
pvalfile = pvaldir * "/P.chr$(chr).txt"
covfile = "/share/home/trajgwas_lg_coding.csv"

pheno_data = CSV.read(covfile, DataFrame)
vcf = CSV.read(vcf_dir, DataFrame, header = false)[!,1]

covrowmask, geneticrowmask = matchindices(@formula(H ~ 1 + age + BMI + day),
        @formula(y ~ 1),
        @formula(Hb ~ 1 + age + BMI + day),
        :id,
        pheno_data, vcf);

trajgwas(@formula(H ~ 1 + age + BMI + day),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI + day),
        :id,
        pheno_data,
        vcffile;
        geneticformat = "VCF",
        vcftype = :DS,
        pvalfile,
        nullfile = fitted_null,
        covrowinds = covrowmask,
        geneticrowinds = geneticrowmask)`

`ERROR: LoadError: AssertionError: Null model contains IDs not in the /mnt/OS5300/data/public_data/impute/glimpse_merged.chr1 file applying `geneticrowinds` mask.
Stacktrace:
 [1] trajgwas(fittednullmodel::WSVarLmmModel{Float64}, geneticfile::String; analysistype::String, geneticformat::String, vcftype::Symbol, samplepath::Nothing, testformula::FormulaTerm{Term, Term}, test::Symbol, init::Function, pvalfile::String, snpmodel::Val{1}, snpinds::Nothing, usespa::Bool, reportchisq::Bool, geneticrowinds::Vector{Int64}, solver::Ipopt.Optimizer, solver_config::Dict{String, Any}, parallel::Bool, runs::Int64, verbose::Bool, snpset::Nothing, e::Nothing, kwargs::Base.Pairs{Symbol, UnitRange{Int64}, Tuple{Symbol}, NamedTuple{(:covrowinds,), Tuple{UnitRange{Int64}}}})
   @ TrajGWAS ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:273
 [2] trajgwas(nullmeanformula::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, reformula::FormulaTerm{Term, ConstantTerm{Int64}}, nullwsvarformula::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, idvar::Symbol, nulldf::DataFrame, geneticfile::String; nullfile::String, solver::Ipopt.Optimizer, solver_config::Dict{String, Any}, parallel::Bool, runs::Int64, verbose::Bool, init::typeof(init_ls!), kwargs::Base.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:geneticformat, :vcftype, :pvalfile, :covrowinds, :geneticrowinds), Tuple{String, Symbol, String, UnitRange{Int64}, Vector{Int64}}}})
   @ TrajGWAS ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:177
 [3] top-level scope
   @ ~/bin/model.jl:28
in expression starting at /share/home/bin/model.jl:28

Looking forward to your reply.

kose-y commented 1 year ago

In general, the lefthand side of the ~ of the formulae should not differ.

kose-y commented 1 year ago

Please let us know if it works after matching the lefthand sides. We will add a check to make sure that they are identical in the future if that resolves the problem.

zjy71 commented 1 year ago

I am sorry for my minor mistake. But after I changed lefthand side of the ~ of the formula, I got these errors.

ERROR: LoadError: MethodError: no method matching trajgwas(::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, ::FormulaTerm{Term, ConstantTerm{Int64}}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, ::Symbol, ::DataFrame, ::String, ::String; geneticformat="VCF", vcftype=:DS, nullfile="/share/home/trajgwas/PLT.null.txt", covrowinds=Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], geneticrowinds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 20084, 20085, 20086, 20087, 20088, 20089, 20090, 20091, 20092, 20093]) Closest candidates are: trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame, ::Union{Nothing, AbstractString}; nullfile, solver, solver_config, parallel, runs, verbose, init, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149 trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, !Matched::AbstractString, ::Union{Nothing, AbstractString}; covtype, covrowinds, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:128 trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149 got unsupported keyword arguments "geneticformat", "vcftype", "nullfile", "covrowinds", "geneticrowinds" ... Stacktrace: [1] top-level scope @ ~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37 in expression starting at /share/home/bin/model.jl:37

kose-y commented 1 year ago

Could you share what's in

~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37
in expression starting at /share/home/bin/model.jl:37

?

You seem to have an extra String argument to the trajgwas function before the semicolon.

zjy71 commented 1 year ago

line 37 is trajgwas(@formula(H ~ 1 + age + BMI + day), @formula(Hb ~ 1), @formula(Hb ~ 1 + age + BMI + day), :id, pheno_data, vcffile; geneticformat = "VCF", vcftype = :DS, pvalfile, nullfile = fitted_null, covrowinds = covrowmask, geneticrowinds = geneticrowmask) any argument is extra?

kose-y commented 1 year ago

Is what you gave still the content there? The lefthand sides of the formulae still do not match yet, and

ERROR: LoadError: MethodError: no method matching
trajgwas(::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, ::FormulaTerm{Term, ConstantTerm{Int64}}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, ::Symbol, ::DataFrame, ::String, ::String; geneticformat="VCF", vcftype=:DS, nullfile="/share/home/trajgwas/PLT.null.txt", covrowinds=Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], geneticrowinds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 20084, 20085, 20086, 20087, 20088, 20089, 20090, 20091, 20092, 20093])

indicates that the erroring trajgwas function call has seven arguments before the semicolon, but the code you gave has six.

Please check the current content of Line 37 of file ~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl and Line 37 of file /share/home/bin/model.jl.

zjy71 commented 1 year ago

Thank you for your prompt and patient response. I checked the script and reran it. But I still got some errors. Here are my script and errors.

using ArgParse
using DataFrames, CSV
using Statistics
using TrajGWAS
using WiSER
using LinearAlgebra
BLAS.set_num_threads(1)

vcfdir = "/mnt/lsy/OS5300/data/public_data/impute/"
chr = 21
fitted_null = "/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/input/PLT.null.txt"
pvaldir = "/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/output/score"
vcf_index_dir = "/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/bin/vcf_id.txt"

vcffile = vcfdir * "longgang_NIPT_glimpse_50948.merged.chr$(chr)"
pvalfile = pvaldir * "/PLT.chr$(chr).txt"
covfile = "/share/home/lsy_guoxinxin/project/anemia/trajGWAS/longgang_input/trajgwas_lg_coding.csv"

pheno_data = CSV.read(covfile, DataFrame, types=[String31, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64])
vcf_index = CSV.read(vcf_index_dir, DataFrame, header = false)[!,1]

covrowmask, geneticrowmask = matchindices(@formula(Hb ~ 1 + age + BMI),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI),
        :nipt_id,
        pheno_data, vcf_index);

trajgwas(@formula(Hb ~ 1 + age + BMI),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI),
        :nipt_id,
        pheno_data,
        vcffile,
        geneticformat = "VCF",
        vcftype = :DS,
        pvalfile,
        nullfile = fitted_null,
        covrowinds = covrowmask,
        geneticrowinds = geneticrowmask)
ERROR: LoadError: MethodError: no method matching trajgwas(::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, ::FormulaTerm{Term, ConstantTerm{Int64}}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, ::Symbol, ::DataFrame, ::String, ::String; geneticformat="VCF", vcftype=:DS, nullfile="/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/input/PLT.null.txt", covrowinds=Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], geneticrowinds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  20084, 20085, 20086, 20087, 20088, 20089, 20090, 20091, 20092, 20093])
Closest candidates are:
  trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame, ::Union{Nothing, AbstractString}; nullfile, solver, solver_config, parallel, runs, verbose, init, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149
  trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, !Matched::AbstractString, ::Union{Nothing, AbstractString}; covtype, covrowinds, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:128
  trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149 got unsupported keyword arguments "geneticformat", "vcftype", "nullfile", "covrowinds", "geneticrowinds"
  ...
Stacktrace:
 [1] top-level scope
   @ ~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37
in expression starting at /share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37

Line 37 of /share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/bin/1.null_model.jl is

trajgwas(@formula(Hb ~ 1 + age + BMI),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI),
        :nipt_id,
        pheno_data,
        vcffile,
        geneticformat = "VCF",
        vcftype = :DS,
        pvalfile,
        nullfile = fitted_null,
        covrowinds = covrowmask,
        geneticrowinds = geneticrowmask)

I'm not very familiar with the Julia language. Any wrong argument or did I miss some arguments when I using trajgwas()? I have a VCF file with more sample than my pheno file (I am sure all sample in pheno file is in VCF file). So I try to using covrowinds and geneticrowinds to deal with it. But I never succeed. When I kept the sample in the VCF file and the pheno file the same, everything went well.

kose-y commented 1 year ago

With the current code, you are not reaching where you had gotten the error before. It seems like you have accidentally changed a semicolon to a comma. While it is not always mandatory in Julia, a semicolon is necessary before the keyword arguments in trajgwas(), as it uses variable-length arguments internally. You need to change the two lines with vcffile, to vcffile;.