chr1swallace / coloc

Repo for the R package coloc

Error when running `check_dataset()` #145

Closed martaalcalde closed 8 months ago

martaalcalde commented 8 months ago

When I run `check_dataset()` with the argument `req = "LD"`, I keep getting the error `colnames in LD do not contain all SNP`. I checked the function, and the error is raised because `length(setdiff(d$snp,colnames(d$LD)))` is non-zero. That happens because my `d$snp` values have the form `rs1`, whereas both the row names and the column names of my computed LD matrix have the form `rs1_EffectAlleleValue_OtherAlleleValue`.
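The mismatch can be reproduced in base R with a toy example. The SNP IDs and the identity matrix standing in for the LD are made up for illustration; only the `setdiff` expression comes from the check quoted above.

```r
# Toy dataset: SNP IDs without alleles (hypothetical values)
d <- list(snp = c("rs1", "rs2", "rs3"))

# Toy LD matrix whose dimnames carry allele suffixes (hypothetical values)
ld_names <- c("rs1_A_G", "rs2_C_T", "rs3_G_A")
d$LD <- diag(3)
dimnames(d$LD) <- list(ld_names, ld_names)

# SNPs in d$snp that have no matching column name in the LD matrix
missing_snps <- setdiff(d$snp, colnames(d$LD))
length(missing_snps)  # non-zero, so a check of this form fails
```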

I think it would be nice to generalise this so that LD matrices whose row and column names include the alleles are also accepted. I suggest just changing:

length(setdiff(d$snp,colnames(d$LD))) ---> length(d$snp) == length(colnames(d$LD))

And maybe, afterwards adding:

if(length(setdiff(d$snp,colnames(d$LD)))){
  # strip the allele suffix, e.g. rs1_A_G -> rs1
  colnames(d$LD) <- gsub("_.*","",colnames(d$LD))
  row.names(d$LD) <- gsub("_.*","",row.names(d$LD))

  if(length(setdiff(d$snp,colnames(d$LD)))){
    stop("colnames in LD do not contain all SNP")
  }
}

I hope this is useful!

chr1swallace commented 8 months ago

Hi Marta,

Thank you for the suggestion, but I think it is better that you make the row and column names of your LD matrix match the elements of your SNP vector. Coloc uses these names to ensure the LD is correctly ordered against the beta etc.

The gsub you suggest would do it, but I can't incorporate that into the function, because other people may have different forms of SNP names where they do not wish the part after the _ to be deleted.
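A minimal sketch of doing the renaming on the user side, before the dataset is passed to `check_dataset()`, might look like the following. The helper name `strip_allele_suffix` and the toy data are hypothetical, and the pattern assumes names of the form `rs1_A_G` with no underscore inside the SNP ID itself; it would be wrong for other naming schemes.

```r
# Hypothetical helper: drop everything from the first underscore onward,
# e.g. "rs1_A_G" -> "rs1". Assumes SNP IDs themselves contain no "_"
# and that the rows and columns of the LD matrix are in the same order.
strip_allele_suffix <- function(ld) {
  new_names <- sub("_.*$", "", colnames(ld))
  # Refuse to continue if stripping makes two variants share a name
  if (anyDuplicated(new_names)) {
    stop("stripped LD names are not unique")
  }
  dimnames(ld) <- list(new_names, new_names)
  ld
}

# Toy data (made up for illustration)
d <- list(snp = c("rs1", "rs2", "rs3"))
ld_names <- c("rs1_A_G", "rs2_C_T", "rs3_G_A")
d$LD <- diag(3)
dimnames(d$LD) <- list(ld_names, ld_names)

d$LD <- strip_allele_suffix(d$LD)

# The expression used by the check now returns 0
length(setdiff(d$snp, colnames(d$LD)))
```

The uniqueness check matters because multi-allelic sites can share an rsID once the alleles are removed, in which case the stripped names no longer identify a single row of the LD matrix.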

Chris

-- https://chr1swallace.github.io



martaalcalde commented 8 months ago

Hi Chris,

Thank you for your fast response. Makes sense! I'll close the issue.

All the best,

Marta

chr1swallace commented 8 months ago

Thanks for understanding
