BRANCHlab / metasnf

Scalable subtyping with similarity network fusion
https://branchlab.github.io/metasnf/

Bug in generate_data_list between v0.2.0 and v0.2.1 #10

Closed: apdlbalb closed this issue 4 months ago

apdlbalb commented 4 months ago

Hello!

I've been getting this error when I run generate_data_list:

Error in convert_uids(data_list, uid) : 
  The specified original UID (subjectkey) is not present in this data list. Are you sure you spelled it correctly?

After some investigation, it looks like the issue is associated with the v0.2.1 update! In v0.2.0, I get this feedback as expected:

[1] "Existing `subjectkey` column will be treated as UID."

For reference, the lines prior to the function call are:

# Prior steps involve cutting down the raw dataset to fewer variables of interest, then imputing missing data
# Subset cases only, no controls
clinical_associations.pe <- clinical_associations.imputed[scope_raw.na$f34_pet == 1,] 

# Generate patient IDs
subjectkey <- paste0("PE",1:nrow(clinical_associations.pe))
clinical_associations.pe <- cbind(subjectkey, clinical_associations.pe)

# metaSNF data list
data_list.pe <- generate_data_list(
  list(clinical_associations.pe[,c("subjectkey", names(fetal_health))], "fetal_health", "fetus", type = "mixed"),
  list(clinical_associations.pe[,c("subjectkey", names(cvd_risk))], "cvd_risk", "mother", type = "mixed"),
  list(clinical_associations.pe[,c("subjectkey", names(placenta))], "placenta", "fetus", type = "mixed"),
  list(clinical_associations.pe[,c("subjectkey", names(delivery))], "delivery", "pregnancy", type = "discrete"),
  list(clinical_associations.pe[,c("subjectkey", names(maternal_health))], "maternal_health", "mother", type = "mixed"),
  list(clinical_associations.pe[,c("subjectkey", names(immune_activation))], "immune_activation", "mother", type = "mixed"),
  uid = "subjectkey"
)

pvelayudhan commented 4 months ago

Hello!

In v0.2.0, the parameter was called old_uid. In v0.2.1, it was renamed to uid. Using the wrong name in either version won't raise an unused-argument error outright, because the function accepts pretty much anything through its ... (dots) argument. Do you think that could be causing this issue?

Also, would you be able to check if this issue persists in the latest commit?

If things still aren't working out, would you mind sending me a minimal toy version of your dataset that can lead to this error?
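The point about ... can be seen in a minimal base R sketch. The function make_list and its arguments are hypothetical stand-ins, not metasnf code; the sketch only shows that an outdated argument name is absorbed by ... instead of being rejected.

```r
# Hypothetical stand-in for a function that takes components through `...`
# plus one named option. This is NOT the metasnf implementation.
make_list <- function(..., uid = NULL) {
  extras <- list(...)
  list(n_extras = length(extras), extra_names = names(extras), uid = uid)
}

# Current argument name: `uid` is matched and nothing lands in `...`.
ok <- make_list(uid = "subjectkey")

# Outdated argument name: no "unused argument" error is raised. The value is
# captured by `...` and `uid` silently keeps its default (NULL).
stale <- make_list(old_uid = "subjectkey")
```

Here stale$uid is NULL and stale$extra_names is "old_uid", so any mistake would only surface later, if at all.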

apdlbalb commented 4 months ago

Found it!! With the latest version, the function runs smoothly if I don't explicitly write type =. The error can be reproduced like this:

> test <- data.frame(subjectkey, a = 1:278)
> dl <- generate_data_list(list(test, "a","a",type = "discrete"), uid = "subjectkey")
Error in convert_uids(data_list, uid) : 
  The specified original UID (subjectkey) is not present in this data list. Are you sure you spelled it correctly?

> dl <- generate_data_list(list(test, "a","a","discrete"), uid = "subjectkey")
[1] "Existing `subjectkey` column will be treated as UID."

Thank you for always getting back so quickly!
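For context, the two calls above differ only in how base R names the elements of the component list. The sketch below uses a three-row toy data frame (an assumption made for brevity) and shows just the names() result; how metasnf uses those names internally is not shown in this thread.

```r
# Toy data: three subjects instead of the 278 in the report above.
subjectkey <- paste0("PE", 1:3)
test <- data.frame(subjectkey, a = 1:3)

# Only the last element is named, as in the failing call.
partially_named <- list(test, "a", "a", type = "discrete")

# Nothing is named, as in the working call.
unnamed <- list(test, "a", "a", "discrete")

names(partially_named)  # "" "" "" "type"
names(unnamed)          # NULL
```

A partially named list therefore has a names vector with empty strings for the unnamed elements, while a fully positional list has no names at all.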

pvelayudhan commented 4 months ago

Ooooo.... that's an ugly bug!

Working on it now, thanks for spotting this!

pvelayudhan commented 4 months ago

Problem is fixed in version 0.4.2 (latest).

generate_data_list was intended to raise an error when a component names only some of its elements (partial naming), but that check wasn't triggering properly. The faulty check is on line 127 here: https://github.com/BRANCHlab/metasnf/commit/0ba3675cf850996a0923b76a8af0db83f914d26c#diff-6b672bdc19d81ec20cb23ac6808e4aa0234a5de6a7250f2cb80aabc1df1e52dbL127
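As a rough illustration of what a check for partial component naming could look like, here is a base R sketch. The helper check_component_names and its error message are hypothetical and are not taken from the metasnf source or from the linked commit.

```r
# Hypothetical helper, NOT the metasnf source: stop if a component names
# some of its elements but not all of them.
check_component_names <- function(component) {
  nms <- names(component)
  if (is.null(nms)) {
    return(invisible(TRUE))  # fully positional component
  }
  named <- !is.na(nms) & nms != ""
  if (any(named) && !all(named)) {
    stop("Component mixes named and unnamed elements; name all or none.")
  }
  invisible(TRUE)
}

# Fully positional component: passes.
check_component_names(list(1, "a", "a", "discrete"))

# Partially named component: caught here rather than failing later.
result <- tryCatch(
  check_component_names(list(1, "a", "a", type = "discrete")),
  error = function(e) conditionMessage(e)
)
```

Failing early with a message about naming would point to the actual problem instead of the misleading UID error from the original report.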