casbap / ncRNA

melt problem #1

Closed by markziemann 2 years ago

markziemann commented 2 years ago

https://github.com/casbap/ncRNA/blob/c5533e06dd70bac608dba7eb0f75598098e0e0e4/HumanDataPrep.Rmd#L172

Getting an error with this line:

> correlate_melt <- reshape2::melt(correlate, na.rm = TRUE)
Error in if (n > 0) c(NA_integer_, -n) else integer() : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
  NAs introduced by coercion to integer range
> correlate[1:10,1:6]
                ENSG00000223972 ENSG00000227232 ENSG00000278267 ENSG00000243485
ENSG00000223972              NA              NA              NA              NA
ENSG00000227232       0.8777559              NA              NA              NA
ENSG00000278267       0.9981221       0.8813181              NA              NA
ENSG00000243485       0.9999483       0.8784370       0.9981788              NA
ENSG00000237613       0.9998932       0.8785271       0.9981822       0.9999349
ENSG00000240361       0.9999499       0.8769497       0.9981705       0.9999616
ENSG00000186092       0.9999499       0.8772435       0.9981738       0.9999616
ENSG00000238009       0.9951558       0.8729852       0.9928406       0.9951992
ENSG00000233750       0.9991170       0.8785839       0.9972674       0.9992271
ENSG00000268903       0.8940378       0.8322956       0.8932349       0.8942898
                ENSG00000237613 ENSG00000240361
ENSG00000223972              NA              NA
ENSG00000227232              NA              NA
ENSG00000278267              NA              NA
ENSG00000243485              NA              NA
ENSG00000237613              NA              NA
ENSG00000240361       0.9999249              NA
ENSG00000186092       0.9999249       0.9999967
ENSG00000238009       0.9951425       0.9951926
ENSG00000233750       0.9991871       0.9991721
ENSG00000268903       0.8942030       0.8936922

markziemann commented 2 years ago

possible solution from https://stackoverflow.com/questions/49034322/melt-correlation-matrix-in-r

correlate_melt <- melt(replace(correlate, lower.tri(correlate, TRUE), NA), na.rm = TRUE)

markziemann commented 2 years ago

It works when correlate is subset to 10k x 10k, and even 40k x 40k, so I suspect the code is okay but some size limit is being hit:

> correlate2 <- correlate[1:10000,1:10000]
> correlate_melt <- reshape2::melt(correlate2, na.rm = TRUE)
> correlate2 <- correlate[1:40000,1:40000]
> correlate_melt <- reshape2::melt(correlate2, na.rm = TRUE)
> correlate2 <- correlate[1:50000,1:50000]
> correlate_melt <- reshape2::melt(correlate2, na.rm = TRUE)
Error in if (n > 0) c(NA_integer_, -n) else integer() : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
  NAs introduced by coercion to integer range
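
The 40,000 vs 50,000 result is consistent with R's 32-bit integer limit, which a quick check can illustrate. This is only a minimal sketch in base R; `fits_in_integer` is a hypothetical helper name used for illustration, not something provided by reshape2.

```r
# R integers are 32-bit, so the largest representable value is:
.Machine$integer.max                      # 2147483647

# Hypothetical helper: can the cell count of an n x n matrix
# be stored as an R integer?
fits_in_integer <- function(n) {
  # as.numeric() keeps the product in double precision, avoiding the
  # overflow that integer arithmetic would hit
  as.numeric(n) * as.numeric(n) <= .Machine$integer.max
}

fits_in_integer(40000)   # TRUE:  1.6e9 cells
fits_in_integer(50000)   # FALSE: 2.5e9 cells

# Largest square matrix that stays under the limit
floor(sqrt(.Machine$integer.max))         # 46340
```

So any square subset larger than roughly 46k x 46k would be expected to trigger the same overflow warnings.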
markziemann commented 2 years ago

Low budget solution:

> correlate1 <- correlate[1:30000,]
> correlate2 <- correlate[30001:nrow(correlate),]
> correlate_melt1 <- reshape2::melt(correlate1, na.rm = TRUE)
> correlate_melt2 <- reshape2::melt(correlate2, na.rm = TRUE)
> correlate_melt <- rbind(correlate_melt1, correlate_melt2)
> rm(correlate_melt1, correlate_melt2)
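
The two-way split above could be generalised so the number of row blocks is worked out from the matrix size. The following is a rough sketch, assuming reshape2 is installed and that the matrix has row and column names (as `correlate` does); `melt_in_chunks` and its `max_cells` argument are hypothetical names, not part of reshape2.

```r
library(reshape2)

# Hypothetical helper: melt a matrix one block of rows at a time so that
# each block has at most `max_cells` cells, then bind the pieces together.
melt_in_chunks <- function(m, max_cells = .Machine$integer.max) {
  # Without dimnames, melt() would label rows relative to each block,
  # so the row labels would be wrong after binding.
  stopifnot(!is.null(rownames(m)), !is.null(colnames(m)))

  rows_per_chunk <- max(1, floor(max_cells / ncol(m)))
  starts <- seq(1, nrow(m), by = rows_per_chunk)

  pieces <- lapply(starts, function(s) {
    e <- min(s + rows_per_chunk - 1, nrow(m))
    # drop = FALSE keeps a single-row block as a matrix
    reshape2::melt(m[s:e, , drop = FALSE], na.rm = TRUE)
  })

  do.call(rbind, pieces)
}

# Usage on the matrix from this issue (not run here):
# correlate_melt <- melt_in_chunks(correlate)
```

A smaller `max_cells` gives more, smaller blocks, which should also lower peak memory use while each block is being melted.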
markziemann commented 2 years ago

This actually fixed it. I suspect that melting huge matrices like this requires integers greater than the max allowable.

casbap commented 2 years ago

> This actually fixed it. I suspect that melting huge matrices like this requires integers greater than the max allowable.

I know the solution worked, but is there any way we can split this into multiple matrices instead?
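
One way this might look, as a sketch only: split the rows into blocks and keep each block as its own matrix in a list, so every piece stays under the integer limit and can be processed separately. `split_matrix_rows` and `max_cells` are hypothetical names for illustration, and `toy` is a small stand-in for `correlate`.

```r
# Hypothetical helper: split a matrix into a list of row blocks, each with
# at most `max_cells` cells.
split_matrix_rows <- function(m, max_cells = .Machine$integer.max) {
  rows_per_block <- max(1, floor(max_cells / ncol(m)))
  # Block number for every row: 1, 1, ..., 2, 2, ...
  block_id <- ceiling(seq_len(nrow(m)) / rows_per_block)
  lapply(split(seq_len(nrow(m)), block_id),
         function(rows) m[rows, , drop = FALSE])
}

# Toy stand-in for `correlate`: 5 rows x 4 columns
toy <- matrix(seq_len(20), nrow = 5,
              dimnames = list(paste0("g", 1:5), paste0("g", 1:4)))

blocks <- split_matrix_rows(toy, max_cells = 8)  # 2 rows per block
length(blocks)          # 3
sapply(blocks, nrow)    # 2 2 1
```

Note that this copies the data into the list, so the original matrix would probably need to be removed afterwards to keep memory use down.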