Closed GiadaLalli closed 1 year ago
ok, so: there's a ton of sh*t that fell from above during this weekend, and here is a clearer explanation:
we always went around saying that the library is PyTorch-based but never actually used a PyTorch implementation, so all the metrics need to be "translated" into torch-runnable metrics; this was already possible for 3 of them: dot product (already implemented with torch), plus Pearson and Spearman:
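    import torch as t

    def __pearson_metric_t(first, second):
        # 1-D inputs: stack into a 2 x n matrix and read the off-diagonal entry
        if (first.dim(), second.dim()) == (1, 1):
            return t.corrcoef(t.stack((first, second)))[0, 1]
        # 2-D inputs: variables are columns, so transpose for corrcoef
        combined = t.cat([first, second], dim=1)
        return t.corrcoef(combined.T)[: first.shape[1] - 1, first.shape[1] :]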
    def __spearman_metric_t(first, second):
        # Ranks are the argsort of the argsort (ties are not averaged)
        if (first.dim(), second.dim()) == (1, 1):
            X = t.argsort(t.argsort(first)).float()
            Y = t.argsort(t.argsort(second)).float()
            return t.corrcoef(t.stack((X, Y)))[0, 1]
        X = t.argsort(t.argsort(first, dim=0), dim=0).float()
        Y = t.argsort(t.argsort(second, dim=0), dim=0).float()
        combined = t.cat([X, Y], dim=1)
        return t.corrcoef(combined.T)[: first.shape[1] - 1, first.shape[1] :]
we also still have the problem of not having LD for SNP data (meaning the unmapped, discrete data); both LD and mutual_info should be usable only for that data type. Moreover, I cannot find a way to "translate" mutual_info from sklearn to torch: it's basically too much work and it's not worth it, so I wanted to try and use these 2 metrics as "external metrics", but it doesn't work, so we'll have to figure out why
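If a torch version is ever revisited, one possible route for the discrete case is to compute mutual information directly from the joint frequency table of two variables. This is only a minimal sketch, assuming the SNP values are coded as small non-negative integers (e.g. 0/1/2); `discrete_mutual_info` is a hypothetical helper, not something that exists in the library, and it has not been checked against the sklearn function used here:

```python
import torch


def discrete_mutual_info(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Mutual information (in nats) between two 1-D integer-coded vectors."""
    x = x.long()
    y = y.long()
    n_x = int(x.max()) + 1
    n_y = int(y.max()) + 1
    # Joint counts: encode each (x, y) pair as a single index, then bincount.
    joint = torch.bincount(x * n_y + y, minlength=n_x * n_y)
    p_xy = joint.reshape(n_x, n_y).double() / x.numel()
    # Marginals, kept 2-D so that their product broadcasts to the joint shape.
    p_x = p_xy.sum(dim=1, keepdim=True)
    p_y = p_xy.sum(dim=0, keepdim=True)
    # Only non-zero cells contribute; this also avoids log(0).
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x * p_y)[mask]
    return (p_xy[mask] * torch.log(ratio)).sum()
```

This handles one pair of variables at a time and ignores continuous data entirely, so it would only cover the SNP use case.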
Daniele asked me to benchmark all the metrics (this shitty paper will be the end of me, I'll never be able to run everything in a reasonable time), which means running the reference versions (Pearson/Spearman from SciPy, mutual_info from sklearn, dot product from NumPy, LD from scikit-allel) against a) all the metrics I have in torch (so Pearson, Spearman and dot product) on CPU and b) the same torch metrics on GPU. This is a problem already; moreover, he wants this done in a very specific way, which will take me forever
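A small timing helper could keep the three comparisons (reference libraries, torch on CPU, torch on GPU) consistent. The sketch below uses only the standard library; `benchmark` and its `repeats`, `warmup` and `sync` parameters are hypothetical names, not part of the library, and the protocol Daniele wants may well differ:

```python
import statistics
import time
from typing import Any, Callable, Optional


def benchmark(
    fn: Callable[..., Any],
    *args: Any,
    repeats: int = 5,
    warmup: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time in seconds of fn(*args)."""
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain pending asynchronous work before starting the clock
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()  # wait for asynchronous work (e.g. CUDA kernels) to finish
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For the GPU runs, passing `sync=torch.cuda.synchronize` matters because CUDA kernels are launched asynchronously, so without it the timer can stop before the computation has finished.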
CUDA activation should be added as a parameter of the ISN computation functions. I have no idea how to say this better: basically, to activate the GPU for the metrics I need to add these lines every damn time, as in this example:
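    import torch

    def dot_metric(first, second):
        # Pick the GPU when available, otherwise fall back to the CPU
        if torch.cuda.is_available():
            dev = "cuda:0"
        else:
            dev = "cpu"
        device = torch.device(dev)
        first = first.to(device)
        second = second.to(device)
        # Reverse the dimension order of `first` before multiplying
        return torch.matmul(first.permute(*range(first.ndim - 1, -1, -1)), second)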
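One way to avoid repeating the device boilerplate in every metric is to resolve the device once and pass it in as an argument. This is a sketch of the idea only: `resolve_device` and the `device` keyword are hypothetical and are not the signature of the actual ISN computation functions:

```python
from typing import Optional

import torch


def resolve_device(device: Optional[str] = None) -> torch.device:
    """Use the requested device, else CUDA when available, else CPU."""
    if device is None:
        device = "cuda:0" if torch.cuda.is_available() else "cpu"
    return torch.device(device)


def dot_metric(
    first: torch.Tensor,
    second: torch.Tensor,
    device: Optional[str] = None,
) -> torch.Tensor:
    """Dot-product metric computed on the chosen device."""
    dev = resolve_device(device)
    first = first.to(dev)
    second = second.to(dev)
    # Reverse the dimension order of `first` (a plain transpose for 2-D input).
    return torch.matmul(first.permute(*range(first.ndim - 1, -1, -1)), second)
```

With this shape, the benchmark can force `device="cpu"` or `device="cuda:0"` explicitly, while normal use leaves `device=None` and gets the GPU whenever one is present.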
another problem worth mentioning: tractor doesn't work anymore, or better said: what took 5 seconds to compute last week now takes 145 seconds, which means that the benchmark analysis is completely f*cked up and I can no longer say that "tractor is the fastest bs for computing ISNs", as it would not be true
Closed by #30