QData / DeepChrome

Bioinformatics16: DeepChrome: Deep-learning for predicting gene expression from histone modifications
http://deepchrome.net
Apache License 2.0

How to get the predictions for each gene? #10

Open dgarrimar opened 2 years ago

dgarrimar commented 2 years ago

Hi,

I ran the pipeline on my data smoothly and got the ROC AUC on the train and test sets. However, I am not very familiar with torch/lua. How could I obtain the final predictions for each gene in the test set (either the 0/1 label or, better, the probability in [0,1])? I guess this just means adding/modifying a couple of lines of code.

thanks!

PS. It'd be great too if I could obtain the accuracy/confusion matrix for the test set (not only the ROC AUC).

jacklanchantin commented 2 years ago

The unnormalized outputs will be in the output variable here

You can append a nn.SoftMax module to the model in order to get normalized probabilities.

btw - have you tried the AttentiveChrome pytorch code in the repository? It's likely much easier to follow.

dgarrimar commented 2 years ago

I am still not sure how to do this in lua. You mean something like local ex = nn.SoftMax(output) and then print ex to a file? I am not familiar with lua objects. Which kind of object is output? It seems it is not just a number. The same goes for ex. Could you please give me some more hints on how to store the actual numbers in a file? Thanks a lot! I also had a look at the pytorch code, but it seems to be much slower on the same dataset (I am using CPUs for now).

jacklanchantin commented 2 years ago

you can do normalized_output = nn.SoftMax()(output)

output is a torch tensor

normalized_output[:,0] = p(x=true)

normalized_output[:,1] = p(x=false)

you can write each of these to a csv file using standard lua write to file methods.
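A minimal sketch of the two steps above (softmax, then writing to a CSV file), assuming each gene's output is a 1D tensor with two entries. The example tensors, the file name predictions.csv and the column names are made up for illustration; in the real pipeline the tensors would come from the model. Which index corresponds to which label depends on how the labels were encoded, so check that against your data.

```lua
require 'torch'
require 'nn'

-- Hypothetical stand-ins for the per-gene unnormalized outputs;
-- in the real pipeline these would be the model's `output` tensors.
local outputs = { torch.Tensor({2.0, 0.5}), torch.Tensor({-1.0, 1.5}) }

local softmax = nn.SoftMax()

-- Open the file once, write a header, then one row per gene.
local f = assert(io.open('predictions.csv', 'w'))
f:write('gene_index,p_index1,p_index2\n')
for i, output in ipairs(outputs) do
  -- forward() returns a tensor of the same shape whose entries sum to 1.
  local probs = softmax:forward(output)
  -- Torch tensors are indexed from 1, so the entries are probs[1] and probs[2].
  f:write(string.format('%d,%.6f,%.6f\n', i, probs[1], probs[2]))
end
f:close()
```

Indexing a 1D tensor with a single number returns a plain Lua number, which is why string.format can print it directly.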

dgarrimar commented 2 years ago

Uhm, for some reason it complains: unexpected symbol near ':'. (I just copy/pasted)

jacklanchantin commented 2 years ago

I don't remember what dimension normalized_output would be. Can you try removing the :, ?

dgarrimar commented 2 years ago

same :( (')' expected near '=')

jacklanchantin commented 2 years ago

oh you shouldn't use = p(x=true), i was explaining what those will give you - i.e. the probability that the input has expression=true

dgarrimar commented 2 years ago

I see haha, but still normalized_output:nDimension() is 1

dgarrimar commented 2 years ago

OK I think I got it :), normalized_output[1] should give the probability
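For the accuracy and confusion matrix mentioned in the first post, here is a rough sketch in plain Lua, assuming the per-gene probabilities of the positive class and the true 0/1 labels have already been collected into two tables. The example values and the 0.5 threshold are assumptions for illustration, not something taken from the DeepChrome code.

```lua
-- Hypothetical inputs: predicted probability of the positive class and the
-- true 0/1 label for each test gene.
local probs  = {0.91, 0.20, 0.65, 0.40, 0.08}
local labels = {1,    0,    0,    1,    0}

local threshold = 0.5  -- assumed cut-off for calling a gene "expressed"
local tp, fp, tn, fn = 0, 0, 0, 0

for i = 1, #probs do
  local predicted = (probs[i] >= threshold) and 1 or 0
  if predicted == 1 and labels[i] == 1 then
    tp = tp + 1   -- true positive
  elseif predicted == 1 and labels[i] == 0 then
    fp = fp + 1   -- false positive
  elseif predicted == 0 and labels[i] == 0 then
    tn = tn + 1   -- true negative
  else
    fn = fn + 1   -- false negative
  end
end

local accuracy = (tp + tn) / #probs
print(string.format('TP=%d FP=%d TN=%d FN=%d accuracy=%.3f',
                    tp, fp, tn, fn, accuracy))
```

The four counters are the cells of the 2x2 confusion matrix, and accuracy is the share of genes on its diagonal.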