Hi @EliHei,
One of the primary advantages of storing the data in GCTX format (which is based on HDF5) is that it lets you slice out specific rows and/or columns of the matrix, so you don't need to hold the entire matrix in RAM. Could you describe in more detail what you'd like to do with the data? Please see ?parse.gctx and the cmapR tutorial for details on how to specify the rows/columns of interest.
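For instance, a minimal sketch (the file path and index ranges below are just placeholders):

```r
library(cmapR)

ds_path <- "GSE92742_Broad_LINCS_Level3.gctx"   # placeholder path to a local GCTX file

# pull only the first 100 rows and 50 columns instead of loading the whole matrix
small_ds <- parse.gctx(ds_path, rid = 1:100, cid = 1:50)
dim(small_ds@mat)

# rid/cid also accept character vectors of row/column ids if you already know
# which genes/samples you want, e.g.:
# small_ds <- parse.gctx(ds_path, rid = my_gene_ids, cid = my_sample_ids)
```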
Thanks a lot, Ted
@tnat1031
Thanks for your response. In order to train a neural network, I need the data as a data.frame.
@EliHei
Ok, in that case you may need to do some manual wrangling of the data. You could parse a subset of the file as a GCT object and then convert the matrix to a data.frame (using as.data.frame or something similar). I'd recommend prototyping that on a subset of the data first to get a sense of whether the inputs are correct and the NN results make sense. Once you've got that working, you could try scaling up with something like sparkR, though I must admit I haven't tried it myself and am not sure whether it supports NNs directly.
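For example, something along these lines (a rough sketch; the path is a placeholder and my_sample_ids is a character vector of column ids you'd define yourself):

```r
library(cmapR)

# parse only a slice of the file as a GCT object
ds <- parse.gctx("GSE92742_Broad_LINCS_Level3.gctx",
                 cid = my_sample_ids)

# transpose so samples are rows and genes are columns, then coerce to a data.frame
expr_df <- as.data.frame(t(ds@mat))

# sample-level annotations (cell line, perturbagen, dose, ...) live in ds@cdesc
# (if the file carries embedded column metadata) and can be joined back on
# when you need labels for supervised training
head(ds@cdesc)
```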
Could you tell us more about what you'd like the NN to do?
Thanks, Ted
Thanks @tnat1031!
I want to use a NN to check something about PCLs (as a supervised task). I have tried sparkR, manipulating/merging subsets of the Level 2 data, but I still run into problems because of my limited RAM. Do you have any idea how to handle Level 2 data for normalizing and omitting NAs? (I mean when the data is in GCT format.)
> I want to use a NN to check something about PCLs (as a supervised task).
Good idea.
But why not start with the level 4 or level 5 data as provided, rather than descending to the more primitive forms?
I prefer to use only the landmark genes. Therefore, I need to normalize the expression data of the landmark genes myself. @in4matx
You can still extract just the lm genes from the provided level 4 or 5 data.
Note that normalization doesn't itself depend on whether a gene is measured or inferred.
Anyway, just suggesting a path by which you can get to your core use case (prediction) quicker.
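Something like the following, for example (a rough sketch; the file names are placeholders based on the GEO GSE92742 release, where the companion gene_info table carries a pr_is_lm flag):

```r
library(cmapR)

# gene annotation table distributed alongside the GCTX files on GEO
gene_info <- read.delim("GSE92742_Broad_LINCS_gene_info.txt",
                        colClasses = "character")

# ids of the ~978 landmark (lm) genes
lm_ids <- gene_info$pr_gene_id[gene_info$pr_is_lm == "1"]

# parse only the landmark rows from the level 5 matrix
lm_ds <- parse.gctx("GSE92742_Broad_LINCS_Level5.gctx", rid = lm_ids)
dim(lm_ds@mat)   # roughly 978 x (number of signatures)
```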
Good luck.
aravind
@in4matx Thank you very much!
So, for example, in the level 3 (or 4/5) data, the normalized expression of the lm genes is computed from the lm genes alone rather than from both the lm and inferred genes. Am I right? (If so, I will be very happy!)
> So, for example, in the level 3 (or 4/5) data, the normalized expression of the lm genes is computed from the lm genes alone rather than from both the lm and inferred genes. Am I right? (If so, I will be very happy!)
Yes, to confirm, inference or the lack thereof has no impact on normalization of the landmarks. Extract and use those from level 4/5.
Hi,
I want to read the whole matrix of the level 3 LINCS dataset, but it's too large to fit in my RAM. I was wondering if there is any trick (e.g., Spark) to handle such large data.