Closed lizhongliu1996 closed 5 years ago
To each data frame element of PCs_ls (within SuperPCA_pVals() and AESPCA_pVals()), add the original row names of assayData_df (if they exist).
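A minimal sketch of what this request amounts to, using toy data. The names PCs_ls and assayData_df come from the issue; the values, the sample labels, and the helper variable sampleIDs_char as used here are made up for illustration, and this is not the package's internal code.

```r
# Toy assay data: 4 samples (rows) by 3 genes (columns); values are made up
assayData_df <- data.frame(
  geneA = c(0.1, 0.4, 0.3, 0.9),
  geneB = c(1.2, 0.8, 1.1, 0.7),
  geneC = c(2.0, 2.5, 1.9, 2.2),
  row.names = c("s1", "s2", "s3", "s4")
)

# Stand-in for the per-pathway PC data frames (one row per sample)
PCs_ls <- list(
  path1 = data.frame(V1 = c(-1.0, 0.5, 0.2, 0.3)),
  path2 = data.frame(V1 = c(0.7, -0.2, -0.4, -0.1))
)

# A data frame always has row names (automatic ones are "1", "2", ...),
# so this copies whatever labels the assay carries onto every element
sampleIDs_char <- rownames(assayData_df)
PCs_ls <- lapply(PCs_ls, function(PCs_df) {
  rownames(PCs_df) <- sampleIDs_char
  PCs_df
})
```

This only works if the rows of each PC data frame are still in the same order as the rows of the assay data.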
Should we add a function that takes in sample-ID-labelled assay and phenotype data? We could then match/join the data ourselves within this function. This would also allow us to avoid rownames altogether.
This function is internal. Also, the getPathPCLs() function returns gene rownames as needed.
I'm re-opening this issue: the gene names are added, but the sample names are not, so we still need to add sample names.
Also, would it be possible for getPathPCLs to return more than 1 PC?
So please add the following sentence to the help file of getPathPCLs
Question:
Sample names are especially important for multi-omics analysis. See below: the samples are ordered differently in the input datasets, and without row names for the samples, multi-omics analysis can be error-prone.
Please also modify the sample columns so that they match in both datasets.
For function SubsetPathwayData, could we also add rownames or an extra column for sample IDs?
CNVgene_df <- SubsetPathwayData(ovCNV_Surv, "path22")
This is what I get currently:
Without sample IDs, it's difficult to merge data for multi-omics analysis.
We will take in a data frame with sample IDs as the first column for the assay data, and a data frame with sample IDs as the first column of the response data. Then, we will internally inner join the two data frames, and preserve the sample IDs.
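The join described above could look roughly like the following base-R sketch. The column name Sample, the toy values, and the object names assay_df, response_df, and joined_df are assumptions for illustration; this is not the code that ended up in the package.

```r
# Hypothetical assay data: sample IDs in the first column
assay_df <- data.frame(
  Sample = c("s3", "s1", "s2", "s5"),
  geneA  = c(0.3, 0.1, 0.4, 0.8),
  geneB  = c(1.1, 1.2, 0.8, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical response data: different order, and "s4" has no assay record
response_df <- data.frame(
  Sample = c("s1", "s2", "s3", "s4"),
  status = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# merge() with the default all = FALSE keeps only samples present in both,
# i.e. it behaves as an inner join on the Sample column
joined_df <- merge(response_df, assay_df, by = "Sample")

# Split back into aligned pieces, preserving the sample IDs
sampleIDs_char     <- joined_df$Sample
responseMatched_df <- joined_df[, names(response_df)]
assayMatched_df    <- joined_df[, names(assay_df)]
```

After the join, rows of the response and assay pieces refer to the same samples in the same order, and samples found in only one input ("s4" and "s5" here) are dropped.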
Gabriel is unhappy with this.
Find in a directory: CTRL / CMD + Shift + F
This is requiring even more effort than I had originally anticipated. I have to re-write all examples and testing scripts to account for the fact that we now require sample IDs for both assay and phenotype. This will take at least a full day to finish, and that doesn't even include the time to rebuild the OmicsPath class to now take in the sample IDs or the time to propagate the sample IDs through the code.
I need to update the documentation across the board to mention that we require the response object to be a data frame.
@gabrielodom OK, I see. I also thought more about your concern about having CreateOmics merge datasets for users.
So how about we keep what we have now, except requiring the first column of the response slot in CreateOmics to be a variable called Sample? This way, CreateOmics can extract the sample IDs and pass them on to getPathPCLs.
sampleIDs_char has been added as a slot to all Omics* objects. We additionally need get*() and set*() methods before we can move forward with returning these sample IDs.
I wrote the CheckSampleIDs() function to help in object creation. Also, I had to make more edits to the CreateOmics() function to support these Omics*-slot changes.
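For readers following along, here is a rough idea of what a sample-ID check at object creation might involve. The function check_sample_ids_sketch() below is hypothetical and written only for this illustration; it is not the package's CheckSampleIDs(), whose actual behavior is not shown in this thread.

```r
# Hypothetical helper: pull sample IDs from the first column of a data frame
# and refuse IDs that are missing or duplicated
check_sample_ids_sketch <- function(df) {
  ids <- as.character(df[[1]])
  if (anyNA(ids)) {
    stop("Sample IDs must not be missing.")
  }
  if (anyDuplicated(ids) > 0) {
    stop("Sample IDs must be unique.")
  }
  ids
}

# Toy response data with sample IDs in the first column
good_df <- data.frame(Sample = c("s1", "s2", "s3"), status = c(1, 0, 1))
goodIDs_char <- check_sample_ids_sketch(good_df)

# A duplicated ID triggers an error, captured here as a message string
bad_df <- data.frame(Sample = c("s1", "s1", "s3"), status = c(1, 0, 1))
badResult_char <- tryCatch(
  check_sample_ids_sketch(bad_df),
  error = function(e) conditionMessage(e)
)
```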
Updates:
- SubsetPathwayData() function now returns a data frame with leading sample IDs column
- getPathPCLs() function now returns the PCs as a tidy data frame with leading sample IDs column and the loadings as a tidy data frame with leading feature IDs column

@lizhongliu1996, I think this is finished. Please re-test your code. Note that you will have to supply the sample IDs as the first column of the assay and response data frames, as appropriate.
I tested it with the Proteo and CopyNumber datasets; both the SubsetPathwayData() and getPathPCLs() functions work. But do notice that when using CreateOmics(), the column exclusion in the argument assayData_df = dataset[, -(2:x)] should start from the 2nd column, so that the sample ID column is kept.
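To make the indexing concrete, here is a small sketch with a made-up combined dataset in which column 1 holds the sample IDs, columns 2 through x hold the phenotype, and the remaining columns hold the assay. The column names, the values, and the response_df object are assumptions for illustration.

```r
# Toy combined dataset: sample IDs, then phenotype columns, then genes
dataset <- data.frame(
  Sample = c("s1", "s2", "s3"),
  time   = c(10, 20, 15),
  status = c(1, 0, 1),
  geneA  = c(0.1, 0.4, 0.3),
  geneB  = c(1.2, 0.8, 1.1),
  stringsAsFactors = FALSE
)

# Last phenotype column in this toy layout
x <- 3

# Dropping columns 2..x removes the phenotype but keeps the sample IDs
assayData_df <- dataset[, -(2:x)]

# The response keeps the sample IDs plus the phenotype columns
response_df <- dataset[, 1:x]
```

Starting the exclusion at column 1 instead would discard the sample IDs that the assay data frame is now expected to carry.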
Yes, the sample ID column is now required for both the assay and response. Thanks!!
Sample IDs are lost in the LoadOntoPCs function.
After running the function 'ExtractAESPCs', the PC results lose the colnames and rownames of the assay_data.