FredHutch / templaining

A place to coordinate needs/feedback/changes between Sage Bionetworks and the Translational Genomics Group @ FH

synQuery/synGetAnnotations in synapser #4

Open vortexing opened 6 years ago

vortexing commented 6 years ago

Our reason for using R to do all this work is that often our data is structured very nicely for use with dplyr. Thus, while the R client is a wrapped python client, and we understand that, the entire reason we want to use R is b/c our data manipulations are better done in tabular format rather than lists/dictionaries. So anyone using synapser would expect it to behave friendlier with R-native approaches to data frames and such.

When doing something like this:

a <- synQuery(paste0("SELECT * FROM file WHERE parentId=='", subfolders$results[[1]]$folder.id, "'"))

What comes down is a list of two lists. One holds the number of results (which is irrelevant b/c I can get that info from the length of the other list), and the other is a list of lists of lists. It would be far more useful to get back, ideally, a list containing all the associated Synapse stuff I usually disregard, plus a data frame where the columns are the values in the * part of the query and the rows are synIDs. Then I could pull out just the data frame to use.
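
A minimal base-R sketch of the reshaping being asked for. It does not call synapser: `queryResult` is a hypothetical stand-in for the returned list, and the field names `totalNumberOfResults`, `file.id`, and `file.name` are assumptions, as is the helper name `rowsToDataFrame`.

```r
# Hypothetical stand-in for the nested list a query returns;
# the real structure and field names may differ.
queryResult <- list(
  totalNumberOfResults = 2,
  results = list(
    list(file.id = "syn0000001", file.name = "a.txt"),
    list(file.id = "syn0000002", file.name = "b.txt")
  )
)

# Turn a list of per-row lists into a data frame with one row per result
# and one column per field. Missing fields become NA, and multi-valued
# fields are collapsed into a single ";"-separated string.
rowsToDataFrame <- function(rows) {
  fields <- unique(unlist(lapply(rows, names)))
  cols <- lapply(fields, function(f) {
    vapply(rows, function(r) {
      v <- r[[f]]
      if (is.null(v)) NA_character_ else paste(as.character(unlist(v)), collapse = ";")
    }, character(1))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

files <- rowsToDataFrame(queryResult$results)
print(files)
```

Every value comes back as character here, so numeric fields would need converting afterwards.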

synGetAnnotations also works silly like this. This is what I ended up doing to get a table of all the annotations on a list of synIDs in a project (which happened to be all the synIDs in the project).

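library(plyr)      # ldply
library(reshape2)  # melt
library(tidyr)     # spread

# Fetch the annotations for every synID and name each list element by its synID
allAnn <- lapply(frames$synID, function(x) synGetAnnotations(x))
names(allAnn) <- frames$synID
# Melt each annotation list into long format, one row per annotation value
innerMelt <- ldply(allAnn, function(x) melt(x))
innerMelt$L2 <- NULL
innerMelt <- dplyr::rename(innerMelt, synID = .id, Value = value, Annotation = L1)
# Spread to one row per synID and one column per annotation, then rejoin with frames
hereWeGo <- spread(innerMelt, Annotation, Value)
hereWeGo <- merge(hereWeGo, frames)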

That is dumb. Help me do it better. The behavior of this is ideal:

commonsSyn <- synTableQuery("Select * from syn11948490", resultsAs = "csv",
                            includeRowIdAndRowVersion = FALSE)
commonsSyn <- commonsSyn$asDataFrame()

vortexing commented 6 years ago

Here's another example of behavior that seems odd in the context of R. I just want a data frame with id and name as columns, holding the synID and name of every folder with the same parentId. What I get is a list of 2, one of which is a list of a list of two. So.Silly.
[Screenshot, 2018-04-10 at 8:52:14 am]

vortexing commented 6 years ago

This is the worst: when the synID is at a different level in the list than the key-value pairs of the annotations on the entity. In this case I just want a data frame with the columns id, molecular_id, and processedContentType, with 1 row per entity in that folder.

[Screenshot, 2018-04-10 at 9:20:53 am]
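
A rough base-R sketch of that target shape, under stated assumptions. `entities` is a hypothetical stand-in for the nested result, with `id` at the top level and the key-value pairs under an assumed `annotations` element. The helper `getAnn` is likewise illustrative and is not part of synapser.

```r
# Hypothetical stand-in: the id sits at the top level of each entity,
# while the annotation key-value pairs are nested one level down.
entities <- list(
  list(id = "syn0000001",
       annotations = list(molecular_id = "M001", processedContentType = "counts")),
  list(id = "syn0000002",
       annotations = list(molecular_id = "M002"))
)

# Pull one annotation out of one entity; return NA when the key is absent
# and collapse multi-valued annotations into a ";"-separated string.
getAnn <- function(e, key) {
  v <- e$annotations[[key]]
  if (is.null(v)) NA_character_ else paste(as.character(unlist(v)), collapse = ";")
}

# One row per entity, one column per field of interest.
annotationTable <- data.frame(
  id = vapply(entities, function(e) e$id, character(1)),
  molecular_id = vapply(entities, getAnn, character(1), key = "molecular_id"),
  processedContentType = vapply(entities, getAnn, character(1),
                                key = "processedContentType"),
  stringsAsFactors = FALSE
)
print(annotationTable)
```

Building each column with `vapply` keeps the id and the annotations aligned by position, so the differing nesting levels no longer matter once the table exists.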

vortexing commented 6 years ago

This is the required fix for the above comment to get it in the right shape.
[Screenshot, 2018-04-10 at 9:30:33 am]

meredithslota commented 6 years ago

https://sagebionetworks.jira.com/browse/SYNR-1177 - can you see this ticket? If so, does the fix here address what you are talking about?

vortexing commented 6 years ago

To be honest, I actually don't know if it does or not. Perhaps it is just Friday afternoon. I will re-visit after the synapser release. ;)

teslajoy commented 6 years ago

Thanks. Quick note: I forgot to mention that synQuery() has been deprecated since Synapse Python client version 1.7. We are now working on a warning message that redirects users to the new function.