Bioconductor / hca

Human Cell Atlas data discovery and retrieval

Access the value in the list #35

Open shbrief opened 1 year ago

shbrief commented 1 year ago

Is there a way to extract/subset specific values from the generic list-of-list (not HCA data) that meet a particular condition?

mtmorgan commented 1 year ago

I'm not sure exactly what you're asking. There are a number of commands outlined on ?lol, so for instance

> p = projects(as = "lol")
> p |> lol_pull("hits[*].projects[*].projectTitle") |> head()
[1] "1.3 Million Brain Cells from E18 Mice"
[2] "A Cellular Anatomy of the Normal Adult Human Prostate and Prostatic Urethra"
[3] "A Cellular Atlas of Pitx2-Dependent Cardiac Development."
[4] "A Human Liver Cell Atlas reveals Heterogeneity and Epithelial Progenitors"
[5] "A Protocol for Revealing Oral Neutrophil Heterogeneity by Single-Cell Immune Profiling in Human Saliva"
[6] "A Single-Cell Atlas of the Human Healthy Airways."

For more complicated actions one strategy is to create a tibble

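library(dplyr)  # provides tibble() and filter()
tbl <- tibble(
    ## list-column: one character vector of species per hit
    genusSpecies = lol_hits_lpull(p, "hits[*].donorOrganisms[*].genusSpecies[*]"),
    ## character column: one project title per hit
    projectTitle = lol_hits_pull(p, "hits[*].projects[*].projectTitle")
)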

and then manipulate, e.g.,

> list_equals = function(x, value) vapply(x, identical, logical(1), value)
> tbl |> filter(list_equals(genusSpecies, "Mus musculus"))
# A tibble: 22 × 2
   genusSpecies projectTitle
   <list>       <chr>
 1 <chr [1]>    1.3 Million Brain Cells from E18 Mice
 2 <chr [1]>    A Cellular Atlas of Pitx2-Dependent Cardiac Development.
 3 <chr [1]>    A revised airway epithelial hierarchy includes CFTR-expressing …
 4 <chr [1]>    A single-cell molecular map of mouse gastrulation and early org…
 5 <chr [1]>    Cross-Species Single-Cell Analysis of Pancreatic Ductal Adenoca…
 6 <chr [1]>    Defining the Activated Fibroblast Population in Lung Fibrosis U…
 7 <chr [1]>    High throughput error corrected Nanopore single cell transcript…
 8 <chr [1]>    Highly Parallel Genome-wide Expression Profiling of Individual …
 9 <chr [1]>    Massively Parallel Single Nucleus Transcriptional Profiling Def…
10 <chr [1]>    Melanoma infiltration of stromal and immune cells
# … with 12 more rows
# ℹ Use `print(n = ...)` to see more rows

(n.b., a different filter, list_contains = function(x, value) vapply(x, `%in%`, logical(1), x = value), would retrieve studies that might have involved mouse and other species).
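
To make the difference between the two filters concrete, here is a small sketch on made-up data; genus_species is a hypothetical stand-in for the genusSpecies list-column, not real HCA output, and only base R is used.

```r
## hypothetical stand-in for the genusSpecies list-column:
## one character vector of species per project
genus_species <- list(
    "Mus musculus",
    "Homo sapiens",
    c("Homo sapiens", "Mus musculus")
)

## TRUE only when the element is exactly `value`
list_equals <- function(x, value) vapply(x, identical, logical(1), value)

## TRUE when `value` appears anywhere in the element; `x = value` is
## passed through to `%in%`, so each element is used as the lookup table
list_contains <- function(x, value) vapply(x, `%in%`, logical(1), x = value)

list_equals(genus_species, "Mus musculus")    # TRUE FALSE FALSE
list_contains(genus_species, "Mus musculus")  # TRUE FALSE TRUE
```

So list_equals() keeps mouse-only studies, while list_contains() also keeps the mixed-species one.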

Another option for more complicated processing might use the rjsoncons package to work with the 'list-of-lists', e.g., visualize with

as.list(p) |> listviewer::jsonedit()

and use JMESPath to query, e.g., project titles for all studies of mice

as.list(p) |>
    ## convert the list to JSON
    jsonlite::toJSON(auto_unbox = TRUE) |>
    ## query the JSON with a JMESPath expression (can be developed
    ## interactively in listviewer). The part
    ## `hits[?donorOrganisms[].genusSpecies[] == ['Mus musculus']]`
    ## selects all hits with genusSpecies equal to 'Mus musculus', and
    ## `.projects[].projectTitle` extracts the projectTitle from those hits
    rjsoncons::jmespath("hits[?donorOrganisms[].genusSpecies[] == ['Mus musculus']].projects[].projectTitle")

with the result

 [1] "1.3 Million Brain Cells from E18 Mice"
 [2] "A Cellular Atlas of Pitx2-Dependent Cardiac Development."
 [3] "A revised airway epithelial hierarchy includes CFTR-expressing ionocytes"
 [4] "A single-cell molecular map of mouse gastrulation and early organogenesis"
 [5] "Cross-Species Single-Cell Analysis of Pancreatic Ductal Adenocarcinoma Reveals Antigen-Presenting Cancer-Associated Fibroblasts"
 [6] "Defining the Activated Fibroblast Population in Lung Fibrosis Using Single Cell Sequencing."
 [7] "High throughput error corrected Nanopore single cell transcriptome sequencing."
 [8] "Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets."
 [9] "Massively Parallel Single Nucleus Transcriptional Profiling Defines Spinal Cord Neurons and Their Activity during Behavior"
[10] "Melanoma infiltration of stromal and immune cells"
[11] "Molecular Architecture of the Mouse Nervous System."
[12] "Single cell transcriptional profiling of peripheral blood mononuclear cells (PBMCs) from mice flown on Rodent Research Reference Mission-2 (RRRM-2)"
[13] "Single cell transcriptional profiling of spleens from mice flown on Rodent Research Reference Mission-2"
[14] "Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord"
[15] "Single nucleus RNA-sequencing defines unexpected diversity of cholinergic neuron types in the adult mouse spinal cord."
[16] "Single-Cell Transcriptomics Reveals a Population of Dormant Neural Stem Cells that Become Activated upon Brain Injury."
[17] "Single-Cell Transcriptomics Uncovers Zonation of Function in the Mesenchyme during Liver Fibrosis"
[18] "Single-cell analysis of the cellular heterogeneity and interactions in the injured mouse spinal cord"
[19] "Single-cell transcriptomic analysis of the adult mouse spinal cord reveals molecular diversity of autonomic and skeletal motor neurons"
[20] "Tabula Muris: Transcriptomic characterization of 20 organs and tissues from Mus musculus at single cell resolution"
[21] "The emergent landscape of the mouse gut endoderm at single-cell resolution"
[22] "Transcriptomic Profiling of the Developing Cardiac Conduction System at Single-Cell Resolution."
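
If it helps to see what the JMESPath query above is doing, here is a rough base-R equivalent of the same select-then-extract logic on a tiny hand-made list. The toy object and the helpers hit_species() and hit_titles() are hypothetical, made up for illustration; toy only imitates the general shape of as.list(p) and is not real HCA data.

```r
## hypothetical toy data imitating the nesting of as.list(p)
toy <- list(hits = list(
    list(
        donorOrganisms = list(list(genusSpecies = list("Mus musculus"))),
        projects = list(list(projectTitle = "Mouse study"))
    ),
    list(
        donorOrganisms = list(list(genusSpecies = list("Homo sapiens"))),
        projects = list(list(projectTitle = "Human study"))
    ),
    list(
        donorOrganisms = list(list(genusSpecies = list("Homo sapiens", "Mus musculus"))),
        projects = list(list(projectTitle = "Mixed study"))
    )
))

## flatten donorOrganisms[].genusSpecies[] for a single hit
hit_species <- function(hit)
    unlist(lapply(hit$donorOrganisms, `[[`, "genusSpecies"), use.names = FALSE)

## flatten projects[].projectTitle for a single hit
hit_titles <- function(hit)
    unlist(lapply(hit$projects, `[[`, "projectTitle"), use.names = FALSE)

## keep hits whose flattened species vector is exactly "Mus musculus" ...
mouse_only <- Filter(
    function(hit) identical(hit_species(hit), "Mus musculus"),
    toy$hits
)

## ... then extract the project titles from the hits that remain
titles <- unlist(lapply(mouse_only, hit_titles), use.names = FALSE)
titles  # "Mouse study"
```

As with list_equals, the mixed-species hit is dropped because the comparison is against the whole species vector rather than a membership test.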