DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Sorting by projectDescription fails to sort all results #6275

Open dsotirho-ucsc opened 6 months ago

dsotirho-ucsc commented 6 months ago

Sorting by projectDescription fails to properly sort all results.

$ http 'https://service.azul.data.humancellatlas.org/index/projects?size=30&catalog=dcp37&order=asc&sort=projectDescription&filters=%7B%7D' | jq '.hits[].projects[].projectDescription | .[0:100]'
"A single-nucleus cross-tissue molecular reference map was generated from frozen samples deriving fro"
"Liver transplantation (LT) is the standard therapy for the patients with end-stage liver disease. Th"
"Massively parallel single-cell RNA-seq technology (Drop-Seq) was applied to analyze the transcriptom"
"Mesenchymal and endothelial cells were cell-sorted from healthy human (adult) livers (n=20), then si"
"Milk derived live mammary epithelial cells were characterized by single cell RNA sequencing. Overall"
"Our aims are to generate a full representation of human hematopoiesis in blood and bone marrow of hu"
"Reprogramming, CRISPR/Cas9 gene-editing, organoid technologies and single-cell RNA sequencing techno"
"Single Cell RNAseq of primary pulmonary endothelial cells."
"Single cell sequencing of human airway epithelium from normal and cystic fibrosis lungs"
"Single-cell RNA sequencing of PBMCs and infusion products from 32 patients treated with CD19 CAR-T t"
"Single-cell RNA sequencing of cerebrospinal fluid and PBMCs from Multiple sclerosis patients."
"Single-cell RNA sequencing was applied to identify distinct and dynamic cell populations, including "
"Single-cell RNA sequencing was used to determine the cell type composition of all major human organs"
"Single-cell RNA-sequencing of the nasal mucosa across the lifespan Overall design: Single-cell RNA-s"
"Single-cell whole transcriptome analysis of  chronic myeloid leukemia stem cells, defined as Lin-CD3"
"Single-nucleus RNA sequencing of human cortex affected by multiple sclerosis cortical lesions."
"The authors developed a systematic toolbox for profiling fresh and frozen clinical tumor samples usi"
"The authors generated single-cell atlases of 23 lung, 16 kidney, 15 liver and 18 heart COVID-19 auto"
"The authors used RNA sequencing of half a million single cells to create a detailed census of cell t"
"The datasets contain single cell RNA sequencing data from the normal region of tumor-nephrectomy sam"
"The transcriptomes of microglia were investigated at a single cell level in non-demented elderly and"
"To characterize the cellular diversity in the human kidney cortical nephrogenic niche we dissociated"
"To study the developmental process of human podocytes and compare to the in vitro counterpart, we di"
"We isolated single cells (using either liberase or collagenase A dissociation) from subcutaneous adi"
"We used single-cell RNAseq to investigate the heterogeneity of glioblastoma tumors and assess differ"
"mRNA profiles for thousands of cells from foveal and parafoveal retina were generated from four huma"
"The purpose of this project is to assess the relevance of pluripotent stem cell-derived cerebral and"
"Respiratory failure associated with COVID-19 has placed focus on the lungs. Here, we present single-"
"Single cell genomics are revolutionizing our ability to characterize complex tissues. By contrast, t"
"Mouse studies have been instrumental in forming our current understanding of early cell-lineage deci"

The errors are easier to see in a descending order sort.

$ http 'https://service.azul.data.humancellatlas.org/index/projects?size=30&catalog=dcp37&order=desc&sort=projectDescription&filters=%7B%7D' | jq '.hits[].projects[].projectDescription | .[0:100]'
"The cellular landscape of the human intestinal tract is dynamic throughout life, developing in utero"
"The functions of innate lymphoid cells (ILCs) in immune system are increasingly appreciated, whereas"
"Recent advances in single cell genomics technologies have facilitated studies on the developing immu"
"Aging is accompanied by a loss of muscle mass and function, termed sarcopenia, causing numerous morb"
"The ovary is perhaps the most dynamic organ in the human body, only rivaled by the uterus. The molec"
"Severe coronavirus disease 2019 (COVID-19) pneumonia survivors often exhibit long-term pulmonary seq"
"Development of the human intestine is not well understood. Here we link single-cell RNA sequencing a"
"Intestinal mesenchymal cells play essential roles in epithelial homeostasis, matrix remodeling, immu"
"Although the function of the mammalian pancreas hinges on complex interactions of distinct cell type"
"During early human pregnancy the uterine mucosa transforms into the decidua, into which the fetal pl"
"Despite rapid developments in single cell sequencing technology, sample-specific batch effects, dete"
"Aberrant tissue-immune interactions are the hallmark of diverse chronic lung diseases. Here, we soug"
"During tumour growth cancer cells are subject to and selected by microenvironmental stress. The sele"
"Innate lymphoid cells (ILCs) are tissue-resident lymphocytes subdivided into ILC1s, ILC2s and ILC3s "
"NKT cells are potent immune regulators and skew immune responses toward either inflammation or toler"
"Pediatric-onset colitis and inflammatory bowel disease (IBD) have significant effects on the growth "
"Definitive haematopoiesis in the fetal liver supports self-renewal and differentiation of haematopoi"
"The invasive potential of gastric cancers (GCs) defines the hallmarks of malignancies; however, the "
"The human immune system displays substantial variation between individuals, leading to differences i"
"Current efforts within the HCA are largely focused on defining reference human cell types using a re"
"Single cell profiling is a powerful tool for studying the molecular and cellular (dys)function of ac"
"Rationale: The respiratory tract constitutes an elaborated line of defense that is based on a unique"
"The human cortex comprises diverse cell types that emerge from an initially uniform neuroepithelium "
"BackgroundRenal cell carcinoma (RCC) is the most common type of kidney cancer. Studying the pathogen"
"T helper (Th)17 cells are considered to contribute to inflammatory mechanisms in diseases such as mu"
"Temporal resolution of cellular features associated with a severe COVID-19 disease trajectory is nee"
"Despite effective treatment, HIV can persist in latent reservoirs, which represent a major obstacle "
"Preeclampsia (PE) is characterized by sustained hypertension and proteinuria at a gestational age of"
"Myocardial infarction is a leading cause of mortality worldwide. While advances in the acute treatme"
"BM samples were collected from two adult healthy donors and two AA patients at the first hospital af"

achave11-ucsc commented 6 months ago

Spike to diagnose.

dsotirho-ucsc commented 6 months ago

Our ES mapping parameter ignore_above (https://www.elastic.co/guide/en/elasticsearch/reference/7.17/ignore-above.html) is currently set to 256. As a result, project descriptions longer than 256 characters are not indexed and therefore not used in the sort.
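
A minimal sketch of that effect in plain Python, not the actual Azul or Elasticsearch code. It models the ordering seen in the outputs in this thread: descriptions over the limit have no sort value, so they are grouped together (last when ascending, first when descending) and ordered only by the tie-breaker. The function name `sort_hits` and the flat hit layout are assumptions made for illustration.

```python
IGNORE_ABOVE = 256  # the limit reported in this issue


def sort_hits(hits, ignore_above=IGNORE_ABOVE, descending=False):
    """Order hits the way the outputs in this thread appear to be ordered.

    Assumption: a description longer than `ignore_above` characters has no
    indexed sort value, so only the secondary key (entityId) orders it.
    """
    def indexed(hit):
        return len(hit["projectDescription"]) <= ignore_above

    # Hits with an indexed value: primary key first, entityId breaks ties.
    with_value = sorted(
        (h for h in hits if indexed(h)),
        key=lambda h: (h["projectDescription"], h["entityId"]),
        reverse=descending,
    )
    # Hits without an indexed value: only the secondary key is available.
    without_value = sorted(
        (h for h in hits if not indexed(h)),
        key=lambda h: h["entityId"],
        reverse=descending,
    )
    if descending:
        return without_value + with_value
    return with_value + without_value


hits = [
    {"entityId": "a", "projectDescription": "Z" * 300},
    {"entityId": "b", "projectDescription": "Y" * 300},
    {"entityId": "c", "projectDescription": "B short"},
    {"entityId": "d", "projectDescription": "A short"},
]
print([h["entityId"] for h in sort_hits(hits)])
print([h["entityId"] for h in sort_hits(hits, ignore_above=10000)])
```

With the limit at 256 the two long descriptions come out in entityId order ("Z…" before "Y…"); with a limit above their length they sort by content like the rest.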

Note how project descriptions of 256 characters or fewer are sorted correctly, while the longer ones are ordered only by the secondary sort key (entityId).

❯ http 'https://service.azul.data.humancellatlas.org/index/projects?size=30&catalog=dcp37&order=asc&sort=projectDescription&filters=%7B%7D' | jq '.hits[].projects[].projectDescription | .[0:256]'
"A single-nucleus cross-tissue molecular reference map was generated from frozen samples deriving from multiple organ types obtained from the GTEx Project."
"Liver transplantation (LT) is the standard therapy for the patients with end-stage liver disease. The authors constructed a single-cell transcriptomic atlas of 58,243 liver cells from 4 donors and 4 recipient liver transplant patients."
"Massively parallel single-cell RNA-seq technology (Drop-Seq) was applied to analyze the transcriptome of 26,677 pancreatic islets cells from both healthy and type II diabetic (T2D) donors."
"Mesenchymal and endothelial cells were cell-sorted from healthy human (adult) livers (n=20), then single cell RNA sequencing was performed using the 10X Genomics Chromium platform"
"Milk derived live mammary epithelial cells were characterized by single cell RNA sequencing. Overall design: Cells were pelleted from mid-feed milk samples prior to cryopreservation and subsequent analysis by single cell RNA sequencing."
"Our aims are to generate a full representation of human hematopoiesis in blood and bone marrow of humans using a multi-tier and iterative collection and analysis of 200,000 cells from ten healthy human bone"
"Reprogramming, CRISPR/Cas9 gene-editing, organoid technologies and single-cell RNA sequencing technology were combined to study the nephron lineage in a human context."
"Single Cell RNAseq of primary pulmonary endothelial cells."
"Single cell sequencing of human airway epithelium from normal and cystic fibrosis lungs"
"Single-cell RNA sequencing of PBMCs and infusion products from 32 patients treated with CD19 CAR-T therapy"
"Single-cell RNA sequencing of cerebrospinal fluid and PBMCs from Multiple sclerosis patients."
"Single-cell RNA sequencing was applied to identify distinct and dynamic cell populations, including an in-depth analysis of the fibroblast population, in lung fibrosis."
"Single-cell RNA sequencing was used to determine the cell type composition of all major human organs and construct a basic scheme for the human cell landscape (HCL)."
"Single-cell RNA-sequencing of the nasal mucosa across the lifespan Overall design: Single-cell RNA-sequencing of the nasal mucosa across the lifespan"
"Single-cell whole transcriptome analysis of  chronic myeloid leukemia stem cells, defined as Lin-CD34+CD38-."
"Single-nucleus RNA sequencing of human cortex affected by multiple sclerosis cortical lesions."
"The authors developed a systematic toolbox for profiling fresh and frozen clinical tumor samples using scRNA-Seq and snRNA-Seq."
"The authors generated single-cell atlases of 23 lung, 16 kidney, 15 liver and 18 heart COVID-19 autopsy donor tissue samples, and spatial atlases of 14 lung donors."
"The authors used RNA sequencing of half a million single cells to create a detailed census of cell types in the mouse nervous system."
"The datasets contain single cell RNA sequencing data from the normal region of tumor-nephrectomy samples."
"The transcriptomes of microglia were investigated at a single cell level in non-demented elderly and Alzheimers Disease donors using acute human post-mortem cortical brain samples."
"To characterize the cellular diversity in the human kidney cortical nephrogenic niche we dissociated cells from the cortex and performed 10X Genomics single-cell RNA sequencing."
"To study the developmental process of human podocytes and compare to the in vitro counterpart, we dissociated cells from the inner and outer kidney cortex as well as kidney organoids, and performed 10X Genomics single-cell RNA sequencing."
"We isolated single cells (using either liberase or collagenase A dissociation) from subcutaneous adipose tissue collected from abdominoplasty surgery, and healthy liver tissue adjacent to tumor from liver resection and ran 10X v2 3' scRNA Seq."
"We used single-cell RNAseq to investigate the heterogeneity of glioblastoma tumors and assess differential expression between cells within and in proximity of the tumor."
"mRNA profiles for thousands of cells from foveal and parafoveal retina were generated from four human donor eyes (single-cell donors 5-8) using 10X Genomics Chromium single-cell system followed by sequencing on an Illumina NovaSeq."
"The purpose of this project is to assess the relevance of pluripotent stem cell-derived cerebral and liver organoids to recapitulate the variation in cell-type specific gene expression programs between individuals. Towards this aim, we will generate refere"
"Respiratory failure associated with COVID-19 has placed focus on the lungs. Here, we present single-nucleus accessible chromatin profiles of 90,980 nuclei and matched single-nucleus transcriptomes of 46,500 nuclei in non-diseased lungs from donors of ~30 w"
"Single cell genomics are revolutionizing our ability to characterize complex tissues. By contrast, the techniques used to analyze the renal biopsy are little changed over the past several decades. In this study we have tested the hypothesis that single cel"
"Mouse studies have been instrumental in forming our current understanding of early cell-lineage decisions; however, similar insights into the early human development are severely limited. Here, we present a comprehensive transcriptional map of human embryo"

achave11-ucsc commented 6 months ago

Spike for design.

achave11-ucsc commented 3 months ago

A simple solution may be to bump the ignore_above limit to a value that meets our needs. In the documentation linked in Daniel's comment above, it states:

This option is also useful for protecting against Lucene’s term byte-length limit of 32766.

No additional justifications for leveraging this setting are given. In the same documentation page, it states that …

The ignore_above setting can be updated on existing fields using the update mapping API.

… implying that we could also update this setting dynamically as our needs change. However, this may be less efficient because of the additional round-trip to ES, and it may not be trivial to implement.
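
For reference, a sketch of the request body such a mapping update would carry (sent as `PUT /<index>/_mapping`). The field name `project_description` is a hypothetical top-level field, not the actual path in the Azul index, and the helper name is made up for illustration. The value 8191 is only an example: it is 32766 // 4, the conservative bound the same documentation page suggests for text with many non-ASCII characters, since a UTF-8 character can occupy up to 4 bytes.

```python
import json


def ignore_above_update(field: str, limit: int) -> dict:
    """Build the body of a mapping update that changes `ignore_above`
    on an existing keyword field.

    `field` is a hypothetical top-level field name; a nested field would
    need nested `properties` objects instead.
    """
    return {"properties": {field: {"type": "keyword", "ignore_above": limit}}}


body = ignore_above_update("project_description", 32766 // 4)
print(json.dumps(body, indent=2))
```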

Alternatively, we could index an additional field in the project document whose value consists of the first sentence or sentences of the project description. We could then use this field under the hood to power the sorting of the projectDescription facet and, at response time, swap it for the full project description (which would still be indexed but no longer used for sorting) and/or drop the additional, internal field.
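
A rough sketch of the extra-field idea, with one simplification: it takes a fixed-length prefix rather than the first sentence or sentences. A fixed-length prefix keeps the same relative order as the full strings, except that descriptions sharing the whole prefix become ties and need a secondary key. The function name and the 256-character default are assumptions for illustration.

```python
def description_sort_key(description: str, max_chars: int = 256) -> str:
    """Return the value of the hypothetical internal sort field:
    the first `max_chars` characters of the full description."""
    return description[:max_chars]


descriptions = ["B" * 300, "A" * 300, "A short"]
by_key = sorted(descriptions, key=description_sort_key)
print([d[:10] for d in by_key])
```

Cutting at a sentence boundary instead would produce prefixes of varying length, which can order two descriptions differently from their full text when one prefix is itself a prefix of the other.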

An observation: for some of the first five hits in https://service.azul.data.humancellatlas.org/index/projects?size=30&catalog=dcp40&order=desc&sort=projectDescription&filters=%7B%7D, there seems to be an attempt to provide a 'One-Sentence Summary' or 'Overall design' that condenses the project description into a single sentence, but this is not a standardized process.

dsotirho-ucsc commented 3 months ago

Assignee to consider next steps.

hannes-ucsc commented 2 months ago

Spike to determine a distribution of project description length in the most recent DCP catalog in prod.

achave11-ucsc commented 2 months ago

The project_description_lengths dictionary (below) was used to determine the following distribution:

[Figure: projectDescriptions dist]

project_description_lengths Produced by requesting … ``` 'https://service.azul.data.humancellatlas.org/index/projects?size=30&catalog=dcp41&order=desc&sort=projectDescription&filters=%7B%7D' ``` and extracting the `projectDescription` field of each hit to compute its length. The dictionary keys are each hit's `projectShortname`. The `next` token in each response was followed until all 462 projectDescription lengths were extracted.

``` project_description_lengths = { 'IntestinalSpaceTime': 1391, 'fetalInnateLymphoid': 1550, 'DevelopingImmuneSystem': 1981, 'skeletalMuscleAging': 1445, 'Human5Ovary10xChuva': 1000, 'Covid19LungSequelae': 1297, 'spatiotemporalIntestinalDev': 1145, 'HumanColonicMesenchymeIBD': 1119, 'HumanMousePancreas': 1207, 'Fetal/Maternal Interface': 1220, 'Multiplexed scRNA-seq with barcoded antibodies': 1060, 'LungStromaEmphysema': 1075, 'THP1StimulationCells': 1088, 'InnateLymphoidCells': 641, 'nktpbmcZhou': 548, 'PediatricOnsetColitisAndIbdProfiling': 1078, 'FetalLiverHaematopoiesis': 1143, 'humanDiffuseTypeGastricCancerAtlas': 1163, 'eQTLAutoimmune': 978, 'AIDA': 1347, 'CITEseqPBMCProject': 1116, 'healthyAirwaysAtlas': 1777, 'dcp1219560702433721': 1086, 'Cheng-Human-10x3pv3': 1734, 'humanOligodendrocytesCulture': 2236, 'LongitudinalMultiomicsCovid19': 551, 'TranscriptionalHeterogeneityInLatentAndReactivatedHivInfectedCells': 588, 'immuneProfileOfPreeclampsia': 1441, 'myocardialInfarction': 881, 'boneMarrowAnemia': 632, 'MammaryLactactingCells': 1151, 'KidneyFibroticMicroenvironment': 1263, 'atlasOfTestisAging': 427, 'GSE67833_neural_stem_cells': 1144, 'CD8ActivationCOPDVillasenor': 1683, 'EastAsianPancreaticIslets': 1846, 'Milich2021MouseSpinal': 1036, 'LungMapVisium': 1511, 'CD4_T_cells_in_brain': 1004, 'KidneyOrganoidsPlasticity': 650, 'IdhWildtypeGlioblastomaDiversity': 1072, 'SpatialMapSkin': 1942, 'Liang_Human_Mouse_IPFlung_10x': 721, 'Chun-Human-Dropseq': 1152, 'earlyHumanEmbryogenesisAtlas': 1233, 'DendriticCellActivationHCM': 1585, 'GSE117211_KidneyMicroOrganoids': 720, 'NucSeqOfHumanEyes': 993, 'Tabula Muris': 883, 'HeterogeneityCD4TCells': 853, 'entorhinalCortexAlzheimers': 1272, 'SARS-CoV-2InfectionOralCavity': 1532, 'NucleiMultiplexHashing': 1055, 'PathogenInducedResidentMemory': 1151, 'IntegratedscRNA-SeqIdentifiesHumanPostnatalThymusS': 1182, 'HumanIpscDerivedAlveolarOrganoids': 769, 'pulmonaryFibrosisPathobiology': 1828, 'Faryabi-Human-10x3pv2': 1050, 
'scHumanPancreaticIslets': 1178, 'PancreaticDuctalAdenocarcinomaFibroblasts': 1510, 'Cortex-MGE-fusion-organoids': 1170, 'Differentialcellcompositionandsplitepidermaldiffer': 1149, 'stemCellTrophoblast': 940, 'Haniffa-Human-10x3pv2': 853, 'LethalCovidLungAtlas': 1627, 'LiverFibrosisZonation': 1126, 'proliferatingRetinoblastomaSCRNAseq': 2640, 'schirmerIBMsnRNAseq': 924, 'TCellsHumanCentralNervousSystem': 331, 'atlasOfHumanTeeth': 938, 'HumanCerebralOrganoidsFetalNeocortex': 1266, 'chronicApicalPeriodontitis': 1203, 'CervicalCancerLandscape': 791, 'GastricAndMetaplasticMucosae': 930, 'PreCdcFateTxFactCompetition': 993, 'Single cell transcriptome analysis of human pancreas': 1721, 'Sathyamurthy2018MouseSpinal': 879, 'COMBAT2022': 1157, 'SchulteSchrepping2020': 1122, 'HumanColonRewiringUlcerativeColitis': 1076, 'AllelicExpressionPatterns': 906, 'HumanPhagocytesHealthyLupusDutetre': 954, 'OralSubmandibularHoreth10x': 1643, 'HematopoieticImmuneCellAtlas': 804, 'FetalSkinDevImmune': 1078, 'InfiltratingTCellsLupus': 1376, 'FetalEmbryonicBrain': 922, 'Landscape-ileum-colon': 1182, 'MultiomicCongenitalHeartDisease': 1292, 'DevelopmentalOriginsNeuroblastoma': 957, 'EpithelialDiversityHealthInflammation': 266, 'CD4TCellsInCrohnsDisease': 793, 'GSE132465_colorectal_cancer': 967, 'EndStageHeartFailureSimonson': 1020, 'HumanT2DPancreas': 1832, 'humanUrinaryCellsCovidAKI': 1901, 'DevelopmentalCellProgramsSkin': 437, 'CPmicroEnvironment': 2053, 'Tang-Human-IgAN-GEXSCOPE': 1503, 'LineageRecordingCerebralOrganoids': 679, 'HealthyAndCirrhoticLiver': 1067, 'Macosko-Human-10x3pv3': 1077, 'TissueStability': 638, 'atlasOfChromatinAccessibility': 1083, 'HumanDecidualLeukocytes': 1516, 'LineagestatesGastricCancer': 1712, 'ancestryInfluencesImmuneResponse': 1522, 'PulmonaryFibrosisGSE135893': 1178, 'HumanThymicDevelopment': 573, 'lungPopulationsIPF': 1497, 'CardiacFunctionalRecovery': 1138, 'lungCellularCensus': 1103, 'Tasic-Human-Smartseq': 1740, 'covid19AbseqPBMC': 1184, 
'HumanNailMesenchymalEpithelial': 579, 'HumanHeartMaturationsnSeq': 1366, 'dcp5924379466506983': 260, 'RegenerativeLineagesLungMetastasis': 1153, 'Covid19PBMC': 2040, 'mtbAggregation': 928, 'Budinger-Human-10x': 1597, 'skinCellAnalysisPsoriasis': 1074, 'InVitroandInVivoDevelopmentoftheHumanAirwayatSingl': 909, 'NSCL_lesions_tumor_classification': 1058, 'NasaSpaceMiceSpleens': 1166, 'snRNAseqOfDrgAndSpinalCordGlialCells': 1343, 'ArutyunyanHumanPlacenta': 1629, 'CREDiseaseRetina': 1198, 'Healthy and type 2 diabetes pancreas': 1139, 'covid19PBMCsTCRandCITEseq': 1181, 'HeartSingleCellsAndNucleiSeq': 1275, 'SingleCellTranscriptomeCOPD': 1425, 'TemporalMapHumanMyelopoiesis': 673, 'KidneySingleCellAtlas': 768, 'normMyometriumAtlasAndLeiomyoma': 3506, 'MegakaryocyteDevelopment': 998, 'Drop-seq, DroNc-seq, Fluidigm C1 Comparison': 1583, 'transplantedHumanIsletsNuclei': 3634, 'DropletSequencingNaturalGeneticVariation': 958, 'dcp6600438190759903': 927, 'dandelionPBCM': 1009, 'prefrontal_cortex_dev': 1411, 'HumanBrainEndothelialGBM': 755, 'Alkaslasi2021': 1174, 'transplantedKidneyOrganoids': 1773, 'HumanMelanocyteDevelopment': 1192, 'SkeletalMuscleHumanMouse': 1872, 'NasaSpaceMicePbmc': 1196, 'CD90MarksMesenchymalProgram': 2088, 'Asingle-cellatlasofthehealthybreasttissuesrevealsc': 1087, 'Single-cellRNA-seqrevealscelltype-specificmolecula': 1036, 'Asinglecellatlasofthehumanlivertumormicroenvironme': 271, 'Blum2021MouseSpinal': 1125, 'Amulti-omicsatlasofthehumanretinaatsingle-cellreso': 960, 'breastTranscriptomeAtlas': 1181, 'humanTrophoblastCulture': 857, 'ZhangLabPdBrainNuclei': 1482, 'HumanBrainSubstantiaNigra': 489, 'HumanEndocrinePancreas': 338, 'TCellLymphomaHeterogeneity': 1827, 'humanAdultUreter': 1540, 'regnerGynecologicMalignancies': 1188, 'SC_analysis_kidney_organoids_fetal_kidney': 465, 'pbmcCov19Flu': 1207, 'Califano-Human-10x3pv2': 1202, 'AdultLungElo': 1097, 'ImmuneRenalCarcinoma': 1251, 'LiverCellAtlasHeterogeneity': 463, 'MacrophagesElicitAF': 953, 
'TranscriptomeLandscapeHumanFollicle': 1059, 'HealthyhumankidneycelltypesinglecellRNA-seqdata': 339, 'RegulatoryMapPostnatalLung': 1110, 'BALDiscriminatingCOVID-19': 1669, 'HumanCardiacNiches': 1454, 'AgingSkinAtlas': 1196, 'GSE168405_OrganoidEndometrial': 1564, 'SkinPsoriasisGao10x': 1724, 'CD4+ cytotoxic T lymphocytes': 1766, 'PressureInducedHypertrophicHeart': 1108, 'immuneCellAtlasOfNeuroblastoma': 2160, 'IntraTumoralKidneyCancer': 1140, 'MalePrimordialGermCells': 1598, 'Mouse Melanoma': 715, 'TranscriptionalFeaturesHairFollicles': 1053, 'HumanPBMCfatalSepsis': 2097, 'Anorganoidandmulti-organdevelopmentalcellatlasreve': 1125, 'SingleCell,SingleNucleusandSpatialRNASequencingoft': 1554, 'HumanFetalKidney': 319, 'BloodAge10xArtyomov': 1202, 'pancreasCelSeq2': 1027, 'scRNAseqSystemicComparison': 970, 'HeartCellsGeneRegulation': 1095, 'Stress-inducedRNA–chromatininteractionspromoteendo': 1168, 'KidneyOrganoidsBioprinting': 397, 'CD34LineageMyocardialDu': 2546, 'DokeHumanPancreaticSlices': 1203, 'ChaffinHypertrophicCardiomyopathy': 1422, 'CerebellarDevLandscape': 1267, 'CryoPancreaticIsletCellPatchSeq': 1493, 'Burja_10x_SkinBiopsies': 370, 'ProximalEpididymisCFTR': 373, 'ColonImmune10XSS2VDJ': 791, 'Singlenucleusandspatialtranscriptomicprofilingofhu': 1408, 'WongAdultRetina': 1328, 'Single-nucleuschromatinaccessibilityandtranscripto': 1589, 'Tsirigos_10x_pdac': 379, 'SkinSystemicSclerosis': 1597, 'Grinberg-Human-10x3pv2': 1191, 'RenalTumorMicroenvironment': 1267, 'HumanMouseEntericNervousSystem': 277, 'eQTL-lung': 1093, 'DevelopingMouseKidneyHumanKidneyOrganoids': 717, 'DorsolateralCortexSpatial': 1126, 'HumanIsletType2Diabetes': 768, 'COVID-19AirwayEpitheliumImmune': 1273, 'Herrera-Human-10x3pv3': 1002, 'endometrioidAdenocarcinomaSeq': 1226, 'scHumanBrainVasculature': 1100, 'iPSCderivedTenocyte': 857, 'HumanPancreaticIslets': 321, 'HumanRPEandChoroid': 315, 'HumanGumsNormalandDiseased': 894, 'humanCorticalDevelopmentLandscape': 1703, 'KidneySexBasedTranscriptome': 
1167, 'HewittRetinalOrganoids': 1152, 'ImmuneCellsSmallIntestineCeliac': 1485, 'CovidHypertension': 1246, 'MappingDevelopmentoftheHumanIntestinalNicheatSingl': 1034, 'MappingParotidGland': 1226, 'pediatricRhabdomyosarcomaAtlas': 1410, '1M Neurons': 667, 'distinctFunctionsOfPlasmablasts': 1045, 'GrNewald-Human-10x3pv3': 2793, 'humanIntestinalEpitheliumAtlas': 2151, 'Markedregionalglialheterogeneityinthehumanwhitemat': 1090, 'Keller-Human-10x3pv3': 1551, 'POPVaginalWallAtlas': 1334, 'Pitx2DevelopingHeart': 259, 'humanDentalPulpCells': 1203, 'MMRdandMMRpColorectalCancerAtlas': 1139, 'atlasOfHumanEndometrialStromalCells': 1066, 'SuryawanshiKidneyAllografts': 1796, 'BenchmarkingSingleCellProtocols': 1503, 'hippocampus_development': 491, 'HumanCorneaDevelopment': 257, 'LungCellAtlas2020': 1630, 'SkinFibroblastAutoimmuneXu10x': 1631, 'femoralHeadScRnaSeq': 1534, 'SkinLymphomaRindler10x': 1528, 'FibroblastProgenitorImmunomodulation': 1183, 'AcuteSkinInflammation': 364, 'Lafyatis_10x_ipf_lung': 1179, 'HumanNeonatalForeskin': 1109, 'breastCancerCellLinesAtlas': 1803, 'DevelopingSpinalCord': 1084, 'MyocardialInfarctionAmruteCITE': 1774, 'etiologyOfAutoimmuneRiskLoci': 1451, 'McCormack-Human-10x3pv2': 2255, 'healthyIPFLungMesenchymalCells': 553, 'cerebralCortexOrganoids22q11DS': 1162, 'CordBloodHematopoieticStemCells': 1319, 'PRJNA640427_human_neutrophils': 1471, 'RevisedAirwayEpithelialHierarchy': 1148, 'AtlasOfHumanIntervertebralDisc': 1000, 'LabialMinorSalivaryGlandCostaDaSilva10x': 1077, 'CovidImmuneAtlas': 1076, 'Xiao-Human-RNAscope': 1896, 'lymphaticEndothelialCells': 1013, 'DifferentiationofHumanIntestinalOrganoidswithEndog': 1041, 'ChronicInflammatorySkinClassification': 1856, 'InfluenzaVirusInfectionSingleCells': 972, 'AtlasOfTheHumanCorpusCavernosum': 555, 'Substantia_nigra_and_locus_coeruleus': 1526, 'LungTransplantationCOVID-19': 1781, 'cellSignaturesInAlzheimer': 1531, 'SingleCellMultiomeAtlasoftheHumanFetalRetina': 1014, 'SARS_COV_2_receptorsBronchial': 1082, 
'altMolecularMechanismsInPD': 1677, 'Diabetic Nephropathy snRNA-seq': 1319, 'PediatricAstrocytomas': 1617, 'FemalePrimordialGermCells': 1589, 'humanIelAndLplTlymphocytes': 1055, 'HumanLiverImmuneCells_GSE125188': 523, 'osteoarthritisSynovialFibroblast': 2031, 'ProstateCellAtlas': 1036, 'HumanSpermatogonialStemCells': 1243, 'SurveyhumanBrainTranscriptomediversity': 401, 'HumanInnateLymphoidCells': 972, 'scAgingHumanMaleSkin': 1195, 'HumanHematopoieticProgenitors': 1056, 'HumanIleumCronsNormalHaberman': 1044, 'HtanPreCancerAltas': 1152, 'oralMucosaAtlas': 1780, 'scMultiomeOfTheHumanRetina': 1074, 'inducedRetinalPigmentEpitheliumCells': 1602, 'ChimerismKidneyTransplantReject': 1874, 'Mouse Endoderm Project': 1179, 'colorectalCancerAtlas': 433, 'SingleCellLiverLandscape': 1037, 'SingleCellPathologicalAngiogenesis': 1140, 'HumanFoveaRetinaScheetzSheffield': 1672, 'Singlecellatlasofthehumanretina': 2430, 'humanUrineCells': 1202, 'HumanTissueTcellActivation': 1004, 'PituitariesStemCellRegulation': 1059, 'Delile2019SpatialSpinal': 1297, 'OSCC-GBTwoCellularPrograms': 1600, 'Der-Human-LupusNephritis-Nextera-C1': 1348, 'DevelopingCerebralCortex': 346, 'SclerosisHumanLungCITESeq': 1916, 'AdultHemOrgans': 1042, 'SingleCellSequencingOfLungCarcinoma': 1378, 'HumanAdiposeTissue': 780, 'hepaticSpatialProteogenomics': 907, 'Strittmatter-Human-ATACseq': 1040, 'TCellsNeuroinflammation': 350, 'LymphoidInfidelityDermatitis': 1765, 'scRNACovidNasalSwab': 428, 'MolecularSignaturesInSftsPatients': 1524, 'HumanNaturalKillerDiversityYang': 637, 'pyleSkeletalMuscle': 1120, 'SingleCelleQTLCoexpressionAnalysis': 888, 'ReconstituitionHumanThymus': 1028, 'kriegsteinBrainOrganoids': 1717, 'Multimodalsinglecellsequencingofhumandiabetickidne': 2434, 'deciduaPregnancyLoss': 626, 'humanBloodCiteSeq2': 1164, 'subventricularHumanProgenitor': 947, 'HighlyParallelExpressionProfiling': 952, 'HumanPancreasNormalandDiseased': 2030, 'HumanPreimplantationEmbryosESCs': 352, 'scATACseqOfLymphNodeMetastasis': 
400, 'Oligodendrocyte_MS': 1699, 'GSE111976-endometrium_MC': 1107, 'RiskVariantsAF': 1226, 'AtlasOfMajorAdultOrgans': 1860, 'chronicWoundDiabetes': 1546, 'ProfilingCisRegulatoryElements': 334, 'humanHippocampalDiversity': 2124, 'smokingAirwayEpithelium': 1587, 'AllergicInflammatoryMemory': 2219, 'CrossTissueReferenceMap': 908, 'humanFoveaAndPeripheralRetinaAtlas': 1350, 'HnsccImmuneLandscape': 1390, 'FetalLungImmune': 1526, 'lungFibrosisProteinSignatures': 1246, 'HumanAdultKidneyLiaoMo': 860, 'AdaptiveNKCellsInMultipleMyeloma': 1349, 'Hacohen-Human-CELseq2': 1058, 'COVID19autoimmunityPBMCs': 1608, 'IgANephropathySTRT': 1114, 'scAnalysisOfEndometriosis': 2071, 'humanEndometriumDynamics': 1070, 'Singlecelltranscriptionalandchromatinaccessibility': 1082, 'MultipleMyelomaCoevolution': 1448, 'gompertsLungSarsCov2': 681, 'Menon-Human-FSG-10x3': 1434, 'powellHumaniPSC': 1878, 'Lickert-Human-10x3pv2': 2269, 'biliaryTractCancerImmuneAtlas': 2168, 'pancreasNormalIslets': 1059, 'Pfister-Human-10x3pv2': 1508, 'Mould-Human-10x3pv3': 1864, 'gliaDiversityAtlasAcrossAdLifespan': 1187, 'HumanBCellsTonsils': 1037, 'HumanCorneaStemCells': 632, 'OleicAcidMultipleSclerosis': 590, 'HumanDevoLiverSegalRashid': 1081, 'molLandscapeOfUlcerativeColitis': 1212, 'humanMemoryLikeNKCellsCiteSeq': 697, 'skin-serine-proteases': 1403, 'Cellularheterogeneityofhumanfallopiantubesinnormal': 1076, 'HeartDiversityTucker10x': 2090, 'DopaminergicNeuronDifferentiation': 259, 'DevelopingCardiacSystem': 693, 'HepatoblastomaModeling': 1462, 'Single-cellmultiomicsofthehumanretinarevealshierar': 983, 'Single cell RNAseq characterization of cell types produced over time in an in vitro model of human inhibitory interneuron differentiation.': 1600, 'Aspatiallyresolvedsingle-cellgenomicatlasoftheadul': 1045, 'scRNASeqChildhoodLeukaemia': 1286, 'snRNAatlasOfSpinalCordNeurons': 1757, 'BALPreschoolCF': 1711, 'cellFateDynamics': 1690, 'MouseGastrulationAtlas': 599, 'HumanRetinaOrganoids': 1072, 
'SkinAtopicDermatitis10x': 1909, 'HumanLymphoMyeloidProgenitorCells': 1111, 'HumanFirstTrimesterPlacentaDecidua': 786, 'Transcriptomicanalysisoftheocularposteriorsegmentc': 1651, 'CollagenProducingLungCell': 1192, 'humanGastricCancerCells': 708, 'scRNAofHelaCCL2': 460, 'GarciaOcana-Human-10x3pv3': 1986, 'GSE118184KidneyOrganoid': 940, 'Cellatlasofthehumanocularanteriorsegment:Tissue-sp': 1315, 'SARSCoV2ResponseDynamics': 1514, 'AtlasOfSubstantiaNigraInPD': 1203, 'Mehandru-Human-10x3pv2': 1362, 'SpatialRetinoblastomaWong': 1259, 'conservedCellTypesHumanMouse': 994, 'SARSCov2ChildrenAdults': 1999, 'humanHeartFailureCellularLandscape': 1178, 'ImmuneLandscapeccRCC': 1079, 'Reprogrammed_Dendritic_Cells': 685, 'Shalek-Human-SeqWellS3': 1227, 'WehrensHypertrophicCELSeq2': 1020, 'tabulaSapiens': 1184, 'ImmuneCellExhaustianHIV': 263, 'melanomaBrainMetastasisAtlas': 1165, 'Single-cellgenomicsimprovesthediscoveryofriskvaria': 1724, 'Nanopore_scSequencing': 1117, 'landscapesOfHumanOvarianAgeing': 1572, 'MassoniBadosaHumanTonsil': 1315, 'HumanBoneMarrowMyeloma': 700, 'CsfFromTwinsDiscordantForMultSclerosis': 848, 'CovidCellTypes': 1655, 'HumanHematopoieticProfiling': 951, 'Roussos-Human-10x3pv3': 1711, 'Covid19BALFLandscape': 5234, 'HumanGermlineCells': 1069, 'AtlasOfTheHumanRetina': 1077, 'JointProfilingChromatinAccessibilityGeneExpression': 307, 'BrainDepressiveDisorder': 915, 'atlasOfTheHumanCiliaryBody': 494, 'HumanHepatocyteDifferentiation': 517, 'AneurysmalHumanAorticTissue': 481, 'immuneCellsDuringSkinDevelopment': 1199, 'scRNAseqOfFailingHumanHeart': 411, 'HumanDCsFromPre-cDCs': 317, 'HumanDermalFibroblastSubpopulations': 1252, 'HumanFibroblastCharacterisation': 830, 'clonalArchitectureHumans': 886, 'fetalLiverAndCordBloodCiteSeq': 3725, 'humanPreimplantationEmbryos': 1081, 'Kidney biopsy scRNA-seq': 1810, 'SARS-CoV2ControlHostGenes': 1044, 'HPSI human cerebral organoids': 741, 'photoreceptorMacularSubregions': 231, 'InfiltratingNeoplasticCellsHumanGlioblastoma': 169, 
'CrossTissueSARS-CoV-2Study': 243, 'HumanDevoKidneyAndOrganoids': 238, 'CorticalNephrogenicNicheKidney': 177, 'SingleCellsAlzheimers': 180, 'HCAseednetworkprecisetumor-nephrectomysamples': 105, 'MouseNervousSystem': 133, 'GSE171668_SingleCellAndSpatialAtlasCovid': 164, 'FreshAndFrozenTumours': 127, 'MultipleSclerosisLineageDiversity': 94, 'StemCellsInChronicMyeloidLeukemia': 108, 'nasalMucosaLifespan': 149, 'HumanCellLandscape': 165, 'Lung_Fibroblasts': 168, 'SingleCellsMultipleSclerosis': 93, 'CarTPatients': 106, 'GompertsAirwatCfCells': 87, 'LungEndothelialCells': 58, 'GSE119561_KidneyOrganoidsNephronLineage': 167, 'BM_PC': 206, 'MilkEpithelialCells': 236, 'Singlecelllandscapeofmesenchymalandendothelialcell': 179, 'HealthyAndDiabeticPancreas': 188, 'LiverAllograftDysfunction': 235 } ```
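
The collection procedure described above could be scripted roughly as follows. This is a sketch, not the script that was actually used. It assumes each page carries the link to the next page under `pagination.next` and that `projectShortname` sits next to `projectDescription` inside `hits[].projects[]`; both are assumptions about the response shape. The `fetch` parameter exists only so the loop can be exercised without network access.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def description_lengths(first_url: str,
                        fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Follow the `next` link page by page and record the length of every
    projectDescription, keyed by projectShortname."""
    lengths = {}
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        for hit in page["hits"]:
            for project in hit["projects"]:
                name = project["projectShortname"]
                lengths[name] = len(project["projectDescription"])
        # Assumed location of the link to the next page; None ends the loop.
        url = page.get("pagination", {}).get("next")
    return lengths
```

Calling `description_lengths(...)` with the URL quoted above would be the starting point.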

Here are the projectDescription lengths used in the plot above, as a sorted list:

[58, 87, 93, 94, 105, 106, 108, 127, 133, 149, 164, 165, 167, 168, 169, 177, 179, 180, 188, 206, 231, 235,
 236, 238, 243, 257, 259, 259, 260, 263, 266, 271, 277, 307, 315, 317, 319, 321, 331, 334, 338, 339,
 346, 350, 352, 364, 370, 373, 379, 397, 400, 401, 411, 427, 428, 433, 437, 460, 463, 465, 481, 489,
 491, 494, 517, 523, 548, 551, 553, 555, 573, 579, 588, 590, 599, 626, 632, 632, 637, 638, 641, 650,
 667, 673, 679, 681, 685, 693, 697, 700, 708, 715, 717, 720, 721, 741, 755, 768, 768, 769, 780, 786, 791,
 791, 793, 804, 830, 848, 853, 853, 857, 857, 860, 879, 881, 883, 886, 888, 894, 906, 907, 908, 909,
 915, 922, 924, 927, 928, 930, 938, 940, 940, 947, 951, 952, 953, 954, 957, 958, 960, 967, 970, 972,
 972, 978, 983, 993, 993, 994, 998, 1000, 1000, 1002, 1004, 1004, 1009, 1013, 1014, 1020, 1020, 1027,
 1028, 1034, 1036, 1036, 1036, 1037, 1037, 1040, 1041, 1042, 1044, 1044, 1045, 1045, 1050, 1053, 1055,
 1055, 1056, 1058, 1058, 1059, 1059, 1059, 1060, 1066, 1067, 1069, 1070, 1072, 1072, 1074, 1074, 1075,
 1076, 1076, 1076, 1077, 1077, 1077, 1078, 1078, 1079, 1081, 1081, 1082, 1082, 1083, 1084, 1086, 1087,
 1088, 1090, 1093, 1095, 1097, 1100, 1103, 1107, 1108, 1109, 1110, 1111, 1114, 1116, 1117, 1119, 1120, 1122,
 1125, 1125, 1126, 1126, 1138, 1139, 1139, 1140, 1140, 1143, 1144, 1145, 1148, 1149, 1151, 1151, 1152, 1152,
 1152, 1153, 1157, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1170, 1174, 1178, 1178, 1178, 1179, 1179, 1181,
 1181, 1182, 1182, 1183, 1184, 1184, 1187, 1188, 1191, 1192, 1192, 1195, 1196, 1196, 1198, 1199, 1202, 1202,
 1202, 1203, 1203, 1203, 1203, 1207, 1207, 1212, 1220, 1226, 1226, 1226, 1227, 1233, 1243, 1246, 1246,
 1251, 1252, 1259, 1263, 1266, 1267, 1267, 1272, 1273, 1275, 1286, 1292, 1297, 1297, 1315, 1315, 1319,
 1319, 1328, 1334, 1343, 1347, 1348, 1349, 1350, 1362, 1366, 1376, 1378, 1390, 1391, 1403, 1408, 1410,
 1411, 1422, 1425, 1434, 1441, 1445, 1448, 1451, 1454, 1462, 1471, 1482, 1485, 1493, 1497, 1503, 1503,
 1508, 1510, 1511, 1514, 1516, 1522, 1524, 1526, 1526, 1528, 1531, 1532, 1534, 1540, 1546, 1550, 1551,
 1554, 1564, 1572, 1583, 1585, 1587, 1589, 1589, 1597, 1597, 1598, 1600, 1600, 1602, 1608, 1617, 1627,
 1629, 1630, 1631, 1643, 1651, 1655, 1669, 1672, 1677, 1683, 1690, 1699, 1703, 1711, 1711, 1712, 1717, 1721,
 1724, 1724, 1734, 1740, 1757, 1765, 1766, 1773, 1774, 1777, 1780, 1781, 1796, 1803, 1810, 1827, 1828, 1832,
 1846, 1856, 1860, 1864, 1872, 1874, 1878, 1896, 1901, 1909, 1916, 1942, 1981, 1986, 1999, 2030, 2031,
 2040, 2053, 2071, 2088, 2090, 2097, 2124, 2151, 2160, 2168, 2219, 2236, 2255, 2269, 2430, 2434, 2546,
 2640, 2793, 3506, 3634, 3725, 5234]
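
A small helper for reading a list like this against a candidate limit; a sketch only, with a made-up function name. The sample below is a handful of values taken from the list (its smallest and largest entries plus the two on either side of 256), not the full data set.

```python
def summarize(lengths, limit):
    """Count how many description lengths exceed `limit`."""
    ordered = sorted(lengths)
    above = sum(1 for n in ordered if n > limit)
    return {
        "count": len(ordered),
        "max": ordered[-1],
        "above_limit": above,
        "share_above": above / len(ordered),
    }


sample = [58, 243, 257, 1000, 5234]  # a few values from the list above
print(summarize(sample, 256))     # the current ignore_above value
print(summarize(sample, 10000))   # a candidate larger limit
```

Run against the full list, this shows how few descriptions fit under 256 and that the longest one (5234) is well below a limit of 10000.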

achave11-ucsc commented 2 months ago

@hannes-ucsc: "We will set ignore_above to 10000 for the project description field and add an assertion that would fail when indexing a project with a larger description."
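
A sketch of what such an assertion could look like; the function and constant names are hypothetical and this is not the actual indexer code. The second check goes beyond the quoted decision and is an assumption: because ignore_above counts characters while Lucene's limit of 32766 counts bytes, a description of up to 10000 characters could still exceed the byte limit if it consists mostly of multi-byte UTF-8 characters.

```python
MAX_DESCRIPTION_CHARS = 10000  # the ignore_above value quoted above
LUCENE_MAX_TERM_BYTES = 32766  # Lucene's term byte-length limit cited earlier


def check_project_description(description: str) -> str:
    """Fail loudly at indexing time instead of silently losing the sort value."""
    assert len(description) <= MAX_DESCRIPTION_CHARS, (
        f"Project description has {len(description)} characters, "
        f"more than the limit of {MAX_DESCRIPTION_CHARS}"
    )
    # Extra check (an assumption, not part of the quoted decision):
    # the UTF-8 encoding must also fit into a single Lucene term.
    n_bytes = len(description.encode("utf-8"))
    assert n_bytes <= LUCENE_MAX_TERM_BYTES, (
        f"Project description encodes to {n_bytes} bytes, "
        f"more than the limit of {LUCENE_MAX_TERM_BYTES}"
    )
    return description
```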