LungCellAtlas / HLCA

Condition column in the meta.data #3

Closed pallavisurana1 closed 1 year ago

pallavisurana1 commented 1 year ago

Hi, I am a little unclear about what "nan" means in the Condition column of meta.data in the integrated HLCA dataset. Cells set to "nan" come from donors without a lung condition. Does this mean they are healthy or not healthy?

If I want only the healthy cells, I could ignore the "nan" or unknown cells, right?

LisaSikkema commented 1 year ago

Hi @PallaviSurana1,

Are you looking at the HLCA core or the extended HLCA? This is a column we forgot to clean up, but it will be corrected in the updated, cleaned-up HLCA, which should be online soon!

pallavisurana1 commented 1 year ago

Hi @LisaSikkema

I am looking at the extended HLCA. So if I want healthy lung tissue, I should keep only the cells labelled healthy and remove the "nan" cells, right?

LisaSikkema commented 1 year ago

To be sure, yes. I think they might all be healthy, but I'm not sure.

LisaSikkema commented 1 year ago

If you tell me which datasets they come from I can tell you if they're healthy or not :)

pallavisurana1 commented 1 year ago

These are the cell counts per study for condition = "nan", which is what I am interested in:

study                     Number of cells
Banovich_Kropski_2020     96516
Barbry_Leroy_2020         23988
Jain_Misharin_2021        0
Krasnow_2020              0
Lafyatis_Rojas_2019       10708
Meyer_2019                21602
Misharin_2021             27414
Misharin_Budinger_2018    26912
Nawijn_2021               0
Seibold_2020              28364
Teichmann_Meyer_2019      11667

LisaSikkema commented 1 year ago

In Supplementary Table 1 of the bioRxiv preprint you can see which datasets contain which kinds of lung conditions: https://www.biorxiv.org/content/10.1101/2022.03.10.483747v1.supplementary-material

Banovich_Kropski and Misharin_Budinger contain both healthy and diseased samples (the disease samples might be correctly annotated; you can check). The others are healthy (in the case of Krasnow, from tumor-adjacent "healthy" tissue).
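
For anyone wanting to run the check suggested above, here is a minimal sketch in Python with pandas. It assumes the per-cell metadata has been exported to a DataFrame with columns named `study` and `condition`; those column names, the toy rows, and the labels "Healthy" and "Disease" are illustrative assumptions, not values taken from the HLCA itself. The helper `condition_by_study` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the per-cell metadata (hypothetical rows and labels).
obs = pd.DataFrame({
    "study": [
        "Banovich_Kropski_2020", "Banovich_Kropski_2020",
        "Banovich_Kropski_2020", "Banovich_Kropski_2020",
        "Misharin_Budinger_2018", "Misharin_Budinger_2018",
        "Meyer_2019", "Meyer_2019",
    ],
    "condition": [
        "Healthy", "Disease", "nan", "nan",
        "Disease", "nan",
        "nan", "nan",
    ],
})


def condition_by_study(obs, study_col="study", condition_col="condition"):
    """Count cells per (study, condition label) pair."""
    # astype(str) turns a real missing value (NaN) into the string "nan",
    # so both representations end up in the same column of the table.
    labels = obs[condition_col].astype(str)
    return pd.crosstab(obs[study_col], labels)


table = condition_by_study(obs)

# The two studies named above as containing both healthy and disease samples.
mixed = ["Banovich_Kropski_2020", "Misharin_Budinger_2018"]
print(table.loc[mixed])
```

If the disease samples in the two mixed studies carry an explicit disease label, the "nan" column for those studies shows how many cells are still ambiguous.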

pallavisurana1 commented 1 year ago

Thanks. I will remove those studies and analyze the rest.
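
A rough sketch of one way to do that filtering in Python with pandas, for readers with the same question. It is a variation on the plan above: rather than dropping the two mixed studies entirely, it keeps cells explicitly labelled healthy and also keeps "nan" cells from studies that were described in this thread as healthy-only. The column names `study` and `condition`, the label "Healthy", the toy rows, and the helper `healthy_mask` are assumptions for illustration.

```python
import pandas as pd

# Studies described in this thread as containing both healthy and disease samples.
MIXED_STUDIES = ["Banovich_Kropski_2020", "Misharin_Budinger_2018"]


def healthy_mask(obs, healthy_label="Healthy", unknown_label="nan",
                 study_col="study", condition_col="condition"):
    """Boolean mask: True for cells to keep as healthy."""
    # astype(str) maps real missing values (NaN) to the string "nan".
    labels = obs[condition_col].astype(str)
    labelled_healthy = labels == healthy_label
    # "nan" is only trusted as healthy outside the mixed studies.
    unlabelled_ok = (labels == unknown_label) & ~obs[study_col].isin(MIXED_STUDIES)
    return labelled_healthy | unlabelled_ok


# Toy stand-in for the per-cell metadata (hypothetical rows and labels).
obs = pd.DataFrame({
    "study": [
        "Banovich_Kropski_2020", "Banovich_Kropski_2020", "Banovich_Kropski_2020",
        "Misharin_Budinger_2018", "Meyer_2019", "Seibold_2020",
    ],
    "condition": ["Healthy", "nan", "Disease", "nan", "nan", "nan"],
})

mask = healthy_mask(obs)
healthy_obs = obs[mask]
print(healthy_obs)
```

If the metadata lives in an AnnData object's `.obs`, the same mask could presumably be used to subset the object, for example `adata[mask.to_numpy()]`.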