czbiohub-sf / tabula-muris-senis

Tabula Muris Senis
http://tabula-muris-senis.ds.czbiohub.org
BSD 3-Clause "New" or "Revised" License

How do I extract the list of gene names in each tissue? #39

Open orrl16 opened 1 year ago

orrl16 commented 1 year ago

I am working with the h5ad files.

An additional question: what is the difference in preprocessing between the raw.X and X datasets?

aopisco commented 1 year ago

The gene names are in adata.var. raw.X is normalized; .X is normalized and scaled.
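A minimal sketch of what this looks like in Python, assuming the file is read with the anndata package and that the gene names are stored as the index of `adata.var` (the usual anndata convention). The helper names `load_h5ad` and `gene_names` are illustrative and not part of this repository.

```python
def load_h5ad(path):
    """Load an .h5ad file into an AnnData object (needs `pip install anndata`)."""
    import anndata as ad  # imported lazily so gene_names() works without it
    return ad.read_h5ad(path)


def gene_names(adata):
    """Return the gene names that label the columns of adata.X."""
    # adata.var is a table with one row per gene; its index holds the names,
    # and adata.var_names is a shortcut for that index.
    return list(adata.var_names)


# Example call, assuming a tissue file such as bat_facs.h5ad is in the
# working directory:
#   adata = load_h5ad("bat_facs.h5ad")
#   names = gene_names(adata)
```

Running this once per tissue file would give one gene list per tissue.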

orrl16 commented 1 year ago

Thanks a lot! Where can I download the 'adata.var' file?

aopisco commented 1 year ago

It's part of the file: just as you access .X or .raw.X, you can also access .var.

orrl16 commented 1 year ago

Thanks a lot! I tried to access it but couldn't. I downloaded the h5ad files for all tissues.

This is what I got when displaying the structure of /var in the HDF5 file bat_facs.h5ad. I did not find "adata":

Dataset 'var'
    Size: 22899
    MaxSize: 22899
    Datatype: H5T_COMPOUND
        Member 'index': H5T_STRING
            String Length: 17
            Padding: H5T_STR_NULLPAD
            Character Set: H5T_CSET_ASCII
            Character Type: H5T_C_S1
        Member 'n_cells': H5T_STD_I64LE (int64)
        Member 'means': H5T_IEEE_F32LE (single)
        Member 'dispersions': H5T_IEEE_F32LE (single)
        Member 'dispersions_norm': H5T_IEEE_F32LE (single)
        Member 'highly_variable': H5T_ENUM
            Base Type: H5T_STD_I8LE
            Member 'FALSE': 0
            Member 'TRUE': 1
    ChunkSize: []
    Filters: none
    FillValue: H5T_COMPOUND

If you could send me the list of gene names used for all tissues in both the FACS and droplet samples, that would be great!

my email is @.***

On 30 Nov 2022, at 17:26, aopisco @.***> wrote:

it's part of the file, like you access .X or .raw.X you also have .var


orrl16 commented 1 year ago

my email is orr.levy et Yale.edu

orrl16 commented 1 year ago

> it's part of the file, like you access .X or .raw.X you also have .var

I tried looking at .var in these files, but there was no information about the gene list:

https://figshare.com/articles/dataset/Processed_files_to_use_with_scanpy_/8273102/2

aopisco commented 1 year ago

@orrl16 the h5ad objects follow the anndata (adata for short) structure: https://anndata.readthedocs.io/en/latest/index.html
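To make that structure concrete, here is a small sketch that summarises the pieces mentioned in this thread (`.X`, `.obs`, `.var` and `.raw`) for an already loaded AnnData object. The function name `describe` is hypothetical; the attributes it reads (`X`, `obs_names`, `var_names`, `raw`) are standard AnnData attributes.

```python
def describe(adata):
    """Summarise the shapes of the main parts of an AnnData object."""
    summary = {
        "X_shape": tuple(adata.X.shape),     # (cells, genes)
        "n_obs_rows": len(adata.obs_names),  # one row of .obs per cell
        "n_var_rows": len(adata.var_names),  # one row of .var per gene
        "raw_X_shape": None,
        "n_raw_var_rows": None,
    }
    # .raw is optional; when present it carries its own matrix and gene table.
    if adata.raw is not None:
        summary["raw_X_shape"] = tuple(adata.raw.X.shape)
        summary["n_raw_var_rows"] = len(adata.raw.var_names)
    return summary
```

After loading a file with `anndata.read_h5ad`, printing `describe(adata)` shows at a glance whether `.X` and `.raw.X` have the same number of gene columns.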

orrl16 commented 1 year ago

Thanks again! I looked at 'var' and found a list of 22899 gene names (Dataset 'var' Size: 22899 MaxSize: 22899), which I can easily extract from that structure.

However, the relationship between X (the gene expression table) and the index list is still not clear to me. In both .X and .raw.X there are 33538 genes. How do I match the 22899 indexes to the 33538 genes in the gene expression table?

Best regards and thanks again!
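One thing that may be worth checking, sketched here under the assumption that the file is read with anndata: in AnnData, `.raw` has its own gene table, so the columns of `.raw.X` are labelled by `adata.raw.var_names` rather than by `adata.var_names`. When the two lists differ in length, genes can be matched by name. Whether this accounts for the 22899 versus 33538 counts in this particular file is not confirmed in this thread, and the function name `match_genes` is illustrative.

```python
def match_genes(var_names, raw_var_names):
    """Match genes in var_names to column positions in the raw matrix.

    Returns (matched, missing): matched maps each gene name found in
    raw_var_names to its column index there; missing lists the genes in
    var_names with no counterpart. Assumes gene names are unique.
    """
    raw_cols = {name: i for i, name in enumerate(raw_var_names)}
    matched = {}
    missing = []
    for name in var_names:
        if name in raw_cols:
            matched[name] = raw_cols[name]
        else:
            missing.append(name)
    return matched, missing
```

With a loaded object this could be called as `match_genes(adata.var_names, adata.raw.var_names)`, and the resulting column indices used to select the corresponding columns of `adata.raw.X`.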