dbrg77 / scATAC_snakemake

Snakemake pipeline for plate scATAC-seq processing
26 stars 8 forks source link

No values for nCount_peaks and nFeature_peaks in SeuratObject after CreateChromatinAssay #3

Closed FrancoFelix closed 1 year ago

FrancoFelix commented 1 year ago

Hi Xi,

Thank you for putting together this protocol for plate based scATAC and the resources to process the raw data!

I have been able to process the scATAC-Seq data through the snakemake script provided. However, I have an issue when I create a SeuratObject from the snakemake output. I can create the SeuratObject so longs as I do not filter in the CreateChromatinAssay step with min.cells or min.features. If I do not use these filters I can create the object and continue the analysis to creating my UMAP which shows no clustering.

I suspect that this is due to my SeuratObject only containing two cells that have nonzero values for nCount_peaks and nFeature_peaks. This is odd because all of the other QC looks good.

Any ideas on how to fix this?

Thanks!

`library(Signac) library(Seurat) library(GenomeInfoDb) library(EnsDb.Hsapiens.v75) library(ggplot2) library(patchwork) library(hdf5r) library(dplyr) library(readr) library(biovizBase) library(EnsDb.Mmusculus.v79)

read the content from the 'outs' directory

setwd("~/Experiment/outs") mex_dir_path <- "~/Experiment/outs"

mtx_path <- paste(mex_dir_path, "count_matrix_over_aggregate.mtx", sep = '/') feature_path <- paste(mex_dir_path, "count_matrix_over_aggregate.rows", sep = '/') barcode_path <- paste(mex_dir_path, "count_matrix_over_aggregate.cols", sep = '/')

features <- readr::read_tsv(feature_path, col_names = F) %>% tidyr::unite(feature) barcodes <- readr::read_tsv(barcode_path, col_names = F) %>% tidyr::unite(barcode) metadata <- read.csv( file = "~/Experiment/outs/sample_info.csv", header = TRUE, row.names = 1 )

create a Signac chromatin assay and a Seurat object

mtx <- Matrix::readMM(mtx_path) %>% magrittr::set_rownames(features$feature) %>% magrittr::set_colnames(barcodes$barcode)

features <-features[!grepl("random", features$feature),] features <- features[!grepl("chrUn", features$feature),] mtx <- mtx[rownames(mtx) %in% features$feature,]

chromassay <- CreateChromatinAssay( counts = mtx, sep = c("", "_"), genome = 'mm10', fragments= "~/aggregate_fragments.tsv.gz" )

atac <- CreateSeuratObject( counts = chrom_assay, assay = 'peaks', project = 'scATAC', meta.data = metadata )

atac atac[['peaks']] granges(atac)

extract gene annotations from EnsDb

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)

change to UCSC style since the data was mapped to hg19

seqlevelsStyle(annotations) <- 'UCSC'

add the gene information to the object

Annotation(atac) <- annotations atac

compute nucleosome signal score per cell

atac <- NucleosomeSignal(object = atac)

compute TSS enrichment score per cell

atac <- TSSEnrichment(object = atac, fast = FALSE)

atac$blacklist_fraction <- FractionCountsInRegion( object = atac, assay = 'peaks', regions = blacklist_mm10 )

atac$high.tss <- ifelse(atac$TSS.enrichment > 2, 'High', 'Low') TSSPlot(atac, group.by = 'high.tss') + NoLegend()

atac$nucleosome_group <- ifelse(atac$nucleosome_signal > 4, 'NS > 4', 'NS < 4') table(atac$nucleosome_group) FragmentHistogram(object = atac, group.by = 'nucleosome_group', region = "chr1-1-500000000")

VlnPlot( object = atac, features = c('frip', 'library_size', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal'), pt.size = 0.1, ncol = 5 )

Screenshot 2023-02-23 at 2 36 02 PM

Screenshot 2023-02-23 at 2 37 01 PM

Screenshot 2023-02-23 at 2 37 27 PM

atac <- RunTFIDF(atac) atac <- FindTopFeatures(atac, min.cutoff = 'q0') atac <- RunSVD(atac)

DepthCor(atac)

Non-linear dimension reduction and clustering atac <- RunUMAP(object = atac, reduction = 'lsi', dims = 2:10) atac <- FindNeighbors(object = atac, reduction = 'lsi', dims = 2:) atac <- FindClusters(object = atac, verbose = FALSE, algorithm = 3) DimPlot(object = atac, label = TRUE) + NoLegend()

Screenshot 2023-02-23 at 2 50 32 PM

Screenshot 2023-02-23 at 2 50 44 PM `

Edit*** Here is an image of the the SeuratObject "ATAC" in the environment where we see that ATAC@meta.data$nCount/nFeature_peaks only have nonzero values for the first two cells, all else are zero.

Screenshot 2023-02-23 at 3 47 07 PM

dbrg77 commented 1 year ago

Hi @FrancoFelix

Thanks for your post. Could you provide a few lines from each of these files?

outs/count_matrix_over_aggregate.rows
outs/count_matrix_over_aggregate.cols
outs/aggregate_fragments.tsv.gz
outs/sample_info.csv

I'm currently pre-occupied with grant and teaching related duties and will come back to this issue in two weeks.

Sorry about this.

Thanks.

FrancoFelix commented 1 year ago

Hi @dbrg77,

No worries, best of luck on your grant and thank you for getting back to me.

Below are the first 15 lines of each of the files. The outs/aggregate_fragments.tsv.gz seems to be unable to be read in any viewer that I use.

`(base) ffelix96@DN52eg1i ~ % head -n 15 //outs/sample_info.csv cell,dup_level,sequencing_depth,uniq_nuc_frags,mt_content,frip,mapping_rate,frac_open,library_size plate1_Parab_IsoY100_CKDL220022691-1A_HJ2YCBBXX,0.9270309999999999,1098516,71143,0.839083,47.9957,94.19,21.3602,71745.0 plate1_Parab_IsoY101_CKDL220022691-1A_HJ2YCBBXX,0.9453219999999999,1132471,54031,2.881332,50.3847,94.57,21.2076,55634.0 plate1_Parab_IsoY102_CKDL220022691-1A_HJ2YCBBXX,0.949938,1697031,76776,0.39310300000000004,36.4464,95.54,18.6614,77079.0 plate1_Parab_IsoY103_CKDL220022691-1A_HJ2YCBBXX,0.9508190000000001,2456138,109591,0.6734100000000001,34.7286,95.55,22.8138,110334.0 plate1_Parab_IsoY104_CKDL220022691-1A_HJ2YCBBXX,0.8950319999999999,373836,34649,0.201619,9.4588,94.05,3.8606,34721.0 plate1_Parab_IsoY105_CKDL220022691-1A_HJ2YCBBXX,0.919241,887000,66185,0.579832,44.9918,96.26,19.6104,66571.0 plate1_Parab_IsoY106_CKDL220022691-1A_HJ2YCBBXX,0.912159,699187,56399,0.372726,45.6342,96.06,19.538,56610.0 plate1_Parab_IsoY107_CKDL220022691-1A_HJ2YCBBXX,0.9430440000000001,584757,28640,0.34448,11.4131,93.2,3.5810000000000004,28739.0 plate1_Parab_IsoY108_CKDL220022691-1A_HJ2YCBBXX,0.920975,613501,42091,1.571452,53.3896,92.92,17.9474,42763.0 plate1_Parab_IsoY109_CKDL220022691-1A_HJ2YCBBXX,0.923273,315222,20362,3.916572,33.9798,94.0,6.6207,21192.0 plate1_Parab_IsoY10_CKDL220022691-1A_HJ2YCBBXX,0.794309,390496,72691,0.901134,45.4746,95.19,23.6347,73982.0 plate1_Parab_IsoY110_CKDL220022691-1A_HJ2YCBBXX,0.779933,191627,38561,0.469762,45.6779,95.78,13.1111,39205.0 plate1_Parab_IsoY111_CKDL220022691- 1A_HJ2YCBBXX,0.9447040000000001,1441366,72615,1.002045,46.9945,95.79,21.1564,73350.0 plate1_Parab_IsoY112_CKDL220022691-1A_HJ2YCBBXX,0.8539110000000001,147434,18441,2.660333,30.644000000000002,94.32,5.7207,18967.0

(base) ffelix96@DN52eg1i ~ % head -n 15 /outs/count_matrix_over_aggregate.cols plate3_Parab_HetY247_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY201_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY348_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY2_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY339_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY383_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY270_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY236_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY357_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY311_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY258_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY229_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY293_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY360_CKDL220022691-1A_HJ2YCBBXX plate3_Parab_HetY326_CKDL220022691-1A_HJ2YCBBXX

(base) ffelix96@DN52eg1i ~ % head -n 15 /outs/count_matrix_over_aggregate.rows chr1 3043112 3043530 chr1 3671512 3671967 chr1 4412335 4412800 chr1 4571569 4572164 chr1 4769938 4770210 chr1 4775188 4775792 chr1 4785220 4786128 chr1 4807446 4808288 chr1 4857356 4858451 chr1 4874491 4874743 chr1 5019158 5019741 chr1 5022669 5023177 chr1 5070299 5070507 chr1 5076221 5076521 chr1 5082756 5083640

(base) ffelix96@DN52eg1i ~ % head -n 15 /outs/aggregatefragments.tsv.gz �BCa�9�t�qFc�����P�o�C;� �!���ź��tc��u��1��^.��X���5��Ǒ��������?���o����������+��?����s�����������?���������7�E������2&���ۙ�WJ��: ��vRJ>)�����ig.1���IG���REor{ �>ߛ&0�{0q�iϏ����f~ f�el��w��{*¬��[��P�^��󞱙�{0|l����5�0����=g��5�-+��BJ����>��Zw12i��w��=�<?�r�߂)���k =^ � �Ȯ#�n��뒸���(� ����QK��7���_c2����[0t¯1����阙�MC�7����mOA��υ����L]cr}Ko�:k��=�T�O,x[��=�P�$�PE�S���󦏚�Z)�]u�!��7yUoGV�Ox;�v��d1[��N�v��f������V�$�N1=��Lj�;���e�B�B���u�Z��o�C\�q8f-�s�CSW�ni�[��3��N����6��ꡆ���"�;�<�48�N�5�$��[��~T�jR�3��Z�v.�K�-��dV@���k�'$/� �����pv޿n��ϋ���c)�w���a���=swI3?��� �cs��+8�@u-M�r�m=� � -�S�a�L��v$��(툪5k�-$�Qe�&���?���n��iF��S%��$!ˡ��ڛ�� ���MO`���a�� �~�I���>�. Q�ljYsj�j�E��vB�)���5f��ʀev��V��~���l{/o;|�J�[�x[�|W8Qپ�������Z�a�a\ȹ4an@�dž$��֥� �D��p��Q�h��f0([NT5��Z�N�0��Mi��oL���^�[����K����ҙ�Ę�G1�=��R�� }����N�R�7�}�� ����;A�~���C��)C����c3�v4����O\c��~��al"��@�Ɣ��0�PLy���&&fs��>߶��l�t����r'�͂#�zޟdFCJBH$x���̄G�a� ]-�5O�3X�84����P��Ƥtp��aTo�8/=y��R�I�җk�6򸄎S߃�)�f�es��;.0�q��eԮ%�7�z�W�J��-�J-cx.�������玚M��4}(�P``)�L�l�1��._1���3c�OLVΟ��O��(��a?w��̕P����YU��PfUSӊ�u�Tx����l��RVNF#6K���]�G� S��;�L�߯<��Rs?��4���c�(���1�M���V5Ot���F��f�y�Qadbc��t?:Z�G�r����v[|U-TH�9��� �1�u�ӳ^�>7��R���<�m¥��O;��Л�=E�f~ J��=5G�X�0K����%1ufխ2��J�b�������B�wi�۞Y���ƜCܕJ�&\~���-t�?0R�OT�)�v��)3��6��\7嗿�яO��������8��ä"�C/��a�L��[�#3Ũv���3�����L��"�I�TL\R���Byt?=�S9౰��(=f��a��+��èc�����-"����Cl(��{GI��|�(k��R`]-� �y�7�����N�y6�J��T���o�ဧGX:�_���D=���0G�S�����x�m��(����0J�>-�%��\��f̣����(R�����WR/�1ʊZ��M�6�P:C-�k ��}��CȈ�%MW���7�1&f�a�[�p���Y�d�l����P���2��ȚA�j��Њ"w������1�����C�x�gp�0M�L�#�Mp:�� ������\�O�{�ykJ�ڙ0�}5�썴��|��o^��~�k �Ga�������# ̚ 1/�Ö½Y��P�w�������du�Q�D��� K��0��0˽����Mc���� �x�1�ƾL}�X�/��<��T��"�N��jl�Y��iߔ���5��o�k �=Q�Y}�}=��or�N�AB�ڙ��桼��w��C�=8�P�|��>/ֿd ���C��!�lp�����L���5��e���35�RE�T��c3�$�:�¼49�����:ۯ��M������1�>~�>���![k�KIP�~�L'>�~��%L�F�b��M3�T�bCl��ނY'�d�Qp 0����6~�۴q�|bTM&z����� ��aof��6{�yp��)�V����m%7��Z�����ʞ��֔{cj:1m���U �Ka ��f�DX��~�o[�dž�� �q�� �0t���|u$E�3�����;�ЊJ&�<���_� �h�[/�ٮ�t$<�_0���KL�����5��v�.�k ���߂)o����5f zN]c��#ļg�,H��gvx�X�X���hhu�w?�d1���wI�ӕ)�R�cxpO��ixY%�@G��<�/�3g�wO�� �%.�T�@��R~���w%�ݖ$�<A�D���U�W���i�]q�"�$�*�=O�f�K�2р~�5z�߈Yo����s���U�-�)�)�FQUW���om��~�-����^k~z#�po.1�6�Q��U=J���iӷ���n�����RS����z/M�bP37�yp�zțV�N��e��Y���w�yOo��qnʙ�6Xs��yj#���z��3��2��$ q��G�'�㕚Wň��m���I)��j����F:��fS ��P�=E1�o�,s��9�Q�ތ�������~�s��Q��w$G�C���Η�2�K���� "RE�~��L��zy�Ѻ�mN�� �w�����)�c�W��[�hoz�,�a�����$�0�fx;�ezJ��1 0�M�� �ra� 1��a`d!��r�!J-��

                 ݚ�LT0�Lצ�f$�����0�<��6cV����`hE�L�6�E��(����'fH��cΏZ��낍ٽ��X���̞�!�ʻ�&[5�H�16M��aw���غ      0Ա�\���QO`<{�!Ƙ����&]��_��o��,���<z��ѻ1醑%����J��)I*I0!~��ue����w��yf$������c�+��4Y��q���eJ!

}Q߂��{0���6�^1���}����⮱Y���ES˹�[�A^J��^3�6�^��Oʛ��xeg��N�ګr�Й�#{oZ*⬃E���}?��t�(ym_�i8�f9��6ݩ��ʕ�p��(�l/���w��ʷ���w|��p�C�w@�۠�S�$��0�t��b��SV���ޘ�IO4SC �ޅ>ٍ�`��i�a��kp���A6è��{a����M޸�#��io���8�ђ�``�D�������u\�ha��+��M��AoBc�׉�2��Lٗ�� ��>����$;ה9ڰ&�\祺u|��||&Q�q>}�_ �e��7E(� ;�[گ�S�R��f�|W*2\idɇ!$�y%>���7z�" T�"

               �G�6��a`�D��U+�sK%������b�Z b        ��^�ꘃ�aøv�l��Ҵ�r=�y�EV�ekG�C�؆�s���     _M_v��o��:���־[��Rؘ~��y�6&����Ǔ/0�Z�$��6� 0P�%0|�]c�������!�-3ť�5�oޘzè����y����RWkK�
                                                              T�0�e??��Ur*>`�1�/�)������M���M���/���!n�k
                                                                                                        uDEx�%00��0�66?y�x����M=����Q�I��9t���

�c�!���1�d~����L�-X3�"Z4��;�d���b�����هJNP-t��7�NeVQqX�8f����V��붯�Z;Tz?�7a�w�1��Q6Z�����f{����D���NV�V_ a�a:�~$~�]��,�x٘�U��la����u��7Sz9�L:���n�0P��f��ڛ���Q�.���3 !�e�E�I�Q�|�t���z���l��'NG�G�v�5 =��g�gr�.$(�Yx��T�� ��iF�1���2 ��"�6�����\�� ;?�ۥ|���]y�RlmQ���Y�}�y�w���� 38Qvf��m�uF���|��" +oa���~�]a�=���awK����6C��vD��'7�Y��>�}���C�W+�B+'��Z6La 00�>�0�#�0?R�a�O�^���w�Ly���U��J��/dn�T9�,_���6Rw;�-r�Ng?OمVi喥{�_�2�H� ^|���b�FaX�8İr4^Ћ1p�뱫�Z;�ژt�����~}�{��� ��X �~��S��㬏� � _1E�(�y=wk�,|�� �kL�������%^�J��U��aX�X���6o�*/�B>N'|>�Po[½�M����&�ܙ/��5L)�z��N�<��F�67�)�s�F34 mqͳ]�xx@��ݷ�Kk� \��ҷ��}�-�jS�1�N|�a�]#LaeRn�{�s������w泮�cg�7�c����q����S�kK!��Ƒ6f^���<5мU

&ah44ϕ|��"oIQ9��:1S�5'*ѯ1����PU�~��.�,K�2Q��^��\It�� �B��b�c�.k�& �K

   �x�~>�6ʧ��=��*��6���ɻ�nw��/�ڵ_/���"�Tٙ�ޖ$��a���0�Ia`H?�0�a\���ODv����!,u2���w@a��B�[���Mi�v����M_�.�O�*L���5t�Y/�ٛ��B��0C�z�"�; 0TP`fB@yKW^^���'ݸXt�B�B��◣#���r��ZU�ό�v���X��ư����ᮧ�_�Z4=Pj^�h�G��+If�V��ڧ�
        ���M�����
                  1T=k^W/m��Z�D�a�
                                  ����if���O`�#��

Ui>0�]�����[�G%B�쳟�Ъ��hh�6o�T٠77���Z��NS" Ę���v��NƤS�\Uf ��Y�0�(�ɍ�Ĩ��`A��sW����*�

FrancoFelix commented 1 year ago

Hi @dbrg77,

I hope your classes and grant writing went well!

Any chance you've been able to look further into this issue?

Thank you,

dbrg77 commented 1 year ago

Hi @FrancoFelix,

Sorry for the late reply. I'm not able to reproduce the result using our own data.

Indeed, all QC looks okay, so not sure why it gives this behaviour when loading into R. I suspect there might be some formatting issues with some files or cell names etc.

Could you please share the following files so that we could have a look for you?

outs/count_matrix_over_aggregate.rows
outs/count_matrix_over_aggregate.cols
outs/count_matrix_over_aggregate.mtx
outs/aggregate_fragments.tsv.gz
outs/sample_info.csv

Thank you!

Xi

FrancoFelix commented 1 year ago

Hi @dbrg77,

Great, thank you!

I have emailed you the files. They are to large to attach here.

Best,

dbrg77 commented 1 year ago

Hi @FrancoFelix

Thanks for sharing the file and code. My student @helianfeixing has tried your code using our data, and the code works fine, so the problem is not associated with reading the data into R.

I noticed that your mtx file look strange. Are you using the latest version of the pipeline? In our current setting, the mtx file should have three headers, like this:

%%MatrixMarket matrix coordinate integer general
%
89779 1536 14042796

However, your mtx file currently looks like this (only two lines of header):

%%MatrixMarket matrix coordinate integer general
89779 1536 1536

If manually change the header like described above, the problem is solved:

Screenshot 2023-03-27 at 15 30 29

Maybe you should try the latest version of the pipeline by clone it again.

In future, @helianfeixing will answer questions regarding the pipeline.

I hope this helps.

Regards, Xi

FrancoFelix commented 1 year ago

Hi @dbrg77 and @helianfeixing,

Thank you so much for your help, I greatly appreciate it.

I had trouble installing the macs2 so I used version 2.2.6, it is possible the problem stemmed from there.

Thank you again for the help and for developing the pipeline.

Best,

dbrg77 commented 1 year ago

Great! No problem at all.

Closing now.