FrancoFelix closed this issue 1 year ago.
Hi @FrancoFelix
Thanks for your post. Could you provide a few lines from each of these files?
outs/count_matrix_over_aggregate.rows
outs/count_matrix_over_aggregate.cols
outs/aggregate_fragments.tsv.gz
outs/sample_info.csv
I'm currently preoccupied with grant- and teaching-related duties and will come back to this issue in two weeks.
Sorry about this.
Thanks.
Hi @dbrg77,
No worries; best of luck with your grant, and thank you for getting back to me.
Below are the first 15 lines of each file. I can't seem to open outs/aggregate_fragments.tsv.gz in any viewer I use.
```
(base) ffelix96@DN52eg1i ~ % head -n 15 //outs/sample_info.csv
cell,dup_level,sequencing_depth,uniq_nuc_frags,mt_content,frip,mapping_rate,frac_open,library_size
plate1_Parab_IsoY100_CKDL220022691-1A_HJ2YCBBXX,0.9270309999999999,1098516,71143,0.839083,47.9957,94.19,21.3602,71745.0
plate1_Parab_IsoY101_CKDL220022691-1A_HJ2YCBBXX,0.9453219999999999,1132471,54031,2.881332,50.3847,94.57,21.2076,55634.0
plate1_Parab_IsoY102_CKDL220022691-1A_HJ2YCBBXX,0.949938,1697031,76776,0.39310300000000004,36.4464,95.54,18.6614,77079.0
plate1_Parab_IsoY103_CKDL220022691-1A_HJ2YCBBXX,0.9508190000000001,2456138,109591,0.6734100000000001,34.7286,95.55,22.8138,110334.0
plate1_Parab_IsoY104_CKDL220022691-1A_HJ2YCBBXX,0.8950319999999999,373836,34649,0.201619,9.4588,94.05,3.8606,34721.0
plate1_Parab_IsoY105_CKDL220022691-1A_HJ2YCBBXX,0.919241,887000,66185,0.579832,44.9918,96.26,19.6104,66571.0
plate1_Parab_IsoY106_CKDL220022691-1A_HJ2YCBBXX,0.912159,699187,56399,0.372726,45.6342,96.06,19.538,56610.0
plate1_Parab_IsoY107_CKDL220022691-1A_HJ2YCBBXX,0.9430440000000001,584757,28640,0.34448,11.4131,93.2,3.5810000000000004,28739.0
plate1_Parab_IsoY108_CKDL220022691-1A_HJ2YCBBXX,0.920975,613501,42091,1.571452,53.3896,92.92,17.9474,42763.0
plate1_Parab_IsoY109_CKDL220022691-1A_HJ2YCBBXX,0.923273,315222,20362,3.916572,33.9798,94.0,6.6207,21192.0
plate1_Parab_IsoY10_CKDL220022691-1A_HJ2YCBBXX,0.794309,390496,72691,0.901134,45.4746,95.19,23.6347,73982.0
plate1_Parab_IsoY110_CKDL220022691-1A_HJ2YCBBXX,0.779933,191627,38561,0.469762,45.6779,95.78,13.1111,39205.0
plate1_Parab_IsoY111_CKDL220022691-1A_HJ2YCBBXX,0.9447040000000001,1441366,72615,1.002045,46.9945,95.79,21.1564,73350.0
plate1_Parab_IsoY112_CKDL220022691-1A_HJ2YCBBXX,0.8539110000000001,147434,18441,2.660333,30.644000000000002,94.32,5.7207,18967.0
(base) ffelix96@DN52eg1i ~ % head -n 15 /outs/count_matrix_over_aggregate.cols
plate3_Parab_HetY247_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY201_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY348_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY2_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY339_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY383_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY270_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY236_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY357_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY311_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY258_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY229_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY293_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY360_CKDL220022691-1A_HJ2YCBBXX
plate3_Parab_HetY326_CKDL220022691-1A_HJ2YCBBXX
(base) ffelix96@DN52eg1i ~ % head -n 15 /outs/count_matrix_over_aggregate.rows
chr1 3043112 3043530
chr1 3671512 3671967
chr1 4412335 4412800
chr1 4571569 4572164
chr1 4769938 4770210
chr1 4775188 4775792
chr1 4785220 4786128
chr1 4807446 4808288
chr1 4857356 4858451
chr1 4874491 4874743
chr1 5019158 5019741
chr1 5022669 5023177
chr1 5070299 5070507
chr1 5076221 5076521
chr1 5082756 5083640
(base) ffelix96@DN52eg1i ~ % head -n 15 /outs/aggregate_fragments.tsv.gz
```
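Running `head` directly on `aggregate_fragments.tsv.gz` prints raw compressed bytes. The file is gzip-compressed, so it has to be decompressed on the fly before it can be read. The minimal shell sketch below does this. The helper name `peek_gz` is hypothetical. It uses `gzip -dc` because that command also reads bgzip-compressed files, which are the kind tabix-indexed fragment files usually are.

```shell
# peek_gz FILE [N]: print the first N lines (default 15) of a gzip- or
# bgzip-compressed text file, decompressing on the fly instead of
# dumping the compressed bytes to the terminal.
peek_gz() {
  gzip -dc "$1" | head -n "${2:-15}"
}

# Example (path is an assumption; adjust to your outs/ directory):
#   peek_gz outs/aggregate_fragments.tsv.gz 15
```

Assume, for this check only, that the file follows the usual Signac fragment layout. In that case each decompressed line should show a chromosome, start, end, cell barcode and read count, separated by tabs.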
Hi @dbrg77,
I hope your classes and grant writing went well!
Any chance you've been able to look further into this issue?
Thank you,
Hi @FrancoFelix,
Sorry for the late reply. I'm not able to reproduce the result using our own data.
Indeed, all the QC looks okay, so I'm not sure why it behaves this way when loaded into R. I suspect there may be formatting issues with some of the files or cell names.
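One cell-name formatting problem can be checked from the shell: a mismatch between the names in `count_matrix_over_aggregate.cols` and the `cell` column of `sample_info.csv`, which the R code passes to Seurat as `meta.data`. The minimal sketch below makes three assumptions:

- `sample_info.csv` has one header line.
- The cell name is in the first comma-separated column, as in the excerpt earlier in the thread.
- `.cols` holds one name per line.

The function name `compare_cells` is hypothetical.

```shell
# compare_cells COLS_FILE SAMPLE_INFO_CSV
# Prints how many cell names appear only in the matrix columns, only in the
# sample sheet, or in both. Returns 0 only if the two sets are identical.
compare_cells() {
  cols_file=$1
  info_csv=$2
  tmpdir=$(mktemp -d)
  # Matrix column names, one per line; strip Windows line endings if any.
  tr -d '\r' < "$cols_file" | sort -u > "$tmpdir/cols"
  # First CSV field of every line after the header.
  tail -n +2 "$info_csv" | cut -d, -f1 | tr -d '\r' | sort -u > "$tmpdir/info"
  only_cols=$(comm -23 "$tmpdir/cols" "$tmpdir/info" | wc -l | tr -d ' ')
  only_info=$(comm -13 "$tmpdir/cols" "$tmpdir/info" | wc -l | tr -d ' ')
  both=$(comm -12 "$tmpdir/cols" "$tmpdir/info" | wc -l | tr -d ' ')
  rm -rf "$tmpdir"
  echo "only_in_matrix=$only_cols only_in_sample_info=$only_info shared=$both"
  [ "$only_cols" -eq 0 ] && [ "$only_info" -eq 0 ]
}

# Example (paths are assumptions):
#   compare_cells outs/count_matrix_over_aggregate.cols outs/sample_info.csv
```

Both lists are sorted before comparison, so the different cell order in the two files (plate3 first in `.cols`, plate1 first in `sample_info.csv`) does not count as a mismatch. This roughly mirrors how Seurat matches metadata to cells by row name rather than by position.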
Could you please share the following files so that we could have a look for you?
outs/count_matrix_over_aggregate.rows
outs/count_matrix_over_aggregate.cols
outs/count_matrix_over_aggregate.mtx
outs/aggregate_fragments.tsv.gz
outs/sample_info.csv
Thank you!
Xi
Hi @dbrg77,
Great, thank you!
I have emailed you the files. They are too large to attach here.
Best,
Hi @FrancoFelix
Thanks for sharing the files and code. My student @helianfeixing tried your code on our data and it works fine, so the problem is not in the code that reads the data into R.
I noticed that your `mtx` file looks strange. Are you using the latest version of the pipeline? In our current setup, the `mtx` file should have three header lines, like this:
```
%%MatrixMarket matrix coordinate integer general
%
89779 1536 14042796
```
However, your `mtx` file currently looks like this (only two header lines):
```
%%MatrixMarket matrix coordinate integer general
89779 1536 1536
```
If we manually change the header as described above, the problem is solved.
Maybe try the latest version of the pipeline by cloning it again.
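In the MatrixMarket coordinate format, the first non-comment line after the `%%MatrixMarket` banner gives the number of rows, columns and entries. The expected header above therefore describes 89,779 peaks by 1,536 cells with 14,042,796 nonzero entries. The broken header declares only 1,536 entries. As far as I know, `Matrix::readMM` reads only as many entries as the header declares. If the entries are ordered by cell, that would leave only the first cell or two with counts, which would match the symptom in the original report. Lines starting with `%` are comments, so the extra `%` line should be optional; the entry count is what matters.

Below is a minimal sketch of the manual fix. It assumes a plain-text (uncompressed) `.mtx` file, and the function name `fix_mtx_nnz` is hypothetical. Re-running the latest pipeline remains the recommended fix.

```shell
# fix_mtx_nnz IN OUT
# Copies a MatrixMarket coordinate file, rewriting the size line
# ("nrows ncols nnz") so that nnz equals the number of entry lines that
# actually follow it. Banner/comment lines (starting with '%') are kept.
fix_mtx_nnz() {
  in=$1
  out=$2
  # Non-comment lines = 1 size line + the entry lines.
  total=$(grep -vc '^%' "$in")
  nnz=$((total - 1))
  awk -v nnz="$nnz" '
    /^%/  { print; next }                          # banner and comments
    !seen { print $1, $2, nnz; seen = 1; next }    # size line
          { print }                                # entry lines
  ' "$in" > "$out"
  echo "declared entries set to $nnz"
}

# Example (paths are assumptions; keep the original file as a backup):
#   fix_mtx_nnz outs/count_matrix_over_aggregate.mtx fixed.mtx
```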
In future, @helianfeixing will answer questions regarding the pipeline.
I hope this helps.
Regards, Xi
Hi @dbrg77 and @helianfeixing,
Thank you so much for your help, I greatly appreciate it.
I had trouble installing macs2, so I used version 2.2.6; it's possible the problem stemmed from that.
Thank you again for the help and for developing the pipeline.
Best,
Great! No problem at all.
Closing now.
Hi Xi,
Thank you for putting together this protocol for plate-based scATAC and the resources to process the raw data!
I have been able to process the scATAC-seq data through the Snakemake workflow provided. However, I run into an issue when I create a SeuratObject from the Snakemake output. I can create the SeuratObject as long as I don't filter with min.cells or min.features in the CreateChromatinAssay step. Without these filters, I can create the object and continue the analysis through to the UMAP, which shows no clustering.
I suspect this is because only two cells in my SeuratObject have nonzero values for nCount_peaks and nFeature_peaks. This is odd, because all the other QC metrics look good.
Any ideas on how to fix this?
Thanks!
```r
library(Signac)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Hsapiens.v75)
library(ggplot2)
library(patchwork)
library(hdf5r)
library(dplyr)
library(readr)
library(biovizBase)
library(EnsDb.Mmusculus.v79)

# read the content from the 'outs' directory
setwd("~/Experiment/outs")
mex_dir_path <- "~/Experiment/outs"

mtx_path <- paste(mex_dir_path, "count_matrix_over_aggregate.mtx", sep = '/')
feature_path <- paste(mex_dir_path, "count_matrix_over_aggregate.rows", sep = '/')
barcode_path <- paste(mex_dir_path, "count_matrix_over_aggregate.cols", sep = '/')

# unite() joins the chr/start/end columns with "_", e.g. chr1_3043112_3043530
features <- readr::read_tsv(feature_path, col_names = F) %>% tidyr::unite(feature)
barcodes <- readr::read_tsv(barcode_path, col_names = F) %>% tidyr::unite(barcode)
metadata <- read.csv(
  file = "~/Experiment/outs/sample_info.csv",
  header = TRUE,
  row.names = 1
)

# create a Signac chromatin assay and a Seurat object
mtx <- Matrix::readMM(mtx_path) %>%
  magrittr::set_rownames(features$feature) %>%
  magrittr::set_colnames(barcodes$barcode)

# drop peaks on unplaced/random contigs
features <- features[!grepl("random", features$feature), ]
features <- features[!grepl("chrUn", features$feature), ]
mtx <- mtx[rownames(mtx) %in% features$feature, ]

chrom_assay <- CreateChromatinAssay(
  counts = mtx,
  sep = c("_", "_"),  # peak names are chr_start_end
  genome = 'mm10',
  fragments = "~/aggregate_fragments.tsv.gz"
)

atac <- CreateSeuratObject(
  counts = chrom_assay,
  assay = 'peaks',
  project = 'scATAC',
  meta.data = metadata
)

atac
atac[['peaks']]
granges(atac)

# extract gene annotations from EnsDb
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)

# change to UCSC style (chr1, chr2, ...) since the data was mapped to mm10
seqlevelsStyle(annotations) <- 'UCSC'

# add the gene information to the object
Annotation(atac) <- annotations
atac

# compute nucleosome signal score per cell
atac <- NucleosomeSignal(object = atac)

# compute TSS enrichment score per cell
atac <- TSSEnrichment(object = atac, fast = FALSE)

atac$blacklist_fraction <- FractionCountsInRegion(
  object = atac,
  assay = 'peaks',
  regions = blacklist_mm10
)

atac$high.tss <- ifelse(atac$TSS.enrichment > 2, 'High', 'Low')
TSSPlot(atac, group.by = 'high.tss') + NoLegend()

atac$nucleosome_group <- ifelse(atac$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
table(atac$nucleosome_group)
FragmentHistogram(object = atac, group.by = 'nucleosome_group', region = "chr1-1-500000000")

VlnPlot(
  object = atac,
  features = c('frip', 'library_size', 'TSS.enrichment', 'blacklist_fraction', 'nucleosome_signal'),
  pt.size = 0.1,
  ncol = 5
)

atac <- RunTFIDF(atac)
atac <- FindTopFeatures(atac, min.cutoff = 'q0')
atac <- RunSVD(atac)

DepthCor(atac)

# non-linear dimension reduction and clustering
atac <- RunUMAP(object = atac, reduction = 'lsi', dims = 2:10)
# upper bound of dims assumed to match RunUMAP above
atac <- FindNeighbors(object = atac, reduction = 'lsi', dims = 2:10)
atac <- FindClusters(object = atac, verbose = FALSE, algorithm = 3)
DimPlot(object = atac, label = TRUE) + NoLegend()
```
Edit: Here is an image of the SeuratObject "ATAC" in the environment. It shows that ATAC@meta.data$nCount_peaks and ATAC@meta.data$nFeature_peaks are nonzero only for the first two cells; all the others are zero.
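nCount_peaks is the per-cell sum of the counts matrix. One quick check outside R is therefore to compare two numbers in `count_matrix_over_aggregate.mtx`: the entry count declared in the header against the number of entry lines actually present. The same pass can count how many distinct columns (cells) have at least one entry. The minimal sketch below assumes an uncompressed coordinate-format file whose entry lines read "row column value". The function name `mtx_summary` is hypothetical.

```shell
# mtx_summary FILE
# Reports the header's declared entry count, the number of entry lines
# actually present, and how many distinct columns (cells) have >= 1 entry.
mtx_summary() {
  awk '
    /^%/  { next }                              # skip banner and comments
    !seen { declared = $3; seen = 1; next }     # size line: nrows ncols nnz
          { n++; if (!($2 in cols)) { cols[$2] = 1; ncells++ } }
    END   { printf "declared=%d actual=%d cells_with_entries=%d\n", declared, n, ncells }
  ' "$1"
}

# Example (path is an assumption):
#   mtx_summary outs/count_matrix_over_aggregate.mtx
```

Suppose `actual` is far larger than `declared` and `cells_with_entries` is close to the number of cells. Then the counts are present in the file, and the header is what limits how much of it gets read.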