Open tinakeshav opened 2 years ago
Hi Tina,
If you are working with the fragment file, then the python code is looking for "is__cell_barcode" and "barcode" columns in the file. So if you rename/generate a CSV with these columns it should be able to process the fragment file. Also note that the is__cell_barcode is looking for values == 1. Alternatively you can modify these in the python code (FragmentFileOverlapCounter.py):
line 231: sc_data = sc_data[sc_data['is__cell_barcode'] == 1]
line 239: for curbarcode in sc_data['barcode']:
If using the alignments (.bam), then you need to specify the correct index of these columns using the --bcidx, --cellidx, and --iscellidx parameters.
Best, Asa
Hello, I've fixed those and I'm getting the error pasted below. The Overlaps.txt file is empty (see below). Any guidance on how to navigate this?
(base) [tkeshava@node59 test-am]$ cat Overlaps.txt
chr start end cell id Min Overlap Count Max Overlap Count Mean Mapping Quality Min Mapping Quality Max Mapping Quality Starts Ends
Accordingly, the OverlapsSummary.txt looks like the following
Cell Id Number of Valid Reads Number of Overlaps Barcode Total Number of Reads
AAACAGCCAGAACCGA-1 0 0 AAACAGCCAGAACCGA-1 8404
Error message:
/cluster/home/tkeshava/FragmentFileOverlapCounter.py:193: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
table = np.empty(shape=(len(overlapcounts),len(colnames)), dtype=np.object)
/cluster/home/tkeshava/miniconda3/envs/ATACseq/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/cluster/home/tkeshava/miniconda3/envs/ATACseq/lib/python3.8/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/cluster/home/tkeshava/FragmentFileOverlapCounter.py", line 354, in <module>
findOverlaps(fragmentfile, singlecellfile, expectedoverlap, chromosomes, outdir, maxinsert)
File "/cluster/home/tkeshava/FragmentFileOverlapCounter.py", line 352, in findOverlaps
print("Max overlaps in segment: "+str(np.max(stats_overlapsizes)))
File "<__array_function__ internals>", line 5, in amax
File "/cluster/home/tkeshava/miniconda3/envs/ATACseq/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2754, in amax
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
File "/cluster/home/tkeshava/miniconda3/envs/ATACseq/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
Traceback (most recent call last):
File "/cluster/home/tkeshava/AMULET.py", line 201, in <module>
lengths = filtereddata[:,2]-filtereddata[:,1]+1
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Seeing the number of valid reads as 0 when using the fragment file overlap counter script tells me that either:
1) The insert sizes of the paired-end fragments are > 900 (default) for all of the reads or, 2) The chromosomes are not matching the chromosomes in the file provided as input
These are the only criteria used to count "valid reads" when using the fragment file which has already filtered alignments for various flags. Double checking the chromosome names may solve the issue. For example, the human_autosome.txt file provided as default list the chromosomes as chr1 chr2 etc, while the fragment file may have just 1, 2 etc. If this is the case, you will just need to create a similar file with the correct chromosome formatting.
Alternatively, you can comment out these two lines in FragmentFileOverlapCounter.py:
if curchr not in chrlist:
continue #skip reads from chromosomes not in the provided chromosome list
by putting a # in front of both. In this case, there is no filtering based on chromosomes at all.
The valid read count should be close to the total read count. Hope this helps!
I encountered this problem with running AMULET on multiome.
The barcode column matching the barcodes that AMULET is trying to read in atac_fragments.tsv.gz
in per_barcode_metrics.csv
is NOT "atac_barcode", it is "barcode".
So do this over your 10X multiome cellranger outputs:
for(f in files){
message(f)
bc_metrics = read.csv(file.path("path", "to", f, "per_barcode_metrics.csv")) #read in the per_barcode_metrics.csv
df = data.frame("barcode" = bc_metrics$barcode, "is__cell_barcode" = bc_metrics$is_cell) #make a dataframe that can be used as "single_cell.csv"
write.csv(df, file.path("path", "to", f, "single_cell.csv"), row.names = FALSE) #write out the dataframe as "single_cell.csv"
}
Now run AMULET using single_cell.csv
I'm curious about using barcode instead of atac_barcode here. I'm checking the Cell Ranger Arc documentation and when I check:
Barcode is the same as GEX barcode, while atac_barcode is the ATAC library
The fragment file barcode corresponds to the BAM file barcode: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/output/fragments
I would imagine that the barcode in the bam file corresponds to the atac_barcode based on when the BAM is generated: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/algorithms/overview
I'm just trying to understand this myself since this is important to get correct.
Hi Asa,
10X could make this more clear, but the barcode in the ATAC bam file is the "gex" barcode. They translate the ATAC barcode to the matching gex barcode before adding it to the ATAC bam: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/output/bam-atac#bam-barcode-translation
Thanks,
Kevin
I encountered this problem with running AMULET on multiome. The barcode column matching the barcodes that AMULET is trying to read in
atac_fragments.tsv.gz
inper_barcode_metrics.csv
is NOT "atac_barcode", it is "barcode". So do this over your 10X multiome cellranger outputs:for(f in files){ message(f) bc_metrics = read.csv(file.path("path", "to", f, "per_barcode_metrics.csv")) #read in the per_barcode_metrics.csv df = data.frame("barcode" = bc_metrics$barcode, "is__cell_barcode" = bc_metrics$is_cell) #make a dataframe that can be used as "single_cell.csv" write.csv(df, file.path("path", "to", f, "single_cell.csv"), row.names = FALSE) #write out the dataframe as "single_cell.csv" }
Now run AMULET using
single_cell.csv
After using cellranger-arc-2.0.2
to process my multiome data, I needed to perform these steps in R
and also add quote = FALSE
in the last write.csv()
step.
After then quitting R
, my amulet command line script looked like:
bash AMULET.sh --forcesorted --iscellidx 1 --bcidx 0 $outs/multiome/$sample/outs/atac_possorted_bam.bam $outs/multiome/$sample/outs/single_cell.csv human_autosomes.txt RestrictionRepeatLists/restrictionlist_repeats_segdups_rmsk_hg38.bed $outs/multiome/$sample/outs/amulet/ ../amulet/
I think you could also just
cat per_barcode_metrics.csv | cut -d, -f1,4 | sed "1s/.*/barcode,is__cell_barcode/" > singlecell.csv
Hi AMULET team, I have a 10X multiome kit dataset and trying to run AMULET on the ATAC data. The fragments input file is the same, but the singlecell.csv is no longer available as CellRanger output. Instead, the following
per_barcode_metrics.csv
is produced.The is_cell column corresponds to the is__cell_barcode from singlecell.csv I believe. I've tried a quick and dirty solution of renaming the column, which failed. The atac_barcode column corresponds to the barcode column from singlecell.csv .
Is there a good way to modify the source code accordingly to take care of this? Thanks so much, Tina