Closed shashwatsahay closed 10 months ago
Also, could you check the link provided in issue #1 for the H&E stain? It seems to be broken.
It would also be great if the same list of all barcodes could be provided for all samples.
Thanks :)
Have you tried reverse complementing the barcodes? A quick check suggests that works for me. Happy to debug further if that doesn't solve your issue.
Yes, I tried that too but still couldn't get it to work.
I took the reverse complement code from the getcounts.py file:
import numpy as np
import pandas as pd
full_beadfile='slide_dna_seq_analysis/data/human_colon_cancer_3_dna/full_BeadBarcodes.txt'
beads=list()
with open(full_beadfile) as handle:
    for line in handle:
        beads.append(line.strip().replace(',', ''))
full_beadlocation='slide_dna_seq_analysis/data/human_colon_cancer_3_dna/full_BeadLocations.txt'
bead_loc=list()
with open(full_beadlocation) as handle:
    all_coords=pd.DataFrame(np.array([[float(i) for i in line.split(',')] for line in handle.readlines()]).T, columns=['x', 'y'])
all_coords['barcodes']=beads
coords_file=pd.read_csv('slide_dna_seq_analysis/data/human_colon_cancer_3_dna/human_colon_cancer_3_dna_191204_19.bead_locations.csv')
coords_file[coords_file.barcodes.isin(all_coords.barcodes)]
all_coords[all_coords.barcodes.isin(coords_file.barcodes)]
complement = {"A":"T", "C":"G", "G":"C", "T":"A", "N": "N"}
def reverse_complement(seq):
    out = ""
    rev = seq[::-1]
    for i in range(len(rev)):
        out += complement[rev[i]]
    return out
all_coords['rev_comp_barcodes']=all_coords['barcodes'].apply(reverse_complement)
coords_file[coords_file.barcodes.isin(all_coords.rev_comp_barcodes)]
Hey @zchiang
Any updates on the barcode matching?
Also, it would be great if you could upload the H&E stain for Fig. 3 as well. Thanks
Hey @zchiang
Sorry for the repeated pings again but any luck?
Hi @shashwatsahay, thanks for your patience. We had to go back pretty far in our archival records to figure this out, but I think we have the correct files now.
I've uploaded the lists of extended barcodes and spatial locations here: https://drive.google.com/drive/folders/18jkSgXmMED_4dFId9IWze7TzbUrGje2C?usp=drive_link
The matching between the samples in the paper is as follows:
mouse_cerebellum_1_dna_200114_14 -> 191118_13
mouse_liver_met_1_dna_191114_06 -> 191026_06
mouse_liver_met_1_dna_191114_05 -> 191026_05
mouse_liver_met_2_dna_200114_10 -> 191118_10
mouse_liver_met_2_rna_200102_04 -> 200102_04
human_colon_cancer_3_dna_191204_19 -> 191026_19
human_colon_cancer_4_dna_200114_13 -> 191118_13
human_colon_cancer_4_rna_200102_06 -> 200102_06
For the slide-DNA samples (191026 and 191118), the bases of each barcode will have to be rearranged in the following order (1-indexed): [2 7 1 6 5 4 3 9 14 8 13 12 11 10]
Doing so will produce a longer list of barcodes/locations that is analogous to the original bead locations files provided, so to match them to the BAM files you will have to reverse complement them.
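The two steps above (rearrange, then reverse complement) could be sketched roughly as follows. This is only an illustration, not code from the repository: the helper names are hypothetical, and it assumes that position i of the rearranged barcode takes the base at the i-th listed (1-indexed) position of the uploaded 14-base barcode. If matching fails, the inverse permutation may be the intended reading.

```python
# Order quoted in the comment above (1-indexed positions in the uploaded barcode).
ORDER = [2, 7, 1, 6, 5, 4, 3, 9, 14, 8, 13, 12, 11, 10]

# Translation table for complementing A/C/G/T, leaving N unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def rearrange(barcode, order=ORDER):
    """Reorder bases. ASSUMPTION: output position i takes input base order[i]."""
    if len(barcode) != len(order):
        raise ValueError(f"expected {len(order)} bases, got {len(barcode)}")
    return "".join(barcode[i - 1] for i in order)


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_bam_orientation(barcode):
    """Hypothetical helper: rearrange, then reverse complement for BAM matching."""
    return reverse_complement(rearrange(barcode))
```

Applying `to_bam_orientation` to every barcode in the uploaded list should give strings that can be compared against the barcodes in the BAM files, provided the assumption about the permutation direction holds.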
Lastly, when matching barcodes to the BAMs, we typically use a Hamming distance filter of 1 or 2. Additionally, it's known that the last few in situ sequenced bases on the array (e.g. bases 11 and 10 in the barcode) are of lower quality, so you may have to experiment with excluding them to get maximal matching.
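As a rough illustration of that kind of filter, here is a minimal sketch of Hamming-distance matching with optional masked positions. The function names, the `skip` parameter, and the rule of rejecting ambiguous ties are assumptions made for this example, not the pipeline's actual behavior. The indices in `skip` are 0-based and refer to whatever orientation the two strings are compared in.

```python
def hamming(a, b, skip=()):
    """Count mismatches between equal-length strings, ignoring indices in skip."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if i not in skip and x != y)


def best_match(query, whitelist, max_dist=1, skip=()):
    """Return the unique closest whitelist barcode within max_dist, else None.

    ASSUMPTION: a query that is equally close to two barcodes is rejected.
    """
    best, best_d, tie = None, max_dist + 1, False
    for bc in whitelist:
        d = hamming(query, bc, skip)
        if d < best_d:
            best, best_d, tie = bc, d, False
        elif d == best_d and d <= max_dist:
            tie = True
    return None if tie else best
```

For example, `best_match(bam_barcode, bead_barcodes, max_dist=2, skip={9, 10})` would ignore two positions when comparing. This brute-force scan is linear in the whitelist size per query, so for full-size bead lists an index over exact matches or one-mismatch neighbors would likely be needed.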
Oh, and the human colon cancer H&E uploaded is the one featured in both Fig. 3 and 4.
Hi @zchiang
Sorry for being annoying. I had asked for the complete list of bead barcodes in issue #7, but the barcode list that was sent does not match nearly 90% of the barcodes that were provided. Could you please recheck whether the barcodes provided were correct?
I am providing screenshots and the complete Jupyter notebook showing how I arrived at the conclusion that something went wrong.