Closed msaland closed 4 months ago
@msaland Can you share the contents of the output file, barcode_index.csv?
Hello, sorry for the late reply, here's what the top lines of the index look like: barcode_index.csv
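@msaland The files look fine to me. May I ask how large your BAM file is? Can you check whether any read in the BAM file has gene information?
import pysam

# bamfile: path to your BAM file
bam = pysam.AlignmentFile(bamfile, "rb")
# until_eof=True iterates over every read without needing an index
for r in bam.fetch(until_eof=True):
    # GX is the gene ID tag; print the first read that carries it
    if r.has_tag("GX"):
        print(r)
        break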
@ThuyTien1 ~1.6 Gb; none of the reads have gene information (it's not giving me any output for your script).
I had run the pipeline for a different dataset and that one worked (it doesn't have GX/gene information tags either).
@msaland It's impossible to get any result if no read carries GX information. It's a bit odd to me that no read is confidently assigned to any gene. Is there a problem with the read quality or the reference annotation, or do the sequences simply not fall in any annotated gene?
Oh, now that you mention it, I think that was the issue. The other dataset I was working with did have GX tags for some of the reads.
For anyone else who runs into this specific issue: I had run STAR on the dataset that failed, and STAR does not add the GX tags. You need to run STARsolo to get the GX tags that are needed.
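A quick way to confirm this before running prepare_input is to count how many reads actually carry a usable GX tag. The sketch below is only an illustration: the function name `count_gene_tagged` and the file path are made up here, and treating a value of "-" as "no gene assigned" is an assumption about how some aligners mark unassigned reads, so adjust it for your data.

```python
from itertools import islice


def count_gene_tagged(reads, tag="GX", limit=100_000):
    """Return (n_checked, n_tagged) over at most the first `limit` reads.

    `reads` is any iterable of objects exposing has_tag() and get_tag(),
    such as the reads yielded by pysam's AlignmentFile.fetch().
    """
    n_checked = 0
    n_tagged = 0
    for read in islice(reads, limit):
        n_checked += 1
        # Assumption: a missing tag, an empty value, or "-" all mean
        # that the read was not assigned to a gene.
        if read.has_tag(tag) and read.get_tag(tag) not in ("", "-"):
            n_tagged += 1
    return n_checked, n_tagged


# Usage with pysam (the path is a placeholder):
# import pysam
# with pysam.AlignmentFile("sample-sorted.bam", "rb") as bam:
#     checked, tagged = count_gene_tagged(bam.fetch(until_eof=True))
#     print(f"{tagged} of {checked} reads carry a GX tag")
```

If the second number is zero, the empty .pkl files are expected, and the BAM needs to be regenerated with gene tags before anything downstream can work.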
Hello,
I'm running _scape prepare_input --utr_file GRCm39_112/GRCm39_112.csv --cb_file ${file}-barcodes.tsv --bam_file ${file}-sorted.bam --outputdir ${file}/ --chunksize 100 and I'm only getting empty .pkl files; nothing else is being generated. The barcode_index.csv is generated fine, though.
These are the output messages I'm getting:
I'm not sure what the issue is, but here is what the first few lines of my BAM file look like:
and the first few lines of my barcode file:
And also my generated GRCm39_112 file for reference, just in case something's wrong there. GRCm39_112.csv
Any insight would be welcome.