FredHutch / Galeano-Nino-Bullman-Intratumoral-Microbiota_2022

Analysis code used in Galeano Nino et al., Impact of Intratumoral Microbiota on Spatial and Cellular Heterogeneity in human cancer. 2022
MIT License

Running for Part2. 10X single cell data #16

Closed hibiscuslee closed 1 year ago

hibiscuslee commented 1 year ago

Hi, I followed patient_samples_16s_pipeline.sh in Part 2 to analyze OSCC_17. When I ran INVADEseq.py, I got this output:

len(dict_for_genus) = 3666
[E::idx_find_and_load] Could not retrieve index file for './pathseq_r1/OSCC_17.r1.pathseq.complete.bam'
Total reads in pathseq bam = 93926
Total reads in pathseq bam with YP tag = 91276
total cellranger bam reads = 6440594
total cellranger bam reads with UB CB tags (in-cell) = 3839482
total UNMAPPED cellranger bam reads with UB CB tags (in-cell) = 2634502
total cellranger reads with UB_CB_unmap Aligned to Pathseq reads with YP tags = (in-cell) 5542
Total unambigious UMI = 169
total pathogen-associated gems = 121
total filtered pathogen-associated cells = 121
total number of cells = 1418

I'm sure the BAM file exists, and I got the same error when I ran the script on data I had prepared myself.

Thanks in advance, Xiang

hanruiw commented 1 year ago

Dear Xiang,

Thanks for your interest in our study! It looks like the Python script already finished running. An index file is not required for this step, since pysam processes all of the sequences in the BAM file, so the [E::idx_find_and_load] message can be ignored here. However, you can always generate an index file using samtools. Please let me know if you have any other questions!

Best regards, Hanrui
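
For anyone who wants to silence the index warning anyway, here is a minimal sketch of generating the index from Python by calling `samtools index`. It is not part of the repository's pipeline. The helper names `find_bam_index` and `index_bam` are hypothetical, and the sketch assumes samtools is installed and on the PATH. Note that `samtools index` expects a coordinate-sorted BAM, so the file may need to go through `samtools sort` first.

```python
import shutil
import subprocess
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of an existing index next to bam_path, or None.

    Hypothetical helper, not part of INVADEseq.py.
    """
    bam = Path(bam_path)
    # samtools writes "<name>.bam.bai" by default; "<name>.bai" is also common.
    candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def index_bam(bam_path):
    """Create a .bai index with `samtools index` unless one already exists.

    Assumes samtools is on the PATH and the BAM is coordinate-sorted.
    """
    bam = Path(bam_path)
    if not bam.is_file():
        raise FileNotFoundError(f"BAM file not found: {bam}")

    existing = find_bam_index(bam)
    if existing is not None:
        return existing  # nothing to do

    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on the PATH")

    # check=True raises CalledProcessError if samtools exits with an error,
    # for example when the BAM is not coordinate-sorted.
    subprocess.run(["samtools", "index", str(bam)], check=True)
    return Path(str(bam) + ".bai")
```

With the file from the log above, the call would be `index_bam("./pathseq_r1/OSCC_17.r1.pathseq.complete.bam")`. If an index is already present, the function returns its path without running samtools.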