Open bassanio opened 1 year ago
Hello @bassanio did you already go through the documentation? If so, could you maybe tell me specifically what you need help with?
Hi, I guess this is in the same spirit as the previous question.
So in the documentation you outline the structure of R1 with the UMI position and then R2 with the Ab barcode data. You also mention how you provide the tag.csv file which will take the input fq files, and generate counts based on the Ab barcodes provided in the csv. That part makes sense.
Now my question is, where does the barcode info for the HTO come into play? Where do you specify those and where does cite-seq-count deal with that? Do I need to run cite-seq-count twice, once for the Ab barcodes and then a 2nd time for the HTO? Or do I make a single csv file with the HTO and Ab sequences and let cite-seq-count loose on all of it in one go?
(I have 1 file of the format [say hto.csv]...
XXXXXX,hashtag1
YYYYYY,hashtag2
... and a 2nd file of format [say abs.csv]...
AAAAAA,Ab1
BBBBBB,Ab2
are you able to provide pseudo-code/commands as to how to run cite-seq-count for each of hto.csv and abs.csv to get the desired counts required for progressing...?)
The 2nd question: assuming we have dealt with the HTO/Ab situation, the next step would require loading this information into Seurat for integration, is that correct?
So depending on how your libraries have been sequenced, you can run everything together. You should have fastqs for ABs and fastqs for HTO.
Does cellranger give you the output you need for the ABs?
If so, you only need to run CSC on the HTO.
You can make a tag.csv with all your HTO tags and all your AB tags; CSC will try to match all of those against the fastqs you provide.
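A minimal sketch of building one combined tag file, assuming two headerless `sequence,name` files like the `hto.csv` and `abs.csv` described earlier in the thread. The helper name `merge_tag_files` is hypothetical and not part of CITE-seq-Count; it only concatenates the files and refuses duplicated sequences or names.

```python
import csv


def merge_tag_files(paths, out_path):
    """Concatenate headerless `sequence,name` CSV files into one tag list.

    Raises ValueError on a repeated sequence or name, since an ambiguous
    tag list would make the resulting counts hard to interpret.
    """
    seen_sequences = set()
    seen_names = set()
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if not row:  # skip blank lines
                    continue
                sequence, name = row[0].strip().upper(), row[1].strip()
                if sequence in seen_sequences:
                    raise ValueError(f"duplicate sequence {sequence} in {path}")
                if name in seen_names:
                    raise ValueError(f"duplicate name {name} in {path}")
                seen_sequences.add(sequence)
                seen_names.add(name)
                rows.append((sequence, name))
    with open(out_path, "w", newline="") as handle:
        # Plain "\n" line endings instead of the csv module's default "\r\n".
        csv.writer(handle, lineterminator="\n").writerows(rows)
    return rows
```

Calling `merge_tag_files(["hto.csv", "abs.csv"], "tags.csv")` would write a single file that could then be passed to `-t`, as in the commands in this thread.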
Pseudo code is very simple.
-start-trim
), if not found, flag as unmapped.
Yes, you then need to load the results into Seurat to do the demultiplexing.
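The matching idea described above (compare each read against the tag list and flag it as unmapped if nothing fits) can be sketched as follows. This is an illustration under assumptions, not CITE-seq-Count's implementation: `hamming`, `assign_tag` and the `max_errors` parameter are hypothetical names, mismatches are counted with a plain Hamming distance, the tag is assumed to start at the first base of R2, and the example tag sequences are placeholders.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_tag(read_sequence, tags, max_errors=1):
    """Return the name of the best-matching tag, or "unmapped".

    `tags` maps tag sequence -> tag name. Only the start of the read,
    as long as each tag, is compared. On a tie the first tag wins.
    """
    best_name, best_distance = "unmapped", max_errors + 1
    for tag_sequence, tag_name in tags.items():
        window = read_sequence[: len(tag_sequence)]
        if len(window) < len(tag_sequence):
            continue  # read too short to contain this tag
        distance = hamming(window, tag_sequence)
        if distance < best_distance:
            best_name, best_distance = tag_name, distance
    return best_name


# Placeholder tags, purely for illustration.
EXAMPLE_TAGS = {"ACGTACGT": "hashtag1", "TTGGCCAA": "Ab1"}
```

With these placeholder tags, a read starting with `ACGTACGA` (one mismatch) would still be assigned to `hashtag1`, while a read that differs from every tag at more than `max_errors` positions comes back as `unmapped`.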
I have fq for the ab's and for the hto's separate from the expression data (i.e. the fq have been split into the different samples, and each sample has its corresponding ab + hto fq files)
So if I understand you correctly, I need to run cellranger on the ab+hto fq separately to get the counts matrix for those, right? And a 2nd run of cellranger on the expression fq files for those counts?
After which I just run CSC on the ab+hto fq's with
CITE-seq-Count -R1 ab-HTO_R1.fastq.gz -R2 ab-HTO_R2.fastq.gz \
-t TAG_LIST_HTO-Ab.csv -cbf 1 -cbl 16 -umif 17 -umil 26 -cells 20000 -o ./out/
Did I understand you correctly?
and from there into R for the rest 👍
So, depending on which kit you used from 10x, you can run RNA, AB and HTO together. Whatever deviates from the normal protocol will not be compatible with the software. So, if the HTO is not in the kit, you need to run CSC on that part.
--
Roelli Patrick Division of Animal Physiology and Immunology TUM School of Life Sciences Weihenstephan Technische Universität München Weihenstephaner Berg 3 85354 Freising Germany
Hi,
I tried to run CITE-seq-Count using the command below and got the following error.
I am also confused about R2 and R3, because I am finding the ABs in R3 and not in R2.
CITE-seq-Count \
-R1 hto_S3_L001_R1_001.fastq.gz \
-R2 hto_S3_L001_R3_001.fastq.gz \
-t TAGS.txt \
-cbf 1 -cbl 16 -umif 17 -umil 26 -cells 13641 \
-o RESULT
Tag File
ACCCACCAGTAAGAC,First_P1_Undivided
GGTCGAGAGCATTCA,Second_P2_late_dividers
CTTGCCGCATGTCAT,Third_P3_Early_dividers
Executing the above command gives the following warning and error:
Read1 length is 51bp but you are using 26bp for Cell and UMI barcodes combined.
This might lead to wrong cell attribution and skewed umi counts.
Counting number of reads
Started mapping
Processing 10,651,191 read
CITE-seq-Count is running with XX cores.
Mapping done for process 2006672. Processed 166,424 reads
Mapping done for process 2006674. Processed 166,424 reads
Mapping done for .......
Mapping done for process 2006731. Processed 166,424 reads
Mapping done
Merging results
Correcting cell barcodes
Looking for a whitelist
Collapsing cell barcodes
Correcting umis
Traceback (most recent call last):
File "/home/.local/bin/CITE-seq-Count", line 8, in <module>
sys.exit(main())
File "/home/.local/lib/python3.9/site-packages/cite_seq_count/__main__.py", line 435, in main
) = processing.correct_umis(
File "/home/.local/lib/python3.9/site-packages/cite_seq_count/processing.py", line 229, in correct_umis
for TAG in final_results[cell_barcode]:
RuntimeError: dictionary keys changed during iteration
[Images: HTO R1, HTO R2, HTO R3, grep AB TAG in R3]
Some AB barcodes do not start at the expected position, as shown in the example.
@bassanio try to set up a conda environment with Python 3.7.16 and run it again. I have had no luck with any Python version > 3.7. The error is actually an issue with changes in the pandas package. If you restrict Python to 3.7.16, pip install CITE-seq-Count==1.4.5 will pull the correct pandas package version. Good luck!
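For context on the `RuntimeError` in the traceback above: Python 3.8 and later raise `dictionary keys changed during iteration` when a loop replaces keys in the dict it is iterating over while the size stays the same; Python 3.7 did not perform this check, which is consistent with the advice to pin Python 3.7. The sketch below reproduces the message with a toy dict and shows the usual fix of iterating over a snapshot of the keys. It is not CITE-seq-Count's code; `rename_keys_unsafe`, `rename_keys_safe` and the example data are hypothetical.

```python
def rename_keys_unsafe(counts, corrections):
    """Rename keys while iterating over the same dict.

    On Python 3.8+ this raises RuntimeError once a key has been replaced,
    because the dict's keys changed while it was being iterated.
    """
    for key in counts:
        new_key = corrections.get(key, key)
        if new_key != key:
            counts[new_key] = counts.pop(key)
    return counts


def rename_keys_safe(counts, corrections):
    """Same renaming, but iterating over a snapshot of the keys."""
    for key in list(counts):  # list() copies the keys before any change
        new_key = corrections.get(key, key)
        if new_key != key:
            counts[new_key] = counts.get(new_key, 0) + counts.pop(key)
    return counts


if __name__ == "__main__":
    corrections = {"AAAT": "AAAA"}  # toy barcode correction
    try:
        rename_keys_unsafe({"AAAT": 1, "CCCC": 2}, corrections)
    except RuntimeError as error:
        print("unsafe version failed:", error)
    print("safe version:", rename_keys_safe({"AAAT": 1, "CCCC": 2}, corrections))
```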
@cpflueger2016: Thanks for the information, I will do the same.
Can you also help me understand the R2 and R3 fastq files?
Yeah, so if you get the index read from the i7 index parsed out (there is an option in bcl2fastq), your read2 is actually the index of the library and read3 is truly the second read.
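One way to check which FASTQ file actually carries the tag sequences (the equivalent of grepping for a tag, as was done above for R3) is to count how many of the first reads contain any tag. A minimal sketch, assuming gzipped four-line FASTQ records; `read_sequences` and `fraction_with_tag` are hypothetical helper names, and the tag sequences are the ones from the tag file posted above.

```python
import gzip
from itertools import islice


def read_sequences(fastq_gz_path, limit=10000):
    """Yield up to `limit` sequence lines from a gzipped FASTQ file."""
    with gzip.open(fastq_gz_path, "rt") as handle:
        # In a four-line FASTQ record the sequence is the second line.
        for line in islice(handle, 1, limit * 4, 4):
            yield line.strip()


def fraction_with_tag(sequences, tag_sequences):
    """Fraction of reads containing at least one tag as an exact substring."""
    total = hits = 0
    for sequence in sequences:
        total += 1
        if any(tag in sequence for tag in tag_sequences):
            hits += 1
    return hits / total if total else 0.0


# Tag sequences copied from the tag file in this thread.
TAGS = ["ACCCACCAGTAAGAC", "GGTCGAGAGCATTCA", "CTTGCCGCATGTCAT"]
```

Running `fraction_with_tag(read_sequences(path), TAGS)` on both the R2 and R3 files should make the difference obvious: an index read will contain essentially no tags, and the file with the clearly higher fraction is the one to pass to `-R2`.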
@cpflueger2016: I have this warning message at the top:
"Read1 length is 51bp but you are using 26bp for Cell and UMI barcodes combined"
Should I change the umil to 51? Does this have any effect on the analysis?
This is not going to affect the analysis. Back in the day I wanted to make sure people knew what they were running and catch potential wrong lengths. In hindsight this might have been a mistake as it confuses users more than anything.
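To illustrate why the warning is harmless here: with `-cbf 1 -cbl 16 -umif 17 -umil 26`, only the first 26 bases of R1 are used (16 for the cell barcode, 10 for the UMI), and the remaining bases of a 51 bp R1 are simply not read. A minimal sketch, assuming the four values are 1-based, inclusive positions on R1; `split_read1` is a hypothetical helper, not part of CITE-seq-Count.

```python
def split_read1(read1, cb_first=1, cb_last=16, umi_first=17, umi_last=26):
    """Split an R1 sequence into (cell barcode, UMI, unused tail).

    Positions are treated as 1-based and inclusive, mirroring how the
    -cbf/-cbl/-umif/-umil values in this thread are written.
    """
    if len(read1) < max(cb_last, umi_last):
        raise ValueError("read is shorter than the requested positions")
    cell_barcode = read1[cb_first - 1 : cb_last]
    umi = read1[umi_first - 1 : umi_last]
    unused_tail = read1[max(cb_last, umi_last):]
    return cell_barcode, umi, unused_tail


if __name__ == "__main__":
    # Synthetic 51 bp read: 16 bases of barcode, 10 of UMI, 25 left over.
    read1 = "A" * 16 + "C" * 10 + "T" * 25
    cell_barcode, umi, tail = split_read1(read1)
    print(len(cell_barcode), len(umi), len(tail))
```

Setting `-umil` to 51 would instead treat bases 17 to 51 as the UMI, which is not what the 16 + 10 layout in the commands above describes.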
Is your general issue resolved, can I close this one?
Hi,
I am very new to the hashing method. I have a 10x output from cellranger-arc (it has both RNA-seq and ATAC-seq). I was told the samples are multiplexed using BioLegend hashing Abs, and I have been provided with the Ab sequences.
1) How can I use the provided Ab sequences to demultiplex the output of cellranger-arc?